message-passing parallel programs: Topics by WorldWideScience.org

Sample records for message-passing parallel programs

Protocol-Based Verification of Message-Passing Parallel Programs

DEFF Research Database (Denmark)

López-Acosta, Hugo-Andrés; Eduardo R. B. Marques, Eduardo R. B.; Martins, Francisco

2015-01-01

We present ParTypes, a type-based methodology for the verification of Message Passing Interface (MPI) programs written in the C programming language. The aim is to statically verify programs against protocol specifications, enforcing properties such as fidelity and absence of deadlocks. We develo...
Message passing with parallel queue traversal

Science.gov (United States)

Underwood, Keith D [Albuquerque, NM; Brightwell, Ronald B [Albuquerque, NM; Hemmert, K Scott [Albuquerque, NM

2012-05-01

In message passing implementations, associative matching structures are used to permit list entries to be searched in parallel fashion, thereby avoiding the delay of linear list traversal. List management capabilities are provided to support list entry turnover semantics and priority ordering semantics.
Stampi: a message passing library for distributed parallel computing. User's guide

International Nuclear Information System (INIS)

Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi

1998-11-01

A new message passing library, Stampi, has been developed to realize a computation with different kind of parallel computers arbitrarily and making MPI (Message Passing Interface) as an unique interface for communication. Stampi is based on MPI2 specification. It realizes dynamic process creation to different machines and communication between spawned one within the scope of MPI semantics. Vender implemented MPI as a closed system in one parallel machine and did not support both functions; process creation and communication to external machines. Stampi supports both functions and enables us distributed parallel computing. Currently Stampi has been implemented on COMPACS (COMplex PArallel Computer System) introduced in CCSE, five parallel computers and one graphic workstation, and any communication on them can be processed on. (author)
Future-based Static Analysis of Message Passing Programs

Directory of Open Access Journals (Sweden)

Wytse Oortwijn

2016-06-01

Full Text Available Message passing is widely used in industry to develop programs consisting of several distributed communicating components. Developing functionally correct message passing software is very challenging due to the concurrent nature of message exchanges. Nonetheless, many safety-critical applications rely on the message passing paradigm, including air traffic control systems and emergency services, which makes proving their correctness crucial. We focus on the modular verification of MPI programs by statically verifying concrete Java code. We use separation logic to reason about local correctness and define abstractions of the communication protocol in the process algebra used by mCRL2. We call these abstractions futures as they predict how components will interact during program execution. We establish a provable link between futures and program code and analyse the abstract futures via model checking to prove global correctness. Finally, we verify a leader election protocol to demonstrate our approach.
Parallelization of a hydrological model using the message passing interface

Science.gov (United States)

Wu, Yiping; Li, Tiejian; Sun, Liqun; Chen, Ji

2013-01-01

With the increasing knowledge about the natural processes, hydrological models such as the Soil and Water Assessment Tool (SWAT) are becoming larger and more complex with increasing computation time. Additionally, other procedures such as model calibration, which may require thousands of model iterations, can increase running time and thus further reduce rapid modeling and analysis. Using the widely-applied SWAT as an example, this study demonstrates how to parallelize a serial hydrological model in a Windows® environment using a parallel programing technology—Message Passing Interface (MPI). With a case study, we derived the optimal values for the two parameters (the number of processes and the corresponding percentage of work to be distributed to the master process) of the parallel SWAT (P-SWAT) on an ordinary personal computer and a work station. Our study indicates that model execution time can be reduced by 42%–70% (or a speedup of 1.74–3.36) using multiple processes (two to five) with a proper task-distribution scheme (between the master and slave processes). Although the computation time cost becomes lower with an increasing number of processes (from two to five), this enhancement becomes less due to the accompanied increase in demand for message passing procedures between the master and all slave processes. Our case study demonstrates that the P-SWAT with a five-process run may reach the maximum speedup, and the performance can be quite stable (fairly independent of a project size). Overall, the P-SWAT can help reduce the computation time substantially for an individual model run, manual and automatic calibration procedures, and optimization of best management practices. In particular, the parallelization method we used and the scheme for deriving the optimal parameters in this study can be valuable and easily applied to other hydrological or environmental models.
Stampi: a message passing library for distributed parallel computing. User's guide, second edition

International Nuclear Information System (INIS)

Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi

2000-02-01

A new message passing library, Stampi, has been developed to realize a computation with different kind of parallel computers arbitrarily and making MPI (Message Passing Interface) as an unique interface for communication. Stampi is based on the MPI2 specification, and it realizes dynamic process creation to different machines and communication between spawned one within the scope of MPI semantics. Main features of Stampi are summarized as follows: (i) an automatic switch function between external- and internal communications, (ii) a message routing/relaying with a routing module, (iii) a dynamic process creation, (iv) a support of two types of connection, Master/Slave and Client/Server, (v) a support of a communication with Java applets. Indeed vendors implemented MPI libraries as a closed system in one parallel machine or their systems, and did not support both functions; process creation and communication to external machines. Stampi supports both functions and enables us distributed parallel computing. Currently Stampi has been implemented on COMPACS (COMplex PArallel Computer System) introduced in CCSE, five parallel computers and one graphic workstation, moreover on eight kinds of parallel machines, totally fourteen systems. Stampi provides us MPI communication functionality on them. This report describes mainly the usage of Stampi. (author)
Message Passing Framework for Globally Interconnected Clusters

International Nuclear Information System (INIS)

Hafeez, M; Riaz, N; Asghar, S; Malik, U A; Rehman, A

2011-01-01

In prevailing technology trends it is apparent that the network requirements and technologies will advance in future. Therefore the need of High Performance Computing (HPC) based implementation for interconnecting clusters is comprehensible for scalability of clusters. Grid computing provides global infrastructure of interconnecting clusters consisting of dispersed computing resources over Internet. On the other hand the leading model for HPC programming is Message Passing Interface (MPI). As compared to Grid computing, MPI is better suited for solving most of the complex computational problems. MPI itself is restricted to a single cluster. It does not support message passing over the internet to use the computing resources of different clusters in an optimal way. We propose a model that provides message passing capabilities between parallel applications over the internet. The proposed model is based on Architecture for Java Universal Message Passing (A-JUMP) framework and Enterprise Service Bus (ESB) named as High Performance Computing Bus. The HPC Bus is built using ActiveMQ. HPC Bus is responsible for communication and message passing in an asynchronous manner. Asynchronous mode of communication offers an assurance for message delivery as well as a fault tolerance mechanism for message passing. The idea presented in this paper effectively utilizes wide-area intercluster networks. It also provides scheduling, dynamic resource discovery and allocation, and sub-clustering of resources for different jobs. Performance analysis and comparison study of the proposed framework with P2P-MPI are also presented in this paper.
Parallelization of MCNP Monte Carlo neutron and photon transport code in parallel virtual machine and message passing interface

International Nuclear Information System (INIS)

Deng Li; Xie Zhongsheng

1999-01-01

The coupled neutron and photon transport Monte Carlo code MCNP (version 3B) has been parallelized in parallel virtual machine (PVM) and message passing interface (MPI) by modifying a previous serial code. The new code has been verified by solving sample problems. The speedup increases linearly with the number of processors and the average efficiency is up to 99% for 12-processor. (author)
The specification of Stampi, a message passing library for distributed parallel computing

International Nuclear Information System (INIS)

Imamura, Toshiyuki; Takemiya, Hiroshi; Koide, Hiroshi

2000-03-01

At CCSE, Center for Promotion of Computational Science and Engineering, a new message passing library for heterogeneous and distributed parallel computing has been developed, and it is called as Stampi. Stampi enables us to communicate between any combination of parallel computers as well as workstations. Currently, a Stampi system is constructed from Stampi library and Stampi/Java. It provides functions to connect a Stampi application with not only those on COMPACS, COMplex Parallel Computer System, but also applets which work on WWW browsers. This report summarizes the specifications of Stampi and details the development of its system. (author)
A real-time MPEG software decoder using a portable message-passing library

Energy Technology Data Exchange (ETDEWEB)

Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

1995-12-31

We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easil ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment under which it uses. Several technical issues are discussed, including balancing of decoding speed, memory limitation, 1/0 capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible in a general-purpose parallel machine.
Message-Passing Receivers for Single Carrier Systems with Frequency-Domain Equalization

DEFF Research Database (Denmark)

Zhang, Chuanzong; Manchón, Carles Navarro; Wang, Zhongyong

2015-01-01

In this letter, we design iterative receiver algorithms for joint frequency-domain equalization and decoding in a single carrier system assuming perfect channel state information. Based on an approximate inference framework that combines belief propagation (BP) and the mean field (MF) approximation......, we propose two receiver algorithms with, respectively, parallel and sequential message-passing schedules in the MF part. A recently proposed receiver based on generalized approximate message passing (GAMP) is used as a benchmarking reference. The simulation results show that the BP-MF receiver...
The design of multi-core DSP parallel model based on message passing and multi-level pipeline

Science.gov (United States)

Niu, Jingyu; Hu, Jian; He, Wenjing; Meng, Fanrong; Li, Chuanrong

2017-10-01

Currently, the design of embedded signal processing system is often based on a specific application, but this idea is not conducive to the rapid development of signal processing technology. In this paper, a parallel processing model architecture based on multi-core DSP platform is designed, and it is mainly suitable for the complex algorithms which are composed of different modules. This model combines the ideas of multi-level pipeline parallelism and message passing, and summarizes the advantages of the mainstream model of multi-core DSP (the Master-Slave model and the Data Flow model), so that it has better performance. This paper uses three-dimensional image generation algorithm to validate the efficiency of the proposed model by comparing with the effectiveness of the Master-Slave and the Data Flow model.
Message passing vs. shared address space on a cluster of SMPs

International Nuclear Information System (INIS)

Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak

2001-01-01

The emergence of scalable computer architectures using clusters of PCs or PC-SMPs with commodity networking has made them attractive platforms for high-end scientific computing. Currently, message passing (MP) and shared address space (SAS) are the two leading programming paradigms for these systems. MP has been standardized with MPI, and is the most common and mature parallel programming approach. However, MP code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, they compare the performance of and programming effort required for six applications under both programming models on a 32-CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of the applications; however, on certain classes of problems, SAS performance is competitive with MPI
Message passing for quantified Boolean formulas

International Nuclear Information System (INIS)

Zhang, Pan; Ramezanpour, Abolfazl; Zecchina, Riccardo; Zdeborová, Lenka

2012-01-01

We introduce two types of message passing algorithms for quantified Boolean formulas (QBF). The first type is a message passing based heuristics that can prove unsatisfiability of the QBF by assigning the universal variables in such a way that the remaining formula is unsatisfiable. In the second type, we use message passing to guide branching heuristics of a Davis–Putnam–Logemann–Loveland (DPLL) complete solver. Numerical experiments show that on random QBFs our branching heuristics give robust exponential efficiency gain with respect to state-of-the-art solvers. We also manage to solve some previously unsolved benchmarks from the QBFLIB library. Apart from this, our study sheds light on using message passing in small systems and as subroutines in complete solvers
Testing New Programming Paradigms with NAS Parallel Benchmarks

Science.gov (United States)

Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

2000-01-01

Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage
High performance message passing for the ATLAS DAQ/EF-1 project

CERN Document Server

Mornacchi, Giuseppe

1999-01-01

Summary form only. A message passing library has been developed in the context of the ATLAS DAQ/EF-1 project. It is used for time critical applications within the front-end part of the DAQ system, mainly to exchange data control messages between I/O processors. Key objectives of the design were low message overheads, efficient use of the data transfer buses, provision of broadcast functionality and a hardware and operating system independent implementation of the application interface. The design and implementation of the message passing library are presented. As required by the project, the implementation is based on commercial components, namely VMEbus, PCI, the Lynx-OS real-time operating system and an additional inter- processor link, PVIC. The latter offers broadcast functionality identified as being important to the overall performance of the message passing. In addition, performance benchmarks for all implementing buses are presented for both simple test programs and the full DAQ applications. (0 refs)...
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

Science.gov (United States)

Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

2018-02-01

We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution, however, the efficiency in terms of the computing resource usage decreases with increasing the number of processors used in the parallel computing.
Statistics of Epidemics in Networks by Passing Messages

Science.gov (United States)

Shrestha, Munik Kumar

Epidemic processes are common out-of-equilibrium phenomena of broad interdisciplinary interest. In this thesis, we show how message-passing approach can be a helpful tool for simulating epidemic models in disordered medium like networks, and in particular for estimating the probability that a given node will become infectious at a particular time. The sort of dynamics we consider are stochastic, where randomness can arise from the stochastic events or from the randomness of network structures. As in belief propagation, variables or messages in message-passing approach are defined on the directed edges of a network. However, unlike belief propagation, where the posterior distributions are updated according to Bayes' rule, in message-passing approach we write differential equations for the messages over time. It takes correlations between neighboring nodes into account while preventing causal signals from backtracking to their immediate source, and thus avoids "echo chamber effects" where a pair of adjacent nodes each amplify the probability that the other is infectious. In our first results, we develop a message-passing approach to threshold models of behavior popular in sociology. These are models, first proposed by Granovetter, where individuals have to hear about a trend or behavior from some number of neighbors before adopting it themselves. In thermodynamic limit of large random networks, we provide an exact analytic scheme while calculating the time dependence of the probabilities and thus learning about the whole dynamics of bootstrap percolation, which is a simple model known in statistical physics for exhibiting discontinuous phase transition. As an application, we apply a similar model to financial networks, studying when bankruptcies spread due to the sudden devaluation of shared assets in overlapping portfolios. We predict that although diversification may be good for individual institutions, it can create dangerous systemic effects, and as a result
Distributed parallel messaging for multiprocessor systems

Science.gov (United States)

Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

2013-06-04

A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.
Portable parallel programming in a Fortran environment

International Nuclear Information System (INIS)

May, E.N.

1989-01-01

Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a ''nearly realistic'' lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs

Message-passing-interface-based parallel FDTD investigation on the EM scattering from a 1-D rough sea surface using uniaxial perfectly matched layer absorbing boundary.

Science.gov (United States)

Li, J; Guo, L-X; Zeng, H; Han, X-B

2009-06-01

A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of sea surface for different polarization and the bistatic scattering from a sea surface with large incident angle and large wind speed.
Blind sensor calibration using approximate message passing

International Nuclear Information System (INIS)

Schülke, Christophe; Caltagirone, Francesco; Zdeborová, Lenka

2015-01-01

The ubiquity of approximately sparse data has led a variety of communities to take great interest in compressed sensing algorithms. Although these are very successful and well understood for linear measurements with additive noise, applying them to real data can be problematic if imperfect sensing devices introduce deviations from this ideal signal acquisition process, caused by sensor decalibration or failure. We propose a message passing algorithm called calibration approximate message passing (Cal-AMP) that can treat a variety of such sensor-induced imperfections. In addition to deriving the general form of the algorithm, we numerically investigate two particular settings. In the first, a fraction of the sensors is faulty, giving readings unrelated to the signal. In the second, sensors are decalibrated and each one introduces a different multiplicative gain to the measurements. Cal-AMP shares the scalability of approximate message passing, allowing us to treat large sized instances of these problems, and experimentally exhibits a phase transition between domains of success and failure. (paper)
Application Portable Parallel Library

Science.gov (United States)

Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

1995-01-01

Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
The serial message-passing schedule for LDPC decoding algorithms

Science.gov (United States)

Liu, Mingshan; Liu, Shanshan; Zhou, Yuan; Jiang, Xue

2015-12-01

The conventional message-passing schedule for LDPC decoding algorithms is the so-called flooding schedule. It has the disadvantage that the updated messages cannot be used until next iteration, thus reducing the convergence speed . In this case, the Layered Decoding algorithm (LBP) based on serial message-passing schedule is proposed. In this paper the decoding principle of LBP algorithm is briefly introduced, and then proposed its two improved algorithms, the grouped serial decoding algorithm (Grouped LBP) and the semi-serial decoding algorithm .They can improve LBP algorithm's decoding speed while maintaining a good decoding performance.
Introduction to parallel programming

CERN Document Server

Brawer, Steven

1989-01-01

Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race
Pthreads vs MPI Parallel Performance of Angular-Domain Decomposed S

International Nuclear Information System (INIS)

Azmy, Y.Y.; Barnett, D.A.

2000-01-01

Two programming models for parallelizing the Angular Domain Decomposition (ADD) of the discrete ordinates (S n ) approximation of the neutron transport equation are examined. These are the shared memory model based on the POSIX threads (Pthreads) standard, and the message passing model based on the Message Passing Interface (MPI) standard. These standard libraries are available on most multiprocessor platforms thus making the resulting parallel codes widely portable. The question is: on a fixed platform, and for a particular code solving a given test problem, which of the two programming models delivers better parallel performance? Such comparison is possible on Symmetric Multi-Processors (SMP) architectures in which several CPUs physically share a common memory, and in addition are capable of emulating message passing functionality. Implementation of the two-dimensional,(S n ), Arbitrarily High Order Transport (AHOT) code for solving neutron transport problems using these two parallelization models is described. Measured parallel performance of each model on the COMPAQ AlphaServer 8400 and the SGI Origin 2000 platforms is described, and comparison of the observed speedup for the two programming models is reported. For the case presented in this paper it appears that the MPI implementation scales better than the Pthreads implementation on both platforms
A Model for Speedup of Parallel Programs

Science.gov (United States)

1997-01-01

Sanjeev. K Setia . The interaction between mem- ory allocation and adaptive partitioning in message- passing multicomputers. In IPPS 󈨣 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89{99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A compar- ative analysis of static
Design strategies for irregularly adapting parallel applications

International Nuclear Information System (INIS)

Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Sing, Jaswinder Pal

2000-01-01

Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability
Algorithms for parallel flow solvers on message passing architectures

Science.gov (United States)

Vanderwijngaart, Rob F.

1995-01-01

The purpose of this project has been to identify and test suitable technologies for implementation of fluid flow solvers -- possibly coupled with structures and heat equation solvers -- on MIMD parallel computers. In the course of this investigation much attention has been paid to efficient domain decomposition strategies for ADI-type algorithms. Multi-partitioning derives its efficiency from the assignment of several blocks of grid points to each processor in the parallel computer. A coarse-grain parallelism is obtained, and a near-perfect load balance results. In uni-partitioning every processor receives responsibility for exactly one block of grid points instead of several. This necessitates fine-grain pipelined program execution in order to obtain a reasonable load balance. Although fine-grain parallelism is less desirable on many systems, especially high-latency networks of workstations, uni-partition methods are still in wide use in production codes for flow problems. Consequently, it remains important to achieve good efficiency with this technique that has essentially been superseded by multi-partitioning for parallel ADI-type algorithms. Another reason for the concentration on improving the performance of pipeline methods is their applicability in other types of flow solver kernels with stronger implied data dependence. Analytical expressions can be derived for the size of the dynamic load imbalance incurred in traditional pipelines. From these it can be determined what is the optimal first-processor retardation that leads to the shortest total completion time for the pipeline process. Theoretical predictions of pipeline performance with and without optimization match experimental observations on the iPSC/860 very well. Analysis of pipeline performance also highlights the effect of uncareful grid partitioning in flow solvers that employ pipeline algorithms. If grid blocks at boundaries are not at least as large in the wall-normal direction as those
Broadcasting a message in a parallel computer

Science.gov (United States)

Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN

2011-08-02

Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
MPI_XSTAR: MPI-based parallelization of XSTAR program

Science.gov (United States)

Danehkar, A.

2017-12-01

MPI_XSTAR parallelizes execution of multiple XSTAR runs using Message Passing Interface (MPI). XSTAR (ascl:9910.008), part of the HEASARC's HEAsoft (ascl:1408.004) package, calculates the physical conditions and emission spectra of ionized gases. MPI_XSTAR invokes XSTINITABLE from HEASoft to generate a job list of XSTAR commands for given physical parameters. The job list is used to make directories in ascending order, where each individual XSTAR is spawned on each processor and outputs are saved. HEASoft's XSTAR2TABLE program is invoked upon the contents of each directory in order to produce table model FITS files for spectroscopy analysis tools.
EXIT Chart Analysis of Binary Message-Passing Decoders

DEFF Research Database (Denmark)

Lechner, Gottfried; Pedersen, Troels; Kramer, Gerhard

2007-01-01

Binary message-passing decoders for LDPC codes are analyzed using EXIT charts. For the analysis, the variable node decoder performs all computations in the L-value domain. For the special case of a hard decision channel, this leads to the well know Gallager B algorithm, while the analysis can...... be extended to channels with larger output alphabets. By increasing the output alphabet from hard decisions to four symbols, a gain of more than 1.0 dB is achieved using optimized codes. For this code optimization, the mixing property of EXIT functions has to be modified to the case of binary message......-passing decoders....
Abstract Level Parallelization of Finite Difference Methods

Directory of Open Access Journals (Sweden)

Edwin Vollebregt

1997-01-01

Full Text Available A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”. The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automaticcode generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing. This is important because message passing, unfortunately, still is the only approach that leads to acceptable performance for many more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.
Theoretic derivation of directed acyclic subgraph algorithm and comparisons with message passing algorithm

Science.gov (United States)

Ha, Jeongmok; Jeong, Hong

2016-07-01

This study investigates the directed acyclic subgraph (DAS) algorithm, which is used to solve discrete labeling problems much more rapidly than other Markov-random-field-based inference methods but at a competitive accuracy. However, the mechanism by which the DAS algorithm simultaneously achieves competitive accuracy and fast execution speed, has not been elucidated by a theoretical derivation. We analyze the DAS algorithm by comparing it with a message passing algorithm. Graphical models, inference methods, and energy-minimization frameworks are compared between DAS and message passing algorithms. Moreover, the performances of DAS and other message passing methods [sum-product belief propagation (BP), max-product BP, and tree-reweighted message passing] are experimentally compared.
CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation

International Nuclear Information System (INIS)

Dunigan, T.H.

1988-01-01

1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated
Data communications in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Davis, Kristan D; Faraj, Daniel A

2013-07-09

Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
Parallel community climate model: Description and user`s guide

Energy Technology Data Exchange (ETDEWEB)

Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others

1996-07-15

This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.
P3T+: A Performance Estimator for Distributed and Parallel Programs

Directory of Open Access Journals (Sweden)

T. Fahringer

2000-01-01

Full Text Available Developing distributed and parallel programs on today's multiprocessor architectures is still a challenging task. Particular distressing is the lack of effective performance tools that support the programmer in evaluating changes in code, problem and machine sizes, and target architectures. In this paper we introduce P3T+ which is a performance estimator for mostly regular HPF (High Performance Fortran programs but partially covers also message passing programs (MPI. P3T+ is unique by modeling programs, compiler code transformations, and parallel and distributed architectures. It computes at compile-time a variety of performance parameters including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. Several novel technologies are employed to compute these parameters: loop iteration spaces, array access patterns, and data distributions are modeled by employing highly effective symbolic analysis. Communication is estimated by simulating the behavior of a communication library used by the underlying compiler. Computation times are predicted through pre-measured kernels on every target architecture of interest. We carefully model most critical architecture specific factors such as cache lines sizes, number of cache lines available, startup times, message transfer time per byte, etc. P3T+ has been implemented and is closely integrated with the Vienna High Performance Compiler (VFC to support programmers develop parallel and distributed applications. Experimental results for realistic kernel codes taken from real-world applications are presented to demonstrate both accuracy and usefulness of P3T+.
An efficient communication scheme for solving Sn equations on message-passing multiprocessors

International Nuclear Information System (INIS)

Azmy, Y.Y.

1993-01-01

Early models of Intel's hypercube multiprocessors, e.g., the iPSC/1 and iPSC/2, were characterized by the high latency of message passing. This relatively weak dependence of the communication penalty on the size of messages, in contrast to its strong dependence on the number of messages, justified using the Fan-in Fan-out algorithm (which implements a minimum spanning tree path) to perform global operations, such as global sums, etc. Recent models of message-passing computers, such as the iPSC/860 and the Paragon, have been found to possess much smaller latency, thus forcing a reexamination of the issue of performance optimization with respect to communication schemes. Essentially, the Fan-in Fan-out scheme minimizes the number of nonsimultaneous messages sent but not the volume of data traffic across the network. Furthermore, if a global operation is performed in conjunction with the message passing, a large fraction of the attached nodes remains idle as the number of utilized processors is halved in each step of the process. On the other hand, the Recursive Halving scheme offers the smallest communication cost for global operations but has some drawbacks
Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-12-31

This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

Science.gov (United States)

Bellerby, Tim

2015-04-01

PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks number of processors
Analysis and Design of Binary Message-Passing Decoders

DEFF Research Database (Denmark)

Lechner, Gottfried; Pedersen, Troels; Kramer, Gerhard

2012-01-01

Binary message-passing decoders for low-density parity-check (LDPC) codes are studied by using extrinsic information transfer (EXIT) charts. The channel delivers hard or soft decisions and the variable node decoder performs all computations in the L-value domain. A hard decision channel results...... message-passing decoders. Finally, it is shown that errors on cycles consisting only of degree two and three variable nodes cannot be corrected and a necessary and sufficient condition for the existence of a cycle-free subgraph is derived....... in the well-know Gallager B algorithm, and increasing the output alphabet from hard decisions to two bits yields a gain of more than 1.0 dB in the required signal to noise ratio when using optimized codes. The code optimization requires adapting the mixing property of EXIT functions to the case of binary...
Neighbourhood-consensus message passing and its potentials in image processing applications

Science.gov (United States)

Ružic, Tijana; Pižurica, Aleksandra; Philips, Wilfried

2011-03-01

In this paper, a novel algorithm for inference in Markov Random Fields (MRFs) is presented. Its goal is to find approximate maximum a posteriori estimates in a simple manner by combining neighbourhood influence of iterated conditional modes (ICM) and message passing of loopy belief propagation (LBP). We call the proposed method neighbourhood-consensus message passing because a single joint message is sent from the specified neighbourhood to the central node. The message, as a function of beliefs, represents the agreement of all nodes within the neighbourhood regarding the labels of the central node. This way we are able to overcome the disadvantages of reference algorithms, ICM and LBP. On one hand, more information is propagated in comparison with ICM, while on the other hand, the huge amount of pairwise interactions is avoided in comparison with LBP by working with neighbourhoods. The idea is related to the previously developed iterated conditional expectations algorithm. Here we revisit it and redefine it in a message passing framework in a more general form. The results on three different benchmarks demonstrate that the proposed technique can perform well both for binary and multi-label MRFs without any limitations on the model definition. Furthermore, it manifests improved performance over related techniques either in terms of quality and/or speed.
Track-stitching using graphical models and message passing

CSIR Research Space (South Africa)

Van der Merwe, LJ

2013-07-01

Full Text Available In order to stitch tracks together, two tasks are required, namely tracking and track stitching. In this study track stitching is performed using a graphical model and message passing (belief propagation) approach. Tracks are modelled as nodes in a...
McMPI – a managed-code message passing interface library for high performance communication in C#

OpenAIRE

Holmes, Daniel John

2012-01-01

This work endeavours to achieve technology transfer between established best-practice in academic high-performance computing and current techniques in commercial high-productivity computing. It shows that a credible high-performance message-passing communication library, with semantics and syntax following the Message-Passing Interface (MPI) Standard, can be built in pure C# (one of the .Net suite of computer languages). Message-passing has been the dominant paradigm in high-pe...
Parallel implementation of the PHOENIX generalized stellar atmosphere program. II. Wavelength parallelization

International Nuclear Information System (INIS)

Baron, E.; Hauschildt, Peter H.

1998-01-01

We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. copyright 1998 The American Astronomical Society
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

Science.gov (United States)

Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.

2003-01-01

Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
Design of a Message Passing Model for Use in a Heterogeneous CPU-NFP Framework for Network Analytics

CSIR Research Space (South Africa)

Pennefather, S

2017-09-01

Full Text Available of applications written in the Go programming language to be executed on a Network Flow Processor (NFP) for enhanced performance. This paper explores the need and feasibility of implementing a message passing model for data transmission between the NFP and CPU...
Pattern-Driven Automatic Parallelization

Directory of Open Access Journals (Sweden)

Christoph W. Kessler

1996-01-01

Full Text Available This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.
Development of whole core thermal-hydraulic analysis program ACT. 4. Simplified fuel assembly model and parallelization by MPI

International Nuclear Information System (INIS)

Ohshima, Hiroyuki

2001-10-01

A whole core thermal-hydraulic analysis program ACT is being developed for the purpose of evaluating detailed in-core thermal hydraulic phenomena of fast reactors including the effect of the flow between wrapper-tube walls (inter-wrapper flow) under various reactor operation conditions. As appropriate boundary conditions in addition to a detailed modeling of the core are essential for accurate simulations of in-core thermal hydraulics, ACT consists of not only fuel assembly and inter-wrapper flow analysis modules but also a heat transport system analysis module that gives response of the plant dynamics to the core model. This report describes incorporation of a simplified model to the fuel assembly analysis module and program parallelization by a message passing method toward large-scale simulations. ACT has a fuel assembly analysis module which can simulate a whole fuel pin bundle in each fuel assembly of the core and, however, it may take much CPU time for a large-scale core simulation. Therefore, a simplified fuel assembly model that is thermal-hydraulically equivalent to the detailed one has been incorporated in order to save the simulation time and resources. This simplified model is applied to several parts of fuel assemblies in a core where the detailed simulation results are not required. With regard to the program parallelization, the calculation load and the data flow of ACT were analyzed and the optimum parallelization has been done including the improvement of the numerical simulation algorithm of ACT. Message Passing Interface (MPI) is applied to data communication between processes and synchronization in parallel calculations. Parallelized ACT was verified through a comparison simulation with the original one. In addition to the above works, input manuals of the core analysis module and the heat transport system analysis module have been prepared. (author)
Coding for Parallel Links to Maximize the Expected Value of Decodable Messages

Science.gov (United States)

Klimesh, Matthew A.; Chang, Christopher S.

2011-01-01

When multiple parallel communication links are available, it is useful to consider link-utilization strategies that provide tradeoffs between reliability and throughput. Interesting cases arise when there are three or more available links. Under the model considered, the links have known probabilities of being in working order, and each link has a known capacity. The sender has a number of messages to send to the receiver. Each message has a size and a value (i.e., a worth or priority). Messages may be divided into pieces arbitrarily, and the value of each piece is proportional to its size. The goal is to choose combinations of messages to send on the links so that the expected value of the messages decodable by the receiver is maximized. There are three parts to the innovation: (1) Applying coding to parallel links under the model; (2) Linear programming formulation for finding the optimal combinations of messages to send on the links; and (3) Algorithms for assisting in finding feasible combinations of messages, as support for the linear programming formulation. There are similarities between this innovation and methods developed in the field of network coding. However, network coding has generally been concerned with either maximizing throughput in a fixed network, or robust communication of a fixed volume of data. In contrast, under this model, the throughput is expected to vary depending on the state of the network. Examples of error-correcting codes that are useful under this model but which are not needed under previous models have been found. This model can represent either a one-shot communication attempt, or a stream of communications. Under the one-shot model, message sizes and link capacities are quantities of information (e.g., measured in bits), while under the communications stream model, message sizes and link capacities are information rates (e.g., measured in bits/second). This work has the potential to increase the value of data returned from
Communication strategies for angular domain decomposition of transport calculations on message passing multiprocessors

International Nuclear Information System (INIS)

Azmy, Y.Y.

1997-01-01

The effect of three communication schemes for solving Arbitrarily High Order Transport (AHOT) methods of the Nodal type on parallel performance is examined via direct measurements and performance models. The target architecture in this study is Oak Ridge National Laboratory's 128 node Paragon XP/S 5 computer and the parallelization is based on the Parallel Virtual Machine (PVM) library. However, the conclusions reached can be easily generalized to a large class of message passing platforms and communication software. The three schemes considered here are: (1) PVM's global operations (broadcast and reduce) which utilizes the Paragon's native corresponding operations based on a spanning tree routing; (2) the Bucket algorithm wherein the angular domain decomposition of the mesh sweep is complemented with a spatial domain decomposition of the accumulation process of the scalar flux from the angular flux and the convergence test; (3) a distributed memory version of the Bucket algorithm that pushes the spatial domain decomposition one step farther by actually distributing the fixed source and flux iterates over the memories of the participating processes. Their conclusion is that the Bucket algorithm is the most efficient of the three if all participating processes have sufficient memories to hold the entire problem arrays. Otherwise, the third scheme becomes necessary at an additional cost to speedup and parallel efficiency that is quantifiable via the parallel performance model
A Message-Passing Hardware/Software Cosimulation Environment for Reconfigurable Computing Systems

Directory of Open Access Journals (Sweden)

Manuel Saldaña

2009-01-01

Full Text Available High-performance reconfigurable computers (HPRCs provide a mix of standard processors and FPGAs to collectively accelerate applications. This introduces new design challenges, such as the need for portable programming models across HPRCs and system-level verification tools. To address the need for cosimulating a complete heterogeneous application using both software and hardware in an HPRC, we have created a tool called the Message-passing Simulation Framework (MSF. We have used it to simulate and develop an interface enabling an MPI-based approach to exchange data between X86 processors and hardware engines inside FPGAs. The MSF can also be used as an application development tool that enables multiple FPGAs in simulation to exchange messages amongst themselves and with X86 processors. As an example, we simulate a LINPACK benchmark hardware core using an Intel-FSB-Xilinx-FPGA platform to quickly prototype the hardware, to test the communications. and to verify the benchmark results.
Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

Directory of Open Access Journals (Sweden)

Vladimiras Dolgopolovas

2015-01-01

Full Text Available The aim of this study is to present an approach to the introduction into pipeline and parallel computing, using a model of the multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. The modern computer science and engineering education requires a comprehensive curriculum, so the introduction to pipeline and parallel computing is the essential topic to be included in the curriculum. At the same time, the topic is among the most motivating tasks due to the comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel model-centered framework and develops the relevant learning objects. It allows implementing an educational platform of constructivist learning process, thus enabling learners’ experimentation with the provided programming models, obtaining learners’ competences of the modern scientific research and computational thinking, and capturing the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The programming language C for developing programming models and message passing interface (MPI and OpenMP parallelization tools have been chosen for implementation.
A new shared-memory programming paradigm for molecular dynamics simulations on the Intel Paragon

International Nuclear Information System (INIS)

D'Azevedo, E.F.; Romine, C.H.

1994-12-01

This report describes the use of shared memory emulation with DOLIB (Distributed Object Library) to simplify parallel programming on the Intel Paragon. A molecular dynamics application is used as an example to illustrate the use of the DOLIB shared memory library. SOTON-PAR, a parallel molecular dynamics code with explicit message-passing using a Lennard-Jones 6-12 potential, is rewritten using DOLIB primitives. The resulting code has no explicit message primitives and resembles a serial code. The new code can perform dynamic load balancing and achieves better performance than the original parallel code with explicit message-passing
A message passing algorithm for the evaluation of social influence

NARCIS (Netherlands)

Vassio, Luca; Fagnani, Fabio; Frasca, Paolo; Ozdaglar, Asuman

2014-01-01

In this paper, we define a new measure of node centrality in social networks, the Harmonic Influence Centrality, which emerges naturally in the study of social influence over networks. Next, we introduce a distributed message passing algorithm to compute the Harmonic Influence Centrality of each
Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

Science.gov (United States)

Quealy, Angela; Cole, Gary L.; Blech, Richard A.

1993-01-01

The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
High-energy physics software parallelization using database techniques

International Nuclear Information System (INIS)

Argante, E.; Van der Stok, P.D.V.; Willers, I.

1997-01-01

A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradigm, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI. (orig.)
Belief propagation decoding of quantum channels by passing quantum messages

International Nuclear Information System (INIS)

Renes, Joseph M

2017-01-01

The belief propagation (BP) algorithm is a powerful tool in a wide range of disciplines from statistical physics to machine learning to computational biology, and is ubiquitous in decoding classical error-correcting codes. The algorithm works by passing messages between nodes of the factor graph associated with the code and enables efficient decoding of the channel, in some cases even up to the Shannon capacity. Here we construct the first BP algorithm which passes quantum messages on the factor graph and is capable of decoding the classical–quantum channel with pure state outputs. This gives explicit decoding circuits whose number of gates is quadratic in the code length. We also show that this decoder can be modified to work with polar codes for the pure state channel and as part of a decoder for transmitting quantum information over the amplitude damping channel. These represent the first explicit capacity-achieving decoders for non-Pauli channels. (fast track communication)
Belief propagation decoding of quantum channels by passing quantum messages

Science.gov (United States)

Renes, Joseph M.

2017-07-01

The belief propagation (BP) algorithm is a powerful tool in a wide range of disciplines from statistical physics to machine learning to computational biology, and is ubiquitous in decoding classical error-correcting codes. The algorithm works by passing messages between nodes of the factor graph associated with the code and enables efficient decoding of the channel, in some cases even up to the Shannon capacity. Here we construct the first BP algorithm which passes quantum messages on the factor graph and is capable of decoding the classical-quantum channel with pure state outputs. This gives explicit decoding circuits whose number of gates is quadratic in the code length. We also show that this decoder can be modified to work with polar codes for the pure state channel and as part of a decoder for transmitting quantum information over the amplitude damping channel. These represent the first explicit capacity-achieving decoders for non-Pauli channels.

Parallel integer sorting with medium and fine-scale parallelism

Science.gov (United States)

Dagum, Leonardo

1993-01-01

Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Parallel imaging for first-pass myocardial perfusion

NARCIS (Netherlands)

Irwan, Roy; Lubbers, Daniel D.; van der Vleuten, Pieter A.; Kappert, Peter; Gotte, Marco J. W.; Sijens, Paul E.

Two parallel imaging methods used for first-pass myocardial perfusion imaging were compared in terms of signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR) and image artifacts. One used adaptive Time-adaptive SENSitivity Encoding (TSENSE) and the other used GeneRalized Autocalibrating
Fault-tolerant Agreement in Synchronous Message-passing Systems

CERN Document Server

Raynal, Michel

2010-01-01

The present book focuses on the way to cope with the uncertainty created by process failures (crash, omission failures and Byzantine behavior) in synchronous message-passing systems (i.e., systems whose progress is governed by the passage of time). To that end, the book considers fundamental problems that distributed synchronous processes have to solve. These fundamental problems concern agreement among processes (if processes are unable to agree in one way or another in presence of failures, no non-trivial problem can be solved). They are consensus, interactive consistency, k-set agreement an
Performance modeling of parallel algorithms for solving neutron diffusion problems

International Nuclear Information System (INIS)

Azmy, Y.Y.; Kirk, B.L.

1995-01-01

Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message passing and shared memory multiprocessors represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated through several test problems, and these models are used to estimate the performance of each of the two considered architectures in situations typical of practical applications, such as fine meshes and a large number of participating processors. While message passing computers are capable of producing speedup, the parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably for massively parallel computers so that only small- to medium-sized message passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared memory architecture predicts very high efficiency over a wide range of number of processors reasonable for this architecture. Furthermore, the model efficiency of the Sequent remains superior to that of the hypercube if its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared memory computers are better suited for this parallel algorithm than message passing computers
Methodologies and Tools for Tuning Parallel Programs: 80% Art, 20% Science, and 10% Luck

Science.gov (United States)

Yan, Jerry C.; Bailey, David (Technical Monitor)

1996-01-01

The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. In the past few years, the ubiquitous introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance instrumentation/monitor/tuning technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g. AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.
On the Parallel Elliptic Single/Multigrid Solutions about Aligned and Nonaligned Bodies Using the Virtual Machine for Multiprocessors

Directory of Open Access Journals (Sweden)

A. Averbuch

1994-01-01

Full Text Available Parallel elliptic single/multigrid solutions around an aligned and nonaligned body are presented and implemented on two multi-user and single-user shared memory multiprocessors (Sequent Symmetry and MOS and on a distributed memory multiprocessor (a Transputer network. Our parallel implementation uses the Virtual Machine for Muli-Processors (VMMP, a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It is seen to fit well the above architectures when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that when using VMMP, the portability of the algorithms is straightforward and efficient.
Data communications in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-10-29

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.
Data communications for a collective operation in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Faraj, Daniel A

2013-07-16

Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and bit masks; receiving in an origin endpoint of the PAMI a collective instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint; constructing a bit mask for the received collective instruction; selecting, from among the associated algorithms and bit masks, a data communications algorithm in dependence upon the constructed bit mask; and executing the collective instruction, transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
Data communications in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-11-12

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call cite statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.
Parallel Object-Oriented Computation Applied to a Finite Element Problem

Directory of Open Access Journals (Sweden)

Jon B. Weissman

1993-01-01

Full Text Available The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application. Mentat, the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.
Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

Science.gov (United States)

Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

2017-01-01

In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2014-08-12

Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Ultrascalable petaflop parallel supercomputer

Science.gov (United States)

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

2010-07-20

A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

Science.gov (United States)

Oliker, Leonid; Heber, Gerd; Biswas, Rupak

2000-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
On the adequacy of message-passing parallel supercomputers for solving neutron transport problems

International Nuclear Information System (INIS)

Azmy, Y.Y.

1990-01-01

A coarse-grained, static-scheduling parallelization of the standard iterative scheme used for solving the discrete-ordinates approximation of the neutron transport equation is described. The parallel algorithm is based on a decomposition of the angular domain along the discrete ordinates, thus naturally producing a set of completely uncoupled systems of equations in each iteration. Implementation of the parallel code on Intcl's iPSC/2 hypercube, and solutions to test problems are presented as evidence of the high speedup and efficiency of the parallel code. The performance of the parallel code on the iPSC/2 is analyzed, and a model for the CPU time as a function of the problem size (order of angular quadrature) and the number of participating processors is developed and validated against measured CPU times. The performance model is used to speculate on the potential of massively parallel computers for significantly speeding up real-life transport calculations at acceptable efficiencies. We conclude that parallel computers with a few hundred processors are capable of producing large speedups at very high efficiencies in very large three-dimensional problems. 10 refs., 8 figs
Study on MPI/OpenMP hybrid parallelism for Monte Carlo neutron transport code

International Nuclear Information System (INIS)

Liang Jingang; Xu Qi; Wang Kan; Liu Shiwen

2013-01-01

Parallel programming with mixed mode of messages-passing and shared-memory has several advantages when used in Monte Carlo neutron transport code, such as fitting hardware of distributed-shared clusters, economizing memory demand of Monte Carlo transport, improving parallel performance, and so on. MPI/OpenMP hybrid parallelism was implemented based on a one dimension Monte Carlo neutron transport code. Some critical factors affecting the parallel performance were analyzed and solutions were proposed for several problems such as contention access, lock contention and false sharing. After optimization the code was tested finally. It is shown that the hybrid parallel code can reach good performance just as pure MPI parallel program, while it saves a lot of memory usage at the same time. Therefore hybrid parallel is efficient for achieving large-scale parallel of Monte Carlo neutron transport. (authors)
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

Science.gov (United States)

Shrimankar, D D; Sathe, S R

2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures.
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

Science.gov (United States)

Shrimankar, D. D.; Sathe, S. R.

2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
S-AMP: Approximate Message Passing for General Matrix Ensembles

DEFF Research Database (Denmark)

Cakmak, Burak; Winther, Ole; Fleury, Bernard H.

2014-01-01

the approximate message-passing (AMP) algorithm to general matrix ensembles with a well-defined large system size limit. The generalization is based on the S-transform (in free probability) of the spectrum of the measurement matrix. Furthermore, we show that the optimality of S-AMP follows directly from its......We propose a novel iterative estimation algorithm for linear observation models called S-AMP. The fixed points of S-AMP are the stationary points of the exact Gibbs free energy under a set of (first- and second-) moment consistency constraints in the large system limit. S-AMP extends...
Parallel object-oriented specification language

NARCIS (Netherlands)

Florescu, O.; Voeten, J.P.M.; Theelen, B.D.; Geilen, M.C.W.; Corporaal, H.; Burns, Alan

2008-01-01

The Parallel Object-Oriented Specification Language (POOSL) is an expressive modelling language for hardware/software systems [10]. It was originally defined in [7] as an object-oriented extension of process algebra CCS [6], supporting (conditional) synchronous message passing between

High Performance Programming Using Explicit Shared Memory Model on Cray T3D1

Science.gov (United States)

Simon, Horst D.; Saini, Subhash; Grassi, Charles

1994-01-01

The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under Cray Research adaptive Fortran (CRAFT) model four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI s PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in explicit shared memory model on Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than obtained by using explicit shared memory model. This degradation in performance is also seen on CM-5 where the performance of applications using native message-passing library CMMD on CM-5 is also about 4 to 5 times less than using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in explicit shared memory model are discussed. Comparative performance of NPB using explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.
High-speed parallel solution of the neutron diffusion equation with the hierarchical domain decomposition boundary element method incorporating parallel communications

International Nuclear Information System (INIS)

Tsuji, Masashi; Chiba, Gou

2000-01-01

A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to accomplish a high parallel efficiency on distributed memory message passing parallel computers. Data exchanges between node processors that are repeated during iteration processes of HDD-BEM are implemented, without any intervention of the host processor that was used to supervise parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, the parallel processing can be executed with only cooperative operations of node processors. The communication overhead was even the dominant time consuming part in the conventional P-HDD-BEM, and the parallelization efficiency decreased steeply with the increase of the number of processors. With the parallel data communication, the efficiency is affected only by the number of boundary elements assigned to decomposed subregions, and the communication overhead can be drastically reduced. This feature can be particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the deterioration problem of parallel efficiency and opens a new path to parallel computations of NDEs on distributed memory message passing parallel computers. (author)
A Two-Pass Exact Algorithm for Selection on Parallel Disk Systems.

Science.gov (United States)

Mi, Tian; Rajasekaran, Sanguthevar

2013-07-01

Numerous OLAP queries process selection operations of "top N", median, "top 5%", in data warehousing applications. Selection is a well-studied problem that has numerous applications in the management of data and databases since, typically, any complex data query can be reduced to a series of basic operations such as sorting and selection. The parallel selection has also become an important fundamental operation, especially after parallel databases were introduced. In this paper, we present a deterministic algorithm Recursive Sampling Selection (RSS) to solve the exact out-of-core selection problem, which we show needs no more than (2 + ε ) passes ( ε being a very small fraction). We have compared our RSS algorithm with two other algorithms in the literature, namely, the Deterministic Sampling Selection and QuickSelect on the Parallel Disks Systems. Our analysis shows that DSS is a (2 + ε )-pass algorithm when the total number of input elements N is a polynomial in the memory size M (i.e., N = M c for some constant c ). While, our proposed algorithm RSS runs in (2 + ε ) passes without any assumptions. Experimental results indicate that both RSS and DSS outperform QuickSelect on the Parallel Disks Systems. Especially, the proposed algorithm RSS is more scalable and robust to handle big data when the input size is far greater than the core memory size, including the case of N ≫ M c .
A message-passing approach to random constraint satisfaction problems with growing domains

International Nuclear Information System (INIS)

Zhao, Chunyan; Zheng, Zhiming; Zhou, Haijun; Xu, Ke

2011-01-01

Message-passing algorithms based on belief propagation (BP) are implemented on a random constraint satisfaction problem (CSP) referred to as model RB, which is a prototype of hard random CSPs with growing domain size. In model RB, the number of candidate discrete values (the domain size) of each variable increases polynomially with the variable number N of the problem formula. Although the satisfiability threshold of model RB is exactly known, finding solutions for a single problem formula is quite challenging and attempts have been limited to cases of N ∼ 10 2 . In this paper, we propose two different kinds of message-passing algorithms guided by BP for this problem. Numerical simulations demonstrate that these algorithms allow us to find a solution for random formulas of model RB with constraint tightness slightly less than p cr , the threshold value for the satisfiability phase transition. To evaluate the performance of these algorithms, we also provide a local search algorithm (random walk) as a comparison. Besides this, the simulated time dependence of the problem size N and the entropy of the variables for growing domain size are discussed
PUMA: An Operating System for Massively Parallel Systems

Directory of Open Access Journals (Sweden)

Stephen R. Wheat

1994-01-01

Full Text Available This article presents an overview of PUMA (Performance-oriented, User-managed Messaging Architecture, a message-passing kernel for massively parallel systems. Message passing in PUMA is based on portals – an opening in the address space of an application process. Once an application process has established a portal, other processes can write values into the portal using a simple send operation. Because messages are written directly into the address space of the receiving process, there is no need to buffer messages in the PUMA kernel and later copy them into the applications address space. PUMA consists of two components: the quintessential kernel (Q-Kernel and the process control thread (PCT. Although the PCT provides management decisions, the Q-Kernel controls access and implements the policies specified by the PCT.
A portable implementation of ARPACK for distributed memory parallel architectures

Energy Technology Data Exchange (ETDEWEB)

Maschhoff, K.J.; Sorensen, D.C.

1996-12-31

ARPACK is a package of Fortran 77 subroutines which implement the Implicitly Restarted Arnoldi Method used for solving large sparse eigenvalue problems. A parallel implementation of ARPACK is presented which is portable across a wide range of distributed memory platforms and requires minimal changes to the serial code. The communication layers used for message passing are the Basic Linear Algebra Communication Subprograms (BLACS) developed for the ScaLAPACK project and Message Passing Interface(MPI).
ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

Directory of Open Access Journals (Sweden)

Kamatani Naoyuki

2011-05-01

Full Text Available Abstract Background Use of missing genotype imputations and haplotype reconstructions are valuable in genome-wide association studies (GWASs. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan and Han Chinese in Beijing, and Chinese in the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.
Parallel Monte Carlo simulation of aerosol dynamics

KAUST Repository

Zhou, K.; He, Z.; Xiao, M.; Zhang, Z.

2014-01-01

is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI
Space Reclamation for Uncoordinated Checkpointing in Message-Passing Systems. Ph.D. Thesis

Science.gov (United States)

Wang, Yi-Min

1993-01-01

Checkpointing and rollback recovery are techniques that can provide efficient recovery from transient process failures. In a message-passing system, the rollback of a message sender may cause the rollback of the corresponding receiver, and the system needs to roll back to a consistent set of checkpoints called recovery line. If the processes are allowed to take uncoordinated checkpoints, the above rollback propagation may result in the domino effect which prevents recovery line progression. Traditionally, only obsolete checkpoints before the global recovery line can be discarded, and the necessary and sufficient condition for identifying all garbage checkpoints has remained an open problem. A necessary and sufficient condition for achieving optimal garbage collection is derived and it is proved that the number of useful checkpoints is bounded by N(N+1)/2, where N is the number of processes. The approach is based on the maximum-sized antichain model of consistent global checkpoints and the technique of recovery line transformation and decomposition. It is also shown that, for systems requiring message logging to record in-transit messages, the same approach can be used to achieve optimal message log reclamation. As a final topic, a unifying framework is described by considering checkpoint coordination and exploiting piecewise determinism as mechanisms for bounding rollback propagation, and the applicability of the optimal garbage collection algorithm to domino-free recovery protocols is demonstrated.
Parallelization of applications for networks with homogeneous and heterogeneous processors

International Nuclear Information System (INIS)

Colombet, L.

1994-01-01

The aim of this thesis is to study and develop efficient methods for parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication tools. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate performances of networks with heterogeneous processors. To evaluate such performances, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validates the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs
Optimisation of a parallel ocean general circulation model

OpenAIRE

M. I. Beare; D. P. Stevens

1997-01-01

International audience; This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by...
Towards deductive verification of MPI programs against session types

Directory of Open Access Journals (Sweden)

Eduardo R. B. Marques

2013-12-01

Full Text Available The Message Passing Interface (MPI is the de facto standard message-passing infrastructure for developing parallel applications. Two decades after the first version of the library specification, MPI-based applications are nowadays routinely deployed on super and cluster computers. These applications, written in C or Fortran, exhibit intricate message passing behaviours, making it hard to statically verify important properties such as the absence of deadlocks. Our work builds on session types, a theory for describing protocols that provides for correct-by-construction guarantees in this regard. We annotate MPI primitives and C code with session type contracts, written in the language of a software verifier for C. Annotated code is then checked for correctness with the software verifier. We present preliminary results and discuss the challenges that lie ahead for verifying realistic MPI program compliance against session types.
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

KAUST Repository

Hasanov, Khalid; Quintin, Jean-Noë l; Lastovetsky, Alexey

2014-01-01

-scale parallelism in mind. Indeed, while in 1990s a system with few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel
HPC parallel programming model for gyrokinetic MHD simulation

International Nuclear Information System (INIS)

Naitou, Hiroshi; Yamada, Yusuke; Tokuda, Shinji; Ishii, Yasutomo; Yagi, Masatoshi

2011-01-01

The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over large number of logical cores, the total simulation domain in cylindrical geometry was broken up into N DD-r × N DD-z (number of radial decomposition times number of axial decomposition) small domains including approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. N RP replicas (copies) of each decomposed domain were used (“particle decomposition”). The hybrid parallelization model of multi-threads and multi-processes was employed: threads were parallelized by the auto-parallelization and N DD-r × N DD-z × N RP processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for the medium size system of N r × N θ × N z = 1025 × 128 × 128 mesh with 4.196 or 8.192 billion particles. The highest speed for the fixed number of logical cores was obtained for two threads, the maximum number of N DD-z , and optimum combination of N DD-r and N RP . The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)
An efficient implementation of parallel molecular dynamics method on SMP cluster architecture

International Nuclear Information System (INIS)

Suzuki, Masaaki; Okuda, Hiroshi; Yagawa, Genki

2003-01-01

The authors have applied MPI/OpenMP hybrid parallel programming model to parallelize a molecular dynamics (MD) method on a symmetric multiprocessor (SMP) cluster architecture. In that architecture, it can be expected that the hybrid parallel programming model, which uses the message passing library such as MPI for inter-SMP node communication and the loop directive such as OpenMP for intra-SNP node parallelization, is the most effective one. In this study, the parallel performance of the hybrid style has been compared with that of conventional flat parallel programming style, which uses only MPI, both in cases the fast multipole method (FMM) is employed for computing long-distance interactions and that is not employed. The computer environments used here are Hitachi SR8000/MPP placed at the University of Tokyo. The results of calculation are as follows. Without FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is: 90% with the hybrid style, 75% with the flat-MPI style for MD simulation with 33,402 atoms. With FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is: 60% with the hybrid style, 48% with the flat-MPI style for MD simulation with 117,649 atoms. (author)
Development of an MPI benchmark program library

Energy Technology Data Exchange (ETDEWEB)

Uehara, Hitoshi

2001-03-01

Distributed parallel simulation software with message passing interfaces has been developed to realize large-scale and high performance numerical simulations. The most popular API for message communication is an MPI. The MPI will be provided on the Earth Simulator. It is known that performance of message communication using the MPI libraries gives a significant influence on a whole performance of simulation programs. We developed an MPI benchmark program library named MBL in order to measure the performance of message communication precisely. The MBL measures the performance of major MPI functions such as point-to-point communications and collective communications and the performance of major communication patterns which are often found in application programs. In this report, the description of the MBL and the performance analysis of the MPI/SX measured on the SX-4 are presented. (author)
Parallel MCNP Monte Carlo transport calculations with MPI

International Nuclear Information System (INIS)

Wagner, J.C.; Haghighat, A.

1996-01-01

The steady increase in computational performance has made Monte Carlo calculations for large/complex systems possible. However, in order to make these calculations practical, order of magnitude increases in performance are necessary. The Monte Carlo method is inherently parallel (particles are simulated independently) and thus has the potential for near-linear speedup with respect to the number of processors. Further, the ever-increasing accessibility of parallel computers, such as workstation clusters, facilitates the practical use of parallel Monte Carlo. Recognizing the nature of the Monte Carlo method and the trends in available computing, the code developers at Los Alamos National Laboratory implemented the message-passing general-purpose Monte Carlo radiation transport code MCNP (version 4A). The PVM package was chosen by the MCNP code developers because it supports a variety of communication networks, several UNIX platforms, and heterogeneous computer systems. This PVM version of MCNP has been shown to produce speedups that approach the number of processors and thus, is a very useful tool for transport analysis. Due to software incompatibilities on the local IBM SP2, PVM has not been available, and thus it is not possible to take advantage of this useful tool. Hence, it became necessary to implement an alternative message-passing library package into MCNP. Because the message-passing interface (MPI) is supported on the local system, takes advantage of the high-speed communication switches in the SP2, and is considered to be the emerging standard, it was selected
Parallel-Processing Test Bed For Simulation Software

Science.gov (United States)

Blech, Richard; Cole, Gary; Townsend, Scott

1996-01-01

Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).
Partition function expansion on region graphs and message-passing equations

International Nuclear Information System (INIS)

Zhou, Haijun; Wang, Chuang; Xiao, Jing-Qing; Bi, Zedong

2011-01-01

Disordered and frustrated graphical systems are ubiquitous in physics, biology, and information science. For models on complete graphs or random graphs, deep understanding has been achieved through the mean-field replica and cavity methods. But finite-dimensional 'real' systems remain very challenging because of the abundance of short loops and strong local correlations. A statistical mechanics theory is constructed in this paper for finite-dimensional models based on the mathematical framework of the partition function expansion and the concept of region graphs. Rigorous expressions for the free energy and grand free energy are derived. Message-passing equations on the region graph, such as belief propagation and survey propagation, are also derived rigorously. (letter)
Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2016-03-15

Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.

Porting Gravitational Wave Signal Extraction to Parallel Virtual Machine (PVM)

Science.gov (United States)

Thirumalainambi, Rajkumar; Thompson, David E.; Redmon, Jeffery

2009-01-01

Laser Interferometer Space Antenna (LISA) is a planned NASA-ESA mission to be launched around 2012. The Gravitational Wave detection is fundamentally the determination of frequency, source parameters, and waveform amplitude derived in a specific order from the interferometric time-series of the rotating LISA spacecrafts. The LISA Science Team has developed a Mock LISA Data Challenge intended to promote the testing of complicated nested search algorithms to detect the 100-1 millihertz frequency signals at amplitudes of 10E-21. However, it has become clear that, sequential search of the parameters is very time consuming and ultra-sensitive; hence, a new strategy has been developed. Parallelization of existing sequential search algorithms of Gravitational Wave signal identification consists of decomposing sequential search loops, beginning with outermost loops and working inward. In this process, the main challenge is to detect interdependencies among loops and partitioning the loops so as to preserve concurrency. Existing parallel programs are based upon either shared memory or distributed memory paradigms. In PVM, master and node programs are used to execute parallelization and process spawning. The PVM can handle process management and process addressing schemes using a virtual machine configuration. The task scheduling and the messaging and signaling can be implemented efficiently for the LISA Gravitational Wave search process using a master and 6 nodes. This approach is accomplished using a server that is available at NASA Ames Research Center, and has been dedicated to the LISA Data Challenge Competition. Historically, gravitational wave and source identification parameters have taken around 7 days in this dedicated single thread Linux based server. Using PVM approach, the parameter extraction problem can be reduced to within a day. The low frequency computation and a proxy signal-to-noise ratio are calculated in separate nodes that are controlled by the master
Distributed primal–dual interior-point methods for solving tree-structured coupled convex problems using message-passing

DEFF Research Database (Denmark)

Khoshfetrat Pakazad, Sina; Hansson, Anders; Andersen, Martin S.

2017-01-01

In this paper, we propose a distributed algorithm for solving coupled problems with chordal sparsity or an inherent tree structure which relies on primal–dual interior-point methods. We achieve this by distributing the computations at each iteration, using message-passing. In comparison to existi...
Building Blocks for the Rapid Development of Parallel Simulations, Phase I

Data.gov (United States)

National Aeronautics and Space Administration — Scientists need to be able to quickly develop and run parallel simulations without paying the high price of writing low-level message passing codes using compiled...
Static and dynamic load-balancing strategies for parallel reservoir simulation

International Nuclear Information System (INIS)

Anguille, L.; Killough, J.E.; Li, T.M.C.; Toepfer, J.L.

1995-01-01

Accurate simulation of the complex phenomena that occur in flow in porous media can tax even the most powerful serial computers. Emergence of new parallel computer architectures as a future efficient tool in reservoir simulation may overcome this difficulty. Unfortunately, major problems remain to be solved before using parallel computers commercially: production serial programs must be rewritten to be efficient in parallel environments and load balancing methods must be explored to evenly distribute the workload on each processor during the simulation. This study implements both a static load-balancing algorithm and a receiver-initiated dynamic load-sharing algorithm to achieve high parallel efficiencies on both the IBM SP2 and Intel IPSC/860 parallel computers. Significant speedup improvement was recorded for both methods. Further optimization of these algorithms yielded a technique with efficiencies as high as 90% and 70% on 8 and 32 nodes, respectively. The increased performance was the result of the minimization of message-passing overhead
Fencing data transfers in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Blocksome, Michael A.; Mamidala, Amith R.

2015-06-02

Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.
Programs Lucky and LuckyC - 3D parallel transport codes for the multi-group transport equation solution for XYZ geometry by Pm Sn method

International Nuclear Information System (INIS)

Moriakov, A.; Vasyukhno, V.; Netecha, M.; Khacheresov, G.

2003-01-01

Powerful supercomputers are available today. MBC-1000M is one of Russian supercomputers that may be used by distant way access. Programs LUCKY and LUCKY C were created to work for multi-processors systems. These programs have algorithms created especially for these computers and used MPI (message passing interface) service for exchanges between processors. LUCKY may resolved shielding tasks by multigroup discreet ordinate method. LUCKY C may resolve critical tasks by same method. Only XYZ orthogonal geometry is available. Under little space steps to approximate discreet operator this geometry may be used as universal one to describe complex geometrical structures. Cross section libraries are used up to P8 approximation by Legendre polynomials for nuclear data in GIT format. Programming language is Fortran-90. 'Vector' processors may be used that lets get a time profit up to 30 times. But unfortunately MBC-1000M has not these processors. Nevertheless sufficient value for efficiency of parallel calculations was obtained under 'space' (LUCKY) and 'space and energy' (LUCKY C ) paralleling. AUTOCAD program is used to control geometry after a treatment of input data. Programs have powerful geometry module, it is a beautiful tool to achieve any geometry. Output results may be processed by graphic programs on personal computer. (authors)
Plasma Physics Calculations on a Parallel Macintosh Cluster

Science.gov (United States)

Decyk, Viktor; Dauger, Dean; Kokelaar, Pieter

2000-03-01

We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.
Programming Models in HPC

Energy Technology Data Exchange (ETDEWEB)

Shipman, Galen M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

2016-06-13

These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematic approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.
Parallel Programming with Intel Parallel Studio XE

CERN Document Server

Blair-Chappell , Stephen

2012-01-01

Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the
Parallel algorithms for numerical linear algebra

CERN Document Server

van der Vorst, H

1990-01-01

This is the first in a new series of books presenting research results and developments concerning the theory and applications of parallel computers, including vector, pipeline, array, fifth/future generation computers, and neural computers.All aspects of high-speed computing fall within the scope of the series, e.g. algorithm design, applications, software engineering, networking, taxonomy, models and architectural trends, performance, peripheral devices.Papers in Volume One cover the main streams of parallel linear algebra: systolic array algorithms, message-passing systems, algorithms for p
Parallelization of simulation code for liquid-gas model of lattice-gas fluid

International Nuclear Information System (INIS)

Kawai, Wataru; Ebihara, Kenichi; Kume, Etsuo; Watanabe, Tadashi

2000-03-01

A simulation code for hydrodynamical phenomena which is based on the liquid-gas model of lattice-gas fluid is parallelized by using MPI (Message Passing Interface) library. The parallelized code can be applied to the larger size of the simulations than the non-parallelized code. The calculation times of the parallelized code on VPP500 (Vector-Parallel super computer with dispersed memory units), AP3000 (Scalar-parallel server with dispersed memory units), and a workstation cluster decreased in inverse proportion to the number of processors. (author)
Microstructure and mechanical properties of AZ91 tubes fabricated by Multi-pass Parallel Tubular Channel Angular Pressing

OpenAIRE

Hooman Abdolvand; Ghader Faraji; Javad Shahbazi Karami

2017-01-01

Parallel Tubular Channel Angular Pressing (PTCAP) process is a novel recently developed severe plastic deformation (SPD) method for producing ultrafine grained (UFG) and nanograined (NG) tubular specimens with excellent mechanical and physical properties. This process has several advantageous compared to its TCAP counterparts. In this paper, a fine grained AZ91 tube was fabricated via multi pass parallel tubular channel angular pressing (PTCAP) process. Tubes were processed up to three passes...
CALTRANS: A parallel, deterministic, 3D neutronics code

Energy Technology Data Exchange (ETDEWEB)

Carson, L.; Ferguson, J.; Rogers, J.

1994-04-01

Our efforts to parallelize the deterministic solution of the neutron transport equation has culminated in a new neutronics code CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementation of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.
Performance of a Sequential and Parallel Computational Fluid Dynamic (CFD) Solver on a Missile Body Configuration

National Research Council Canada - National Science Library

Hisley, Dixie

1999-01-01

.... The goals of this report are: (1) to investigate the performance of message passing and loop level parallelization techniques, as they were implemented in the computational fluid dynamics (CFD...
Design considerations for parallel graphics libraries

Science.gov (United States)

Crockett, Thomas W.

1994-01-01

Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Automatic Migration from PARMACS to MPI in Parallel Fortran Applications

Directory of Open Access Journals (Sweden)

Rolf Hempel

1999-01-01

Full Text Available The PARMACS message passing interface has been in widespread use by application projects, especially in Europe. With the new MPI standard for message passing, many projects face the problem of replacing PARMACS with MPI. An automatic translation tool has been developed which replaces all PARMACS 6.0 calls in an application program with their corresponding MPI calls. In this paper we describe the mapping of the PARMACS programming model onto MPI. We then present some implementation details of the converter tool.
Parallel sparse direct solvers for Poisson's equation in streamer discharges

NARCIS (Netherlands)

M. Nool (Margreet); M. Genseberger (Menno); U. M. Ebert (Ute)

2017-01-01

textabstractThe aim of this paper is to examine whether a hybrid approach of parallel computing, a combination of the message passing model (MPI) with the threads model (OpenMP) can deliver good performance in streamer discharge simulations. Since one of the bottlenecks of almost all streamer
Administering truncated receive functions in a parallel messaging interface

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2014-12-09

Administering truncated receive functions in a parallel messaging interface (`PMI`) of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded.
Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning

Science.gov (United States)

Goldstein, David

1991-01-01

Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.
High-Level Management of Communication Schedules in HPF-like Languages

National Research Council Canada - National Science Library

Benkner, Siegfried

1997-01-01

..., providing the users with a high-level language interface for programming scalable parallel architectures and delegating to the compiler the task of producing an explicitly parallel message-passing program...

Optimisation of a parallel ocean general circulation model

Science.gov (United States)

Beare, M. I.; Stevens, D. P.

1997-10-01

This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
A PARALLEL EXTENSION OF THE UAL ENVIRONMENT

International Nuclear Information System (INIS)

MALITSKY, N.; SHISHLO, A.

2001-01-01

The deployment of the Unified Accelerator Library (UAL) environment on the parallel cluster is presented. The approach is based on the Message-Passing Interface (MPI) library and the Perl adapter that allows one to control and mix together the existing conventional UAL components with the new MPI-based parallel extensions. In the paper, we provide timing results and describe the application of the new environment to the SNS Ring complex beam dynamics studies, particularly, simulations of several physical effects, such as space charge, field errors, fringe fields, and others
Kemari: A Portable High Performance Fortran System for Distributed Memory Parallel Processors

Directory of Open Access Journals (Sweden)

T. Kamachi

1997-01-01

Full Text Available We have developed a compilation system which extends High Performance Fortran (HPF in various aspects. We support the parallelization of well-structured problems with loop distribution and alignment directives similar to HPF's data distribution directives. Such directives give both additional control to the user and simplify the compilation process. For the support of unstructured problems, we provide directives for dynamic data distribution through user-defined mappings. The compiler also allows integration of message-passing interface (MPI primitives. The system is part of a complete programming environment which also comprises a parallel debugger and a performance monitor and analyzer. After an overview of the compiler, we describe the language extensions and related compilation mechanisms in detail. Performance measurements demonstrate the compiler's applicability to a variety of application classes.
Relationship between internal medicine program board examination pass rates, accreditation standards, and program size.

Science.gov (United States)

Falcone, John L; Gonzalo, Jed D

2014-01-19

To determine Internal Medicine residency program compliance with the Accreditation Council for Graduate Medical Education 80% pass-rate standard and the correlation between residency program size and performance on the American Board of Internal Medicine Certifying Examination. Using a cross-sectional study design from 2010-2012 American Board of Internal Medicine Certifying Examination data of all Internal Medicine residency pro-grams, comparisons were made between program pass rates to the Accreditation Council for Graduate Medical Education pass-rate standard. To assess the correlation between program size and performance, a Spearman's rho was calculated. To evaluate program size and its relationship to the pass-rate standard, receiver operative characteristic curves were calculated. Of 372 Internal Medicine residency programs, 276 programs (74%) achieved a pass rate of =80%, surpassing the Accreditation Council for Graduate Medical Education minimum standard. A weak correlation was found between residency program size and pass rate for the three-year period (p=0.19, pInternal Medicine residency programs complied with Accreditation Council for Graduate Medical Education pass-rate standards, a quarter of the programs failed to meet this requirement. Program size is positively but weakly associated with American Board of Internal Medicine Certifying Examination performance, suggesting other unidentified variables significantly contribute to program performance.
Parallelization of applications for networks with homogeneous and heterogeneous processors; Parallelisation d`applications pour des reseaux de processeurs homogenes ou heterogenes

Energy Technology Data Exchange (ETDEWEB)

Colombet, L

1994-10-07

The aim of this thesis is to study and develop efficient methods for parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication tools. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate performances of networks with heterogeneous processors. To evaluate such performances, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validates the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs.
The CRAFT Fortran Programming Model

Directory of Open Access Journals (Sweden)

Douglas M. Pase

1994-01-01

Full Text Available Many programming models for massively parallel machines exist, and each has its advantages and disadvantages. In this article we present a programming model that combines features from other programming models that (1 can be efficiently implemented on present and future Cray Research massively parallel processor (MPP systems and (2 are useful in constructing highly parallel programs. The model supports several styles of programming: message-passing, data parallel, global address (shared data, and work-sharing. These styles may be combined within the same program. The model includes features that allow a user to define a program in terms of the behavior of the system as a whole, where the behavior of individual tasks is implicit from this systemic definition. (In general, features marked as shared are designed to support this perspective. It also supports an opposite perspective, where a program may be defined in terms of the behaviors of individual tasks, and a program is implicitly the sum of the behaviors of all tasks. (Features marked as private are designed to support this perspective. Users can exploit any combination of either set of features without ambiguity and thus are free to define a program from whatever perspective is most appropriate to the problem at hand.
Parallel Framework for Cooperative Processes

Directory of Open Access Journals (Sweden)

Mitică Craus

2005-01-01

Full Text Available This paper describes the work of an object oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system we are describing is to have a re-usable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles and the work should be possible to be split between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO parallel algorithm for the Travelling Salesman Problem (TSP and an Image Processing (IP parallel algorithm for the Symmetrical Neighborhood Filter (SNF. The implementations of these applications by means of the parallel framework prove to have good performances: approximatively linear speedup and low communication cost.
High-performance parallel approaches for three-dimensional light detection and ranging point clouds gridding

Science.gov (United States)

Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon

2017-01-01

With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the most suited parallel programming model for a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, complexity infrastructure, and platform limitation to give readers a better understanding of utilizing those parallel approaches for gridding purposes.
Optimisation of a parallel ocean general circulation model

Directory of Open Access Journals (Sweden)

M. I. Beare

1997-10-01

Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
Optimisation of a parallel ocean general circulation model

Directory of Open Access Journals (Sweden)

M. I. Beare

Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Blocksome, Michael A.; Mamidala, Amith R.

2013-09-03

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Unpacking the performance of a mobile health information messaging program for mothers (MomConnect) in South Africa: evidence on program reach and messaging exposure.

Science.gov (United States)

LeFevre, Amnesty E; Dane, Pierre; Copley, Charles J; Pienaar, Cara; Parsons, Annie Neo; Engelhard, Matt; Woods, David; Bekker, Marcha; Benjamin, Peter; Pillay, Yogan; Barron, Peter; Seebregts, Christopher John; Mohan, Diwakar

2018-01-01

Despite calls to address broader evidence gaps in linking digital technologies to outcome and impact level health indicators, limited attention has been paid to measuring processes pertaining to the performance of programs. In this paper, we assess the program reach and message exposure of a mobile health information messaging program for mothers (MomConnect) in South Africa. In this descriptive study, we draw from system generated data to measure exposure to the program through registration attempts and conversions, message delivery, opt-outs and drop-outs. Using a logit model, we additionally explore determinants for early registration, opt-outs and drop-outs. From August 2014 to April 2017, 1 159 431 women were registered to MomConnect; corresponding to half of women attending antenatal care 1 (ANC1) and nearly 60% of those attending ANC1 estimated to own a mobile phone. In 2016, 26% of registrations started to get women onto MomConnect did not succeed. If registration attempts were converted to successful registrations, coverage of ANC1 attendees would have been 74% in 2016 and 86% in 2017. When considered as percentage of ANC1 attendees with access to a mobile phone, addressing conversion challenges bring registration coverage to an estimated 83%-89% in 2016 and 97%-100% in 2017. Among women registered, nearly 80% of expected short messaging service messages were received. While registration coverage and message delivery success rates exceed those observed for mobile messaging programs elsewhere, study findings highlight opportunities for program improvement and reinforce the need for rigorous and continuous monitoring of delivery systems.
Can rare SAT formulae be easily recognized? On the efficiency of message-passing algorithms for K-SAT at large clause-to-variable ratios

International Nuclear Information System (INIS)

Altarelli, Fabrizio; Monasson, Remi; Zamponi, Francesco

2007-01-01

For large clause-to-variable ratios, typical K-SAT instances drawn from the uniform distribution have no solution. We argue, based on statistical mechanics calculations using the replica and cavity methods, that rare satisfiable instances from the uniform distribution are very similar to typical instances drawn from the so-called planted distribution, where instances are chosen uniformly between the ones that admit a given solution. It then follows, from a recent article by Feige, Mossel and Vilenchik (2006 Complete convergence of message passing algorithms for some satisfiability problems Proc. Random 2006 pp 339-50), that these rare instances can be easily recognized (in O(log N) time and with probability close to 1) by a simple message-passing algorithm
Performance studies of the parallel VIM code

International Nuclear Information System (INIS)

Shi, B.; Blomquist, R.N.

1996-01-01

In this paper, the authors evaluate the performance of the parallel version of the VIM Monte Carlo code on the IBM SPx at the High Performance Computing Research Facility at ANL. Three test problems with contrasting computational characteristics were used to assess effects in performance. A statistical method for estimating the inefficiencies due to load imbalance and communication is also introduced. VIM is a large scale continuous energy Monte Carlo radiation transport program and was parallelized using history partitioning, the master/worker approach, and p4 message passing library. Dynamic load balancing is accomplished when the master processor assigns chunks of histories to workers that have completed a previously assigned task, accommodating variations in the lengths of histories, processor speeds, and worker loads. At the end of each batch (generation), the fission sites and tallies are sent from each worker to the master process, contributing to the parallel inefficiency. All communications are between master and workers, and are serial. The SPx is a scalable 128-node parallel supercomputer with high-performance Omega switches of 63 microsec latency and 35 MBytes/sec bandwidth. For uniform and reproducible performance, they used only the 120 identical regular processors (IBM RS/6000) and excluded the remaining eight planet nodes, which may be loaded by other's jobs
Real-time trajectory optimization on parallel processors

Science.gov (United States)

Psiaki, Mark L.

1993-01-01

A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm that is suitable to do real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on 3 example problems, the Goddard problem, the acceleration-limited, planar minimum-time to the origin problem, and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32-nodes instead of 1-node to solve a 64-stage Goddard problem.
Writing parallel programs that work

CERN Multimedia

CERN. Geneva

2012-01-01

Serial algorithms typically run inefficiently on parallel machines. This may sound like an obvious statement, but it is the root cause of why parallel programming is considered to be difficult. The current state of the computer industry is still that almost all programs in existence are serial. This talk will describe the techniques used in the Intel Parallel Studio to provide a developer with the tools necessary to understand the behaviors and limitations of the existing serial programs. Once the limitations are known the developer can refactor the algorithms and reanalyze the resulting programs with the tools in the Intel Parallel Studio to create parallel programs that work. About the speaker Paul Petersen is a Sr. Principal Engineer in the Software and Solutions Group (SSG) at Intel. He received a Ph.D. degree in Computer Science from the University of Illinois in 1993. After UIUC, he was employed at Kuck and Associates, Inc. (KAI) working on auto-parallelizing compiler (KAP), and was involved in th...
Same-source parallel implementation of the PSU/NCAR MM5

Energy Technology Data Exchange (ETDEWEB)

Michalakes, J.

1997-12-31

The Pennsylvania State/National Center for Atmospheric Research Mesoscale Model is a limited-area model of atmospheric systems, now in its fifth generation, MM5. Designed and maintained for vector and shared-memory parallel architectures, the official version of MM5 does not run on message-passing distributed memory (DM) parallel computers. The authors describe a same-source parallel implementation of the PSU/NCAR MM5 using FLIC, the Fortran Loop and Index Converter. The resulting source is nearly line-for-line identical with the original source code. The result is an efficient distributed memory parallel option to MM5 that can be seamlessly integrated into the official version.
A Parallel Encryption Algorithm Based on Piecewise Linear Chaotic Map

Directory of Open Access Journals (Sweden)

Xizhong Wang

2013-01-01

Full Text Available We introduce a parallel chaos-based encryption algorithm for taking advantage of multicore processors. The chaotic cryptosystem is generated by the piecewise linear chaotic map (PWLCM. The parallel algorithm is designed with a master/slave communication model with the Message Passing Interface (MPI. The algorithm is suitable not only for multicore processors but also for the single-processor architecture. The experimental results show that the chaos-based cryptosystem possesses good statistical properties. The parallel algorithm provides much better performance than the serial ones and would be useful to apply in encryption/decryption file with large size or multimedia.
User-Defined Data Distributions in High-Level Programming Languages

Science.gov (United States)

Diaconescu, Roxana E.; Zima, Hans P.

2006-01-01

One of the characteristic features of today s high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.
What it Takes to Get Passed On: Message Content, Style, and Structure as Predictors of Retransmission in the Boston Marathon Bombing Response.

Directory of Open Access Journals (Sweden)

Jeannette Sutton

Full Text Available Message retransmission is a central aspect of information diffusion. In a disaster context, the passing on of official warning messages by members of the public also serves as a behavioral indicator of message salience, suggesting that particular messages are (or are not perceived by the public to be both noteworthy and valuable enough to share with others. This study provides the first examination of terse message retransmission of official warning messages in response to a domestic terrorist attack, the Boston Marathon Bombing in 2013. Using messages posted from public officials' Twitter accounts that were active during the period of the Boston Marathon bombing and manhunt, we examine the features of messages that are associated with their retransmission. We focus on message content, style, and structure, as well as the networked relationships of message senders to answer the question: what characteristics of a terse message sent under conditions of imminent threat predict its retransmission among members of the public? We employ a negative binomial model to examine how message characteristics affect message retransmission. We find that, rather than any single effect dominating the process, retransmission of official Tweets during the Boston bombing response was jointly influenced by various message content, style, and sender characteristics. These findings suggest the need for more work that investigates impact of multiple factors on the allocation of attention and on message retransmission during hazard events.

Statistical Physics, Optimization, Inference, and Message-Passing Algorithms : Lecture Notes of the Les Houches School of Physics : Special Issue, October 2013

CERN Document Server

Ricci-Tersenghi, Federico; Zdeborova, Lenka; Zecchina, Riccardo; Tramel, Eric W; Cugliandolo, Leticia F

2015-01-01

This book contains a collection of the presentations that were given in October 2013 at the Les Houches Autumn School on statistical physics, optimization, inference, and message-passing algorithms. In the last decade, there has been increasing convergence of interest and methods between theoretical physics and fields as diverse as probability, machine learning, optimization, and inference problems. In particular, much theoretical and applied work in statistical physics and computer science has relied on the use of message-passing algorithms and their connection to the statistical physics of glasses and spin glasses. For example, both the replica and cavity methods have led to recent advances in compressed sensing, sparse estimation, and random constraint satisfaction, to name a few. This book’s detailed pedagogical lectures on statistical inference, computational complexity, the replica and cavity methods, and belief propagation are aimed particularly at PhD students, post-docs, and young researchers desir...
Processing communications events in parallel active messaging interface by awakening thread from wait state

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-10-22

Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Blocksome, Michael A.; Mamidala, Amith R.

2015-07-07

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Parallel programming with Easy Java Simulations

Science.gov (United States)

Esquembre, F.; Christian, W.; Belloni, M.

2018-01-01

Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a java-based programming environment to treat problems in the usual undergraduate curriculum. We use the easy java simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications.

Science.gov (United States)

D'Angelo, Gianni; Rampone, Salvatore

2014-01-01

the communications between different memories (RAM, Cache, Mass, Virtual) and to achieve efficient I/O performance, we design a mass storage structure able to access its data with a high degree of temporal and spatial locality. Then we develop a parallel implementation of the algorithm. We model it as a SPMD system together to a Message-Passing Programming Paradigm. Here, we adopt the high-level message-passing systems MPI (Message Passing Interface) in the version for the Java programming language, MPJ. The parallel processing is organized into four stages: partitioning, communication, agglomeration and mapping. The decomposition of the U-BRAIN algorithm determines the necessity of a communication protocol design among the processors involved. Efficient synchronization design is also discussed. In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the INTEL XEON E7xxx and E5xxx family of the CRESCO structure of Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize both the memory space and the execution time. The test data used in this study are IPDATA (Irvine Primate splice- junction DATA set), a subset of HS3D (Homo Sapiens Splice Sites Dataset) and a subset of COSMIC (the Catalogue of Somatic Mutations in Cancer). The execution time and the speed-up on IPDATA reach the best values within about 90 processors. Then the parallelization advantage is balanced by the greater cost of non-local communications between the processors. A similar behaviour is evident on HS3D, but at a greater number of processors, so evidencing the direct relationship between data size and parallelization gain. This behaviour is
Numerical studies on the interaction between atmosphere and ocean using different kinds of parallel computers

International Nuclear Information System (INIS)

Lee, Soon-Hwan; Chino, Masamichi

2000-01-01

The coupling between atmosphere and ocean model has physical and computational difficulties for short-term forecasting of weather and ocean current. In this research, a combination system between high-resolution meso-scale atmospheric model and ocean model has been constructed using a new message-passing library, called Stampi (Seamless Thinking Aid Message Passing Interface), for prediction of particle dispersion at emergency nuclear accident. Stampi, which is based on the MPI (Message Passing Interface) 2 specification, makes us carry out parallel calculations of combination system without parallelization skill to model code. And it realizes dynamic process creation on different machines and communication between spawned one within the scope of MPI semantics. The models included in this combination system are PHYSIC as an atmosphere model, and POM (Princeton Ocean Model) as an ocean model. We applied this combination system to predict sea surface current at Sea of Japan in winter season. Simulation results indicate that the wind stress near the sea surface tends to be a predominant factor to determine surface ocean currents and dispersion of radioactive contamination in the ocean. The surface ocean current is well correspondent with wind direction, induced by high mountains at North Korea. The satellite data of NSCAT (NASA-SCATterometer), which is an image of sea surface current, also agrees well with the results of this system. (author)
Peformance Tuning and Evaluation of a Parallel Community Climate Model

Energy Technology Data Exchange (ETDEWEB)

Drake, J.B.; Worley, P.H.; Hammond, S.

1999-11-13

The Parallel Community Climate Model (PCCM) is a message-passing parallelization of version 2.1 of the Community Climate Model (CCM) developed by researchers at Argonne and Oak Ridge National Laboratories and at the National Center for Atmospheric Research in the early to mid 1990s. In preparation for use in the Department of Energy's Parallel Climate Model (PCM), PCCM has recently been updated with new physics routines from version 3.2 of the CCM, improvements to the parallel implementation, and ports to the SGIKray Research T3E and Origin 2000. We describe our experience in porting and tuning PCCM on these new platforms, evaluating the performance of different parallel algorithm options and comparing performance between the T3E and Origin 2000.
Differences Between Distributed and Parallel Systems

Energy Technology Data Exchange (ETDEWEB)

Brightwell, R.; Maccabe, A.B.; Rissen, R.

1998-10-01

Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.
Refinement of Parallel and Reactive Programs

OpenAIRE

Back, R. J. R.

1992-01-01

We show how to apply the refinement calculus to stepwise refinement of parallel and reactive programs. We use action systems as our basic program model. Action systems are sequential programs which can be implemented in a parallel fashion. Hence refinement calculus methods, originally developed for sequential programs, carry over to the derivation of parallel programs. Refinement of reactive programs is handled by data refinement techniques originally developed for the sequential refinement c...
Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications

Science.gov (United States)

Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid

1999-01-01

The ability to dynamically adapt an unstructured -rid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the view point of portability on various multiprocessor platforms We address this problem by developing PLUM, tin automatic anti architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with a goal to providing a global view of system loads across processors. Experiments on, an SP2 and an Origin2000 demonstrate the portability of our approach which achieves superb load balance at the cost of minimal extra overhead.
Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators

International Nuclear Information System (INIS)

Qiang, J.; Ryne, R.D.; Habib, S.; Decky, V.

1999-01-01

In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message passing programming paradigm along with a dynamic load balancing. Implementing object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure based code. This also helps to encapsulate the details of communications syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Some important features of this code also include employing symplectic integration with linear maps of external focusing elements and using z as the independent variable, typical in accelerators. A successful application was done to simulate beam transport through three superconducting sections in the APT linac design
Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS

Science.gov (United States)

Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

We present views and analysis of the execution of several PVM (Parallel Virtual Machine) codes for Computational Fluid Dynamics on a networks of Sparcstations, including: (1) NAS Parallel Benchmarks CG and MG; (2) a multi-partitioning algorithm for NAS Parallel Benchmark SP; and (3) an overset grid flowsolver. These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains: (1) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (2) Monitor, a library of runtime trace-collection routines; (3) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (4) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran 77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses XIIR5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (1) the impact of long message latencies; (2) the impact of multiprogramming overheads and associated load imbalance; (3) cache and virtual-memory effects; and (4) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. Some of the features planned for the near future include: (1) ConfigView, showing the physical topology
A parallel implementation of 3-d CT image reconstruction on a hypercube multiprocessor

International Nuclear Information System (INIS)

Chen, C.M.; Lee, S.Y.; Cho, Z.H.

1990-01-01

In this paper, the authors describe how image reconstruction in computerized tomography (CT) can be parallelized on a message-passing multiprocessor. In particular, the results obtained from parallel implementation of 3-D CT image reconstruction for parallel beam geometries on the Intel hypercube, iPSC/2, are presented. A two stage pipelining approach is employed for filtering (convolution) and backprojection. The conventional sequential convolution algorithm is modified such that the symmetry of the filter kernel is fully utilized for parallelization. In the backprojection stage, the 3-D incremental algorithm, the authors' recently developed backprojection scheme which is shown to be faster than conventional algorithm, is parallelized
Multi-petascale highly efficient parallel supercomputer

Science.gov (United States)

Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen-Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

2018-05-15

A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximize the throughput of packet communications between nodes and minimize latency. The network implements collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design include a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and multiversioning cache that improves soft error rate at the same time and supports DMA functionality allowing for parallel processing message-passing.
Parallel Sn Sweeps on Unstructured Grids: Algorithms for Prioritization, Grid Partitioning, and Cycle Detection

International Nuclear Information System (INIS)

Plimpton, Steven J.; Hendrickson, Bruce; Burns, Shawn P.; McLendon, William III; Rauchwerger, Lawrence

2005-01-01

The method of discrete ordinates is commonly used to solve the Boltzmann transport equation. The solution in each ordinate direction is most efficiently computed by sweeping the radiation flux across the computational grid. For unstructured grids this poses many challenges, particularly when implemented on distributed-memory parallel machines where the grid geometry is spread across processors. We present several algorithms relevant to this approach: (a) an asynchronous message-passing algorithm that performs sweeps simultaneously in multiple ordinate directions, (b) a simple geometric heuristic to prioritize the computational tasks that a processor works on, (c) a partitioning algorithm that creates columnar-style decompositions for unstructured grids, and (d) an algorithm for detecting and eliminating cycles that sometimes exist in unstructured grids and can prevent sweeps from successfully completing. Algorithms (a) and (d) are fully parallel; algorithms (b) and (c) can be used in conjunction with (a) to achieve higher parallel efficiencies. We describe our message-passing implementations of these algorithms within a radiation transport package. Performance and scalability results are given for unstructured grids with up to 3 million elements (500 million unknowns) running on thousands of processors of Sandia National Laboratories' Intel Tflops machine and DEC-Alpha CPlant cluster
Parallel ray tracing for one-dimensional discrete ordinate computations

International Nuclear Information System (INIS)

Jarvis, R.D.; Nelson, P.

1996-01-01

The ray-tracing sweep in discrete-ordinates, spatially discrete numerical approximation methods applied to the linear, steady-state, plane-parallel, mono-energetic, azimuthally symmetric, neutral-particle transport equation can be reduced to a parallel prefix computation. In so doing, the often severe penalty in convergence rate of the source iteration, suffered by most current parallel algorithms using spatial domain decomposition, can be avoided while attaining parallelism in the spatial domain to whatever extent desired. In addition, the reduction implies parallel algorithm complexity limits for the ray-tracing sweep. The reduction applies to all closed, linear, one-cell functional (CLOF) spatial approximation methods, which encompasses most in current popular use. Scalability test results of an implementation of the algorithm on a 64-node nCube-2S hypercube-connected, message-passing, multi-computer are described. (author)
Weighted community detection and data clustering using message passing

Science.gov (United States)

Shi, Cheng; Liu, Yanchen; Zhang, Pan

2018-03-01

Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for an unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of spin-glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computation complexity in dense networks but give almost the same performance as belief propagation.
Parallelism at Cern: real-time and off-line applications in the GP-MIMD2 project

International Nuclear Information System (INIS)

Calafiura, P.

1997-01-01

A wide range of general purpose high-energy physics applications, ranging from Monte Carlo simulation to data acquisition, from interactive data analysis to on-line filtering, have been ported, or developed, and run in parallel on IBM SP-2 and Meiko CS-2 CERN large multi-processor machines. The ESPRIT project GP-MIMD2 has been a catalyst for the interest in parallel computing at CERN. The project provided the 128 processor Meiko CS-2 system that is now succesfully integrated in the CERN computing environment. The CERN experiment NA48 was involved in the GP-MIMD2 project since the beginning. NA48 physicists run, as part of their day-to-day work, simulation and analysis programs parallelized using the message passing interface MPI. The CS-2 is also a vital component of the experiment data acquisition system and will be used to calibrate in real-time the 13000 channels liquid krypton calorimeter. (orig.)
Parallel computing in cluster of GPU applied to a problem of nuclear engineering

International Nuclear Information System (INIS)

Moraes, Sergio Ricardo S.; Heimlich, Adino; Resende, Pedro

2013-01-01

Cluster computing has been widely used as a low cost alternative for parallel processing in scientific applications. With the use of Message-Passing Interface (MPI) protocol development became even more accessible and widespread in the scientific community. A more recent trend is the use of Graphic Processing Unit (GPU), which is a powerful co-processor able to perform hundreds of instructions in parallel, reaching a capacity of hundreds of times the processing of a CPU. However, a standard PC does not allow, in general, more than two GPUs. Hence, it is proposed in this work development and evaluation of a hybrid low cost parallel approach to the solution to a nuclear engineering typical problem. The idea is to use clusters parallelism technology (MPI) together with GPU programming techniques (CUDA - Compute Unified Device Architecture) to simulate neutron transport through a slab using Monte Carlo method. By using a cluster comprised by four quad-core computers with 2 GPU each, it has been developed programs using MPI and CUDA technologies. Experiments, applying different configurations, from 1 to 8 GPUs has been performed and results were compared with the sequential (non-parallel) version. A speed up of about 2.000 times has been observed when comparing the 8-GPU with the sequential version. Results here presented are discussed and analyzed with the objective of outlining gains and possible limitations of the proposed approach. (author)
Structured Parallel Programming Patterns for Efficient Computation

CERN Document Server

McCool, Michael; Robison, Arch

2012-01-01

Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th

Parallel phase model : a programming model for high-end parallel machines with manycores.

Energy Technology Data Exchange (ETDEWEB)

Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

2009-04-01

This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.
Parallelization of ITOUGH2 using PVM

International Nuclear Information System (INIS)

Finsterle, Stefan

1998-01-01

ITOUGH2 inversions are computationally intensive because the forward problem must be solved many times to evaluate the objective function for different parameter combinations or to numerically calculate sensitivity coefficients. Most of these forward runs are independent from each other and can therefore be performed in parallel. Message passing based on the Parallel Virtual Machine (PVM) system has been implemented into ITOUGH2 to enable parallel processing of ITOUGH2 jobs on a heterogeneous network of Unix workstations. This report describes the PVM system and its implementation into ITOUGH2. Instructions are given for installing PVM, compiling ITOUGH2-PVM for use on a workstation cluster, the preparation of an 1.TOUGH2 input file under PVM, and the execution of an ITOUGH2-PVM application. Examples are discussed, demonstrating the use of ITOUGH2-PVM
Approximate message passing for nonconvex sparse regularization with stability and asymptotic analysis

Science.gov (United States)

Sakata, Ayaka; Xu, Yingying

2018-03-01

We analyse a linear regression problem with nonconvex regularization called smoothly clipped absolute deviation (SCAD) under an overcomplete Gaussian basis for Gaussian random data. We propose an approximate message passing (AMP) algorithm considering nonconvex regularization, namely SCAD-AMP, and analytically show that the stability condition corresponds to the de Almeida-Thouless condition in spin glass literature. Through asymptotic analysis, we show the correspondence between the density evolution of SCAD-AMP and the replica symmetric (RS) solution. Numerical experiments confirm that for a sufficiently large system size, SCAD-AMP achieves the optimal performance predicted by the replica method. Through replica analysis, a phase transition between replica symmetric and replica symmetry breaking (RSB) region is found in the parameter space of SCAD. The appearance of the RS region for a nonconvex penalty is a significant advantage that indicates the region of smooth landscape of the optimization problem. Furthermore, we analytically show that the statistical representation performance of the SCAD penalty is better than that of \
Implementation of a parallel version of a regional climate model

Energy Technology Data Exchange (ETDEWEB)

Gerstengarbe, F.W. [ed.; Kuecken, M. [Potsdam-Institut fuer Klimafolgenforschung (PIK), Potsdam (Germany); Schaettler, U. [Deutscher Wetterdienst, Offenbach am Main (Germany). Geschaeftsbereich Forschung und Entwicklung

1997-10-01

A regional climate model developed by the Max Planck Institute for Meterology and the German Climate Computing Centre in Hamburg based on the `Europa` and `Deutschland` models of the German Weather Service has been parallelized and implemented on the IBM RS/6000 SP computer system of the Potsdam Institute for Climate Impact Research including parallel input/output processing, the explicit Eulerian time-step, the semi-implicit corrections, the normal-mode initialization and the physical parameterizations of the German Weather Service. The implementation utilizes Fortran 90 and the Message Passing Interface. The parallelization strategy used is a 2D domain decomposition. This report describes the parallelization strategy, the parallel I/O organization, the influence of different domain decomposition approaches for static and dynamic load imbalances and first numerical results. (orig.)
A multitransputer parallel processing system (MTPPS)

International Nuclear Information System (INIS)

Jethra, A.K.; Pande, S.S.; Borkar, S.P.; Khare, A.N.; Ghodgaonkar, M.D.; Bairi, B.R.

1993-01-01

This report describes the design and implementation of a 16 node Multi Transputer Parallel Processing System(MTPPS) which is a platform for parallel program development. It is a MIMD machine based on message passing paradigm. The basic compute engine is an Inmos Transputer Ims T800-20. Transputer with local memory constitutes the processing element (NODE) of this MIMD architecture. Multiple NODES can be connected to each other in an identifiable network topology through the high speed serial links of the transputer. A Network Configuration Unit (NCU) incorporates the necessary hardware to provide software controlled network configuration. System is modularly expandable and more NODES can be added to the system to achieve the required processing power. The system is backend to the IBM-PC which has been integrated into the system to provide user I/O interface. PC resources are available to the programmer. Interface hardware between the PC and the network of transputers is INMOS compatible. Therefore, all the commercially available development software compatible to INMOS products can run on this system. While giving the details of design and implementation, this report briefly summarises MIMD Architectures, Transputer Architecture and Parallel Processing Software Development issues. LINPACK performance evaluation of the system and solutions of neutron physics and plasma physics problem have been discussed along with results. (author). 12 refs., 22 figs., 3 tabs., 3 appendixes
Parallel computing and networking; Heiretsu keisanki to network

Energy Technology Data Exchange (ETDEWEB)

Asakawa, E; Tsuru, T [Japan National Oil Corp., Tokyo (Japan); Matsuoka, T [Japan Petroleum Exploration Co. Ltd., Tokyo (Japan)

1996-05-01

This paper describes the trend of parallel computers used in geophysical exploration. Around 1993 was the early days when the parallel computers began to be used for geophysical exploration. Classification of these computers those days was mainly MIMD (multiple instruction stream, multiple data stream), SIMD (single instruction stream, multiple data stream) and the like. Parallel computers were publicized in the 1994 meeting of the Geophysical Exploration Society as a `high precision imaging technology`. Concerning the library of parallel computers, there was a shift to PVM (parallel virtual machine) in 1993 and to MPI (message passing interface) in 1995. In addition, the compiler of FORTRAN90 was released with support implemented for data parallel and vector computers. In 1993, networks used were Ethernet, FDDI, CDDI and HIPPI. In 1995, the OC-3 products under ATM began to propagate. However, ATM remains to be an interoffice high speed network because the ATM service has not spread yet for the public network. 1 ref.
A Parallel Workload Model and its Implications for Processor Allocation

Science.gov (United States)

1996-11-01

with SEV or AVG, both of which can tolerate c = 0.4 { 0.6 before their performance deteriorates signi cantly. On the other hand, Setia [10] has...Sanjeev. K Setia . The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS 󈨣 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89{99, 1995. [11] Sanjeev K. Setia and Satish K. Tripathi. An analysis of several processor
Hybrid programming model for implicit PDE simulations on multicore architectures

KAUST Repository

Kaushik, Dinesh; Keyes, David E.; Balay, Satish; Smith, Barry F.

2011-01-01

The complexity of programming modern multicore processor based clusters is rapidly rising, with GPUs adding further demand for fine-grained parallelism. This paper analyzes the performance of the hybrid (MPI+OpenMP) programming model in the context of an implicit unstructured mesh CFD code. At the implementation level, the effects of cache locality, update management, work division, and synchronization frequency are studied. The hybrid model presents interesting algorithmic opportunities as well: the convergence of linear system solver is quicker than the pure MPI case since the parallel preconditioner stays stronger when hybrid model is used. This implies significant savings in the cost of communication and synchronization (explicit and implicit). Even though OpenMP based parallelism is easier to implement (with in a subdomain assigned to one MPI process for simplicity), getting good performance needs attention to data partitioning issues similar to those in the message-passing case. © 2011 Springer-Verlag.
Programming massively parallel processors a hands-on approach

CERN Document Server

Kirk, David B

2010-01-01

Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. ""Massively parallel"" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVI- DIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...
Parallelization of a Monte Carlo particle transport simulation code

Science.gov (United States)

Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

2010-05-01

We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

Science.gov (United States)

Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

1990-01-01

Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
Experiences in Data-Parallel Programming

Directory of Open Access Journals (Sweden)

Terry W. Clark

1997-01-01

Full Text Available To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.
How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing

Science.gov (United States)

Decyk, V. K.; Dauger, D. E.

We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-incell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the main stream of computing.
Application of parallel computing to seismic damage process simulation of an arch dam

International Nuclear Information System (INIS)

Zhong Hong; Lin Gao; Li Jianbo

2010-01-01

The simulation of damage process of high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires rigorous computational capacity. Conventional serial computing falls short of that and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams utilizing the damage model with inheterogeneity of concrete considered. Developed with programming language Fortran, the code uses a master/slave mode for programming, domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication and solvers from AZTEC library for solution of large-scale equations. Speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of a being-built arch dam on a 4-node PC Cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of shaking table test, indicating that the proposed procedure and parallel code PDPAD has a good potential in simulating seismic damage mode of arch dams. With the rapidly growing need for massive computation emerged from engineering problems, parallel computing will find more and more applications in pertinent areas.
The effects of fear appeal message repetition on perceived threat, perceived efficacy, and behavioral intention in the extended parallel process model.

Science.gov (United States)

Shi, Jingyuan Jolie; Smith, Sandi W

2016-01-01

This study examined the effect of moderately repeated exposure (three times) to a fear appeal message on the Extended Parallel Processing Model (EPPM) variables of threat, efficacy, and behavioral intentions for the recommended behaviors in the message, as well as the proportions of systematic and message-related thoughts generated after each message exposure. The results showed that after repeated exposure to a fear appeal message about preventing melanoma, perceived threat in terms of susceptibility and perceived efficacy in terms of response efficacy significantly increased. The behavioral intentions of all recommended behaviors did not change after repeated exposure to the message. However, after the second exposure the proportions of both systematic and all message-related thoughts (relative to total thoughts) significantly decreased while the proportion of heuristic thoughts significantly increased, and this pattern held after the third exposure. The findings demonstrated that the predictions in the EPPM are likely to be operative after three exposures to a persuasive message.
Analysis of multigrid methods on massively parallel computers: Architectural implications

Science.gov (United States)

Matheson, Lesley R.; Tarjan, Robert E.

1993-01-01

We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10(exp 6) and 10(exp 9), respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages, (up to 1000 words) or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
Media's Moral Messages: Assessing Perceptions of Moral Content in Television Programming

Science.gov (United States)

Glover, Rebecca J.; Garmon, Lance C.; Hull, Darrell M.

2011-01-01

This study extends the examination of moral content in the media by exploring moral messages in television programming and viewer characteristics predictive of the ability to perceive such messages. Generalisability analyses confirmed the reliability of the Media's Moral Messages (MMM) rating form for analysing programme content and the existence…
Numerical discrepancy between serial and MPI parallel computations

Directory of Open Access Journals (Sweden)

Sang Bong Lee

2016-09-01

Full Text Available Numerical simulations of 1D Burgers equation and 2D sloshing problem were carried out to study numerical discrepancy between serial and parallel computations. The numerical domain was decomposed into 2 and 4 subdomains for parallel computations with message passing interface. The numerical solution of Burgers equation disclosed that fully explicit boundary conditions used on subdomains of parallel computation was responsible for the numerical discrepancy of transient solution between serial and parallel computations. Two dimensional sloshing problems in a rectangular domain were solved using OpenFOAM. After a lapse of initial transient time sloshing patterns of water were significantly different in serial and parallel computations although the same numerical conditions were given. Based on the histograms of pressure measured at two points near the wall the statistical characteristics of numerical solution was not affected by the number of subdomains as much as the transient solution was dependent on the number of subdomains.
Application of a Tsunami Warning Message Metric to refine NOAA NWS Tsunami Warning Messages

Science.gov (United States)

Gregg, C. E.; Johnston, D.; Sorensen, J.; Whitmore, P.

2013-12-01

In 2010, the U.S. National Weather Service (NWS) funded a three year project to integrate social science into their Tsunami Program. One of three primary requirements of the grant was to make improvements to tsunami warning messages of the NWS' two Tsunami Warning Centers- the West Coast/Alaska Tsunami Warning Center (WCATWC) in Palmer, Alaska and the Pacific Tsunami Warning Center (PTWC) in Ewa Beach, Hawaii. We conducted focus group meetings with a purposive sample of local, state and Federal stakeholders and emergency managers in six states (AK, WA, OR, CA, HI and NC) and two US Territories (US Virgin Islands and American Samoa) to qualitatively asses information needs in tsunami warning messages using WCATWC tsunami messages for the March 2011 Tohoku earthquake and tsunami event. We also reviewed research literature on behavioral response to warnings to develop a tsunami warning message metric that could be used to guide revisions to tsunami warning messages of both warning centers. The message metric is divided into categories of Message Content, Style, Order and Formatting and Receiver Characteristics. A message is evaluated by cross-referencing the message with the operational definitions of metric factors. Findings are then used to guide revisions of the message until the characteristics of each factor are met. Using findings from this project and findings from a parallel NWS Warning Tiger Team study led by T. Nicolini, the WCATWC implemented the first of two phases of revisions to their warning messages in November 2012. A second phase of additional changes, which will fully implement the redesign of messages based on the metric, is in progress. The resulting messages will reflect current state-of-the-art knowledge on warning message effectiveness. Here we present the message metric; evidence-based rational for message factors; and examples of previous, existing and proposed messages.
PDDP, A Data Parallel Programming Model

Directory of Open Access Journals (Sweden)

Karen H. Warren

1996-01-01

Full Text Available PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements high-performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.

High-performance computing — an overview

Science.gov (United States)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
Design Considerations in Developing a Text Messaging Program Aimed at Smoking Cessation

Science.gov (United States)

Holtrop, Jodi Summers; Bağci Bosi, A Tülay; Emri, Salih

2012-01-01

Background Cell phone text messaging is gaining increasing recognition as an important tool that can be harnessed for prevention and intervention programs across a wide variety of health research applications. Despite the growing body of literature reporting positive outcomes, very little is available about the design decisions that scaffold the development of text messaging-based health interventions. What seems to be missing is documentation of the thought process of investigators in the initial stages of protocol and content development. This omission is of particular concern because many researchers seem to view text messaging as the intervention itself instead of simply a delivery mechanism. Certainly, aspects of this technology may increase participant engagement. Like other interventions, however, the content is a central driver of the behavior change. Objective To address this noted gap in the literature, we discuss the protocol decisions and content development for SMS Turkey (or Cebiniz birakin diyor in Turkish), a smoking cessation text messaging program for adult smokers in Turkey. Methods Content was developed in English and translated into Turkish. Efforts were made to ensure that the protocol and content were grounded in evidence-based smoking cessation theory, while also reflective of the cultural aspects of smoking and quitting in Turkey. Results Methodological considerations included whether to provide cell phones and whether to reimburse participants for texting costs; whether to include supplementary intervention resources (eg, personal contact); and whether to utilize unidirectional versus bidirectional messaging. Program design considerations included how messages were tailored to the quitting curve and one’s smoking status after one’s quit date, the number of messages participants received per day, and over what period of time the intervention lasted. Conclusion The content and methods of effective smoking cessation quitline programs were
Language constructs for modular parallel programs

Energy Technology Data Exchange (ETDEWEB)

Foster, I.

1996-03-01

We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrence, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.
Development of parallel/serial program analyzing tool

International Nuclear Information System (INIS)

Watanabe, Hiroshi; Nagao, Saichi; Takigawa, Yoshio; Kumakura, Toshimasa

1999-03-01

Japan Atomic Energy Research Institute has been developing 'KMtool', a parallel/serial program analyzing tool, in order to promote the parallelization of the science and engineering computation program. KMtool analyzes the performance of program written by FORTRAN77 and MPI, and it reduces the effort for parallelization. This paper describes development purpose, design, utilization and evaluation of KMtool. (author)
PSHED: a simplified approach to developing parallel programs

International Nuclear Information System (INIS)

Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

1992-01-01

This paper presents a simplified approach in the forms of a tree structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package called PSHED provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs
Professional Parallel Programming with C# Master Parallel Extensions with NET 4

CERN Document Server

Hillar, Gastón

2010-01-01

Expert guidance for those programming today's dual-core processors PCs As PC processors explode from one or two to now eight processors, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization.Teach
Solution of finite element problems using hybrid parallelization with MPI and OpenMP Solution of finite element problems using hybrid parallelization with MPI and OpenMP

Directory of Open Access Journals (Sweden)

José Miguel Vargas-Félix

2012-11-01

Full Text Available The Finite Element Method (FEM is used to solve problems like solid deformation and heat diffusion in domains with complex geometries. This kind of geometries requires discretization with millions of elements; this is equivalent to solve systems of equations with sparse matrices and tens or hundreds of millions of variables. The aim is to use computer clusters to solve these systems. The solution method used is Schur substructuration. Using it is possible to divide a large system of equations into many small ones to solve them more efficiently. This method allows parallelization. MPI (Message Passing Interface is used to distribute the systems of equations to solve each one in a computer of a cluster. Each system of equations is solved using a solver implemented to use OpenMP as a local parallelization method.The Finite Element Method (FEM is used to solve problems like solid deformation and heat diffusion in domains with complex geometries. This kind of geometries requires discretization with millions of elements; this is equivalent to solve systems of equations with sparse matrices and tens or hundreds of millions of variables. The aim is to use computer clusters to solve these systems. The solution method used is Schur substructuration. Using it is possible to divide a large system of equations into many small ones to solve them more efficiently. This method allows parallelization. MPI (Message Passing Interface is used to distribute the systems of equations to solve each one in a computer of a cluster. Each system of equations is solved using a solver implemented to use OpenMP as a local parallelization method.
Overview of Parallel Platforms for Common High Performance Computing

Directory of Open Access Journals (Sweden)

T. Fryza

2012-04-01

Full Text Available The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, the methods exploiting the multicores central processing units such as message passing interface and OpenMP are taken into account. The properties of the programming methods are experimentally proved in the application of a fast Fourier transform and a discrete cosine transform and they are compared with the possibilities of MATLAB's built-in functions and Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementation phase was compared with CPU based computing methods and with possibilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and new, fast routines' implementation is proposed as well.
About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

Directory of Open Access Journals (Sweden)

Loredana MOCEAN

2009-01-01

Full Text Available In the last years, there were made efforts for delineation of a stabile and unitary frame, where the problems of logical parallel processing must find solutions at least at the level of imperative languages. The results obtained by now are not at the level of the made efforts. This paper wants to be a little contribution at these efforts. We propose an overview in parallel programming, parallel execution and collaborative systems.
A Tutorial on Parallel and Concurrent Programming in Haskell

Science.gov (United States)

Peyton Jones, Simon; Singh, Satnam

This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
Productive Parallel Programming: The PCN Approach

Directory of Open Access Journals (Sweden)

Ian Foster

1992-01-01

Full Text Available We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.
Research in Parallel Algorithms and Software for Computational Aerosciences

Science.gov (United States)

Domel, Neal D.

1996-01-01

Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Three-pass protocol scheme for bitmap image security by using vernam cipher algorithm

Science.gov (United States)

Rachmawati, D.; Budiman, M. A.; Aulya, L.

2018-02-01

Confidentiality, integrity, and efficiency are the crucial aspects of data security. Among the other digital data, image data is too prone to abuse of operation like duplication, modification, etc. There are some data security techniques, one of them is cryptography. The security of Vernam Cipher cryptography algorithm is very dependent on the key exchange process. If the key is leaked, security of this algorithm will collapse. Therefore, a method that minimizes key leakage during the exchange of messages is required. The method which is used, is known as Three-Pass Protocol. This protocol enables message delivery process without the key exchange. Therefore, the sending messages process can reach the receiver safely without fear of key leakage. The system is built by using Java programming language. The materials which are used for system testing are image in size 200×200 pixel, 300×300 pixel, 500×500 pixel, 800×800 pixel and 1000×1000 pixel. The result of experiments showed that Vernam Cipher algorithm in Three-Pass Protocol scheme could restore the original image.
MPIRUN: A Portable Loader for Multidisciplinary and Multi-Zonal Applications

Science.gov (United States)

Fineberg, Samuel A.; Woodrow, Thomas S. (Technical Monitor)

1994-01-01

Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. One method for implementing the message passing portion of these codes is with the new Message Passing Interface (MPI) standard. Unfortunately, this standard only specifies the message passing portion of an application, but does not specify any portable mechanisms for loading an application. MPIRUN was developed to provide a portable means for loading MPI programs, and was specifically targeted at multidisciplinary and multi-zonal applications. Programs using MPIRUN for loading and MPI for message passing are then portable between all machines supported by MPIRUN. MPIRUN is currently implemented for the Intel iPSC/860, TMC CM5, IBM SP-1 and SP-2, Intel Paragon, and workstation clusters. Further, MPIRUN is designed to be simple enough to port easily to any system supporting MPI.
Parallel decomposition of the tight-binding fictitious Lagrangian algorithm for molecular dynamics simulations of semiconductors

International Nuclear Information System (INIS)

Yeh, M.; Kim, J.; Khan, F.S.

1995-01-01

We present a parallel decomposition of the tight-binding fictitious Lagrangian algorithm for the Intel iPSC/860 and the Intel Paragon parallel computers. We show that it is possible to perform long simulations, of the order of 10 000 time steps, on semiconducting clusters consisting of as many as 512 atoms, on a time scale of the order of 20 h or less. We have made a very careful timing analysis of all parts of our code, and have identified the bottlenecks. We have also derived formulas which can predict the timing of our code, based on the number of processors, message passing bandwidth, floating point performance of each node, and the set up time for message passing, appropriate to the machine being used. The time of the simulation scales as the square of the number of particles, if the number of processors is made to scale linearly with the number of particles. We show that for a system as large as 512 atoms, the main bottleneck of the computation is the orthogonalization of the wave functions, which consumes about 90% of the total time of the simulation
Passing in Command Line Arguments and Parallel Cluster/Multicore Batching in R with batch.

Science.gov (United States)

Hoffmann, Thomas J

2011-03-01

It is often useful to rerun a command line R script with some slight change in the parameters used to run it - a new set of parameters for a simulation, a different dataset to process, etc. The R package batch provides a means to pass in multiple command line options, including vectors of values in the usual R format, easily into R. The same script can be setup to run things in parallel via different command line arguments. The R package batch also provides a means to simplify this parallel batching by allowing one to use R and an R-like syntax for arguments to spread a script across a cluster or local multicore/multiprocessor computer, with automated syntax for several popular cluster types. Finally it provides a means to aggregate the results together of multiple processes run on a cluster.
First massively parallel algorithm to be implemented in Apollo-II code

International Nuclear Information System (INIS)

Stankovski, Z.

1994-01-01

The collision probability (CP) method in neutron transport, as applied to arbitrary 2D XY geometries, like the TDT module in APOLLO-II, is very time consuming. Consequently RZ or 3D extensions became prohibitive. Fortunately, this method is very suitable for parallelization. Massively parallel computer architectures, especially MIMD machines, bring a new breath to this method. In this paper we present a CM5 implementation of the CP method. Parallelization is applied to the energy groups, using the CMMD message passing library. In our case we use 32 processors for the standard 99-group APOLLIB-II library. The real advantage of this algorithm will appear in the calculation of the future fine multigroup library (about 8000 groups) of the SAPHYR project with a massively parallel computer (to the order of hundreds of processors). (author). 3 tabs., 4 figs., 4 refs
Step by step parallel programming method for molecular dynamics code

International Nuclear Information System (INIS)

Orii, Shigeo; Ohta, Toshio

1996-07-01

Parallel programming for a numerical simulation program of molecular dynamics is carried out with a step-by-step programming technique using the two phase method. As a result, within the range of a certain computing parameters, it is found to obtain parallel performance by using the level of parallel programming which decomposes the calculation according to indices of do-loops into each processor on the vector parallel computer VPP500 and the scalar parallel computer Paragon. It is also found that VPP500 shows parallel performance in wider range computing parameters. The reason is that the time cost of the program parts, which can not be reduced by the do-loop level of the parallel programming, can be reduced to the negligible level by the vectorization. After that, the time consuming parts of the program are concentrated on less parts that can be accelerated by the do-loop level of the parallel programming. This report shows the step-by-step parallel programming method and the parallel performance of the molecular dynamics code on VPP500 and Paragon. (author)
A high performance image processing platform based on CPU-GPU heterogeneous cluster with parallel image reconstroctions for micro-CT

International Nuclear Information System (INIS)

Ding Yu; Qi Yujin; Zhang Xuezhu; Zhao Cuilan

2011-01-01

In this paper, we report the development of a high-performance image processing platform, which is based on CPU-GPU heterogeneous cluster. Currently, it consists of a Dell Precision T7500 and HP XW8600 workstations with parallel programming and runtime environment, using the message-passing interface (MPI) and CUDA (Compute Unified Device Architecture). We succeeded in developing parallel image processing techniques for 3D image reconstruction of X-ray micro-CT imaging. The results show that a GPU provides a computing efficiency of about 194 times faster than a single CPU, and the CPU-GPU clusters provides a computing efficiency of about 46 times faster than the CPU clusters. These meet the requirements of rapid 3D image reconstruction and real time image display. In conclusion, the use of CPU-GPU heterogeneous cluster is an effective way to build high-performance image processing platform. (authors)
Program Transformation to Identify List-Based Parallel Skeletons

Directory of Open Access Journals (Sweden)

Venkatesh Kannan

2016-07-01

Full Text Available Algorithmic skeletons are used as building-blocks to ease the task of parallel programming by abstracting the details of parallel implementation from the developer. Most existing libraries provide implementations of skeletons that are defined over flat data types such as lists or arrays. However, skeleton-based parallel programming is still very challenging as it requires intricate analysis of the underlying algorithm and often uses inefficient intermediate data structures. Further, the algorithmic structure of a given program may not match those of list-based skeletons. In this paper, we present a method to automatically transform any given program to one that is defined over a list and is more likely to contain instances of list-based skeletons. This facilitates the parallel execution of a transformed program using existing implementations of list-based parallel skeletons. Further, by using an existing transformation called distillation in conjunction with our method, we produce transformed programs that contain fewer inefficient intermediate data structures.

HealthyhornsTXT: A Text-Messaging Program to Promote College Student Health and Wellness.

Science.gov (United States)

Glowacki, Elizabeth M; Kirtz, Susan; Hughes Wagner, Jessica; Cance, Jessica Duncan; Barrera, Denise; Bernhardt, Jay M

2018-01-01

Text-messaging interventions positively affect health behaviors, but their use on college campuses has been limited. Text messaging serves as a relatively affordable way to communicate with large audiences and is one of the preferred modes of communication for young adults. This study examined the feasibility and acceptability of a campus-wide, health text-messaging program. The subscriber pool consisted of approximately 6,000 undergraduate students from a large, southern university. From that pool, 1,095 participants (64% female; 41% White) completed a posttest survey. Text messages covered a range of health topics and information about campus resources. Research was conducted from August through December 2015. Process data were collected throughout the semester; participants' attitudes were assessed via an online survey at the program's conclusion. Students demonstrated engagement with the messages throughout the semester as evidenced by replies to text-back keywords and clicks on website links embedded within messages. Messages about sleep, stress management, and hydration were considered most relevant. The majority of participants (61%) reported increased awareness regarding their health. Text-messaging interventions are a feasible strategy to improve college student health.
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

KAUST Repository

Hasanov, Khalid

2014-03-04

© 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in 1990s a system with few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy and hence more parallelism in the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly improves the communication cost and the overall performance on large-scale platforms.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

Energy Technology Data Exchange (ETDEWEB)

Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu

2015-12-09

Power system dynamic simulation computes the system response to a sequence of large disturbance, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computational intensive and challenging to solve using single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The difference of the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
Parallel processor programs in the Federal Government

Science.gov (United States)

Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

1985-01-01

In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.
Optimization approaches to mpi and area merging-based parallel buffer algorithm

Directory of Open Access Journals (Sweden)

Junfu Fan

Full Text Available On buffer zone construction, the rasterization-based dilation method inevitably introduces errors, and the double-sided parallel line method involves a series of complex operations. In this paper, we proposed a parallel buffer algorithm based on area merging and MPI (Message Passing Interface to improve the performances of buffer analyses on processing large datasets. Experimental results reveal that there are three major performance bottlenecks which significantly impact the serial and parallel buffer construction efficiencies, including the area merging strategy, the task load balance method and the MPI inter-process results merging strategy. Corresponding optimization approaches involving tree-like area merging strategy, the vertex number oriented parallel task partition method and the inter-process results merging strategy were suggested to overcome these bottlenecks. Experiments were carried out to examine the performance efficiency of the optimized parallel algorithm. The estimation results suggested that the optimization approaches could provide high performance and processing ability for buffer construction in a cluster parallel environment. Our method could provide insights into the parallelization of spatial analysis algorithm.
Auctioning Bulk Mobile Messages

NARCIS (Netherlands)

S. Meij (Simon); L-F. Pau (Louis-François); H.W.G.M. van Heck (Eric)

2003-01-01

textabstractThe search for enablers of continued growth of SMS traffic, as well as the take-off of the more diversified MMS message contents, open up for enterprises the potential of bulk use of mobile messaging , instead of essentially one-by-one use. In parallel, such enterprises or value added
Distributed Programming via Safe Closure Passing

Directory of Open Access Journals (Sweden)

Philipp Haller

2016-02-01

Full Text Available Programming systems incorporating aspects of functional programming, e.g., higher-order functions, are becoming increasingly popular for large-scale distributed programming. New frameworks such as Apache Spark leverage functional techniques to provide high-level, declarative APIs for in-memory data analytics, often outperforming traditional "big data" frameworks like Hadoop MapReduce. However, widely-used programming models remain rather ad-hoc; aspects such as implementation trade-offs, static typing, and semantics are not yet well-understood. We present a new asynchronous programming model that has at its core several principles facilitating functional processing of distributed data. The emphasis of our model is on simplicity, performance, and expressiveness. The primary means of communication is by passing functions (closures to distributed, immutable data. To ensure safe and efficient distribution of closures, our model leverages both syntactic and type-based restrictions. We report on a prototype implementation in Scala. Finally, we present preliminary experimental results evaluating the performance impact of a static, type-based optimization of serialization.
Optimization and parallelization of the thermal–hydraulic subchannel code CTF for high-fidelity multi-physics applications

International Nuclear Information System (INIS)

Salko, Robert K.; Schmidt, Rodney C.; Avramova, Maria N.

2015-01-01

Highlights: • COBRA-TF was adopted by the Consortium for Advanced Simulation of LWRs. • We have improved code performance to support running large-scale LWR simulations. • Code optimization has led to reductions in execution time and memory usage. • An MPI parallelization has reduced full-core simulation time from days to minutes. - Abstract: This paper describes major improvements to the computational infrastructure of the CTF subchannel code so that full-core, pincell-resolved (i.e., one computational subchannel per real bundle flow channel) simulations can now be performed in much shorter run-times, either in stand-alone mode or as part of coupled-code multi-physics calculations. These improvements support the goals of the Department Of Energy Consortium for Advanced Simulation of Light Water Reactors (CASL) Energy Innovation Hub to develop high fidelity multi-physics simulation tools for nuclear energy design and analysis. A set of serial code optimizations—including fixing computational inefficiencies, optimizing the numerical approach, and making smarter data storage choices—are first described and shown to reduce both execution time and memory usage by about a factor of ten. Next, a “single program multiple data” parallelization strategy targeting distributed memory “multiple instruction multiple data” platforms utilizing domain decomposition is presented. In this approach, data communication between processors is accomplished by inserting standard Message-Passing Interface (MPI) calls at strategic points in the code. The domain decomposition approach implemented assigns one MPI process to each fuel assembly, with each domain being represented by its own CTF input file. The creation of CTF input files, both for serial and parallel runs, is also fully automated through use of a pressurized water reactor (PWR) pre-processor utility that uses a greatly simplified set of user input compared with the traditional CTF input. To run CTF in
Portable and Transparent Message Compression in MPI Libraries to Improve the Performance and Scalability of Parallel Applications

Energy Technology Data Exchange (ETDEWEB)

Albonesi, David; Burtscher, Martin

2009-04-17

The goal of this project has been to develop a lossless compression algorithm for message-passing libraries that can accelerate HPC systems by reducing the communication time. Because both compression and decompression have to be performed in software in real time, the algorithm has to be extremely fast while still delivering a good compression ratio. During the first half of this project, they designed a new compression algorithm called FPC for scientific double-precision data, made the source code available on the web, and published two papers describing its operation, the first in the proceedings of the Data Compression Conference and the second in the IEEE Transactions on Computers. At comparable average compression ratios, this algorithm compresses and decompresses 10 to 100 times faster than BZIP2, DFCM, FSD, GZIP, and PLMI on the three architectures tested. With prediction tables that fit into the CPU's L1 data acache, FPC delivers a guaranteed throughput of six gigabits per second on a 1.6 GHz Itanium 2 system. The C source code and documentation of FPC are posted on-line and have already been downloaded hundreds of times. To evaluate FPC, they gathered 13 real-world scientific datasets from around the globe, including satellite data, crash-simulation data, and messages from HPC systems. Based on the large number of requests they received, they also made these datasets available to the community (with permission of the original sources). While FPC represents a great step forward, it soon became clear that its throughput was too slow for the emerging 10 gigabits per second networks. Hence, no speedup can be gained by including this algorithm in an MPI library. They therefore changed the aim of the second half of the project. Instead of implementing FPC in an MPI library, they refocused their efforts to develop a parallel compression algorithm to further boost the throughput. After all, all modern high-end microprocessors contain multiple CPUs on a
Feasibility and Acceptability of a Text Messaging Program for Smoking Cessation in Israel.

Science.gov (United States)

Abroms, Lorien; Hershcovitz, Ronit; Boal, Ashley; Levine, Hagai

2015-08-01

Text messaging programs on mobile phones have been shown to promote smoking cessation. This study investigated whether a text-messaging program for smoking cessation, adapted from QuitNowTXT, is feasible in Israel and acceptable to Israeli smokers. Participants (N = 38) were given a baseline assessment, enrolled in the adapted text messaging program, and followed-up with at 2 weeks and 4 weeks after their quit date. The authors used an intent-to-treat analysis and found that 23.7% of participants reported having quit smoking at the 4-week follow-up. Participants sent an average of 12.9 text replies during the study period, and the majority reported reading most or all of the texts. However, 34.2% of participants had unsubscribed by the 4-week follow-up. Moderate levels of satisfaction were reported; more than half agreed that they would recommend the program. Suggestions for improvement included adding advice by an expert counselor, website support, and increased customization. Results indicate that a text messaging smoking cessation program developed by modifying the content of QuitNowTXT is feasible and could be acceptable to smokers in Israel. The experience adapting and pilot testing the program can serve as a model for using QuitNowTXT to develop and implement such programs in other countries.
Evaluation of technologies of parallel computers. Communication networks for a real-time triggering application for a high-energy physics experiment at CERN

International Nuclear Information System (INIS)

Hoertnagl, Ch.

1997-12-01

Experiments at the future Large Hadron Collider (LHC) at CERN will be faced with an extraordinary challenge of event selection in real time. The primary event rate, equal to the bunch crossing frequency of 40 MHz, will have to be reduced by a factor of almost one-in-a-million in order to reveal traces of rare physics processes from an abundant background. This work presents various contributions to ongoing feasibility studies concerning the possible use of commercial technologies from the proximities of parallel computers and their communication networks for the second trigger stage, which faces an average data input rate of 100 kHz. Studies in this thesis apply a combination of methodologies, namely the build-up of lab-scale prototype implementations (including their exposition to test beam runs), algorithm development, technology tracking and benchmarking, as well as discrete event simulation. The main contribution consists of several technology case studies, which are based on the exploration of a set of standard benchmark programs for revealing simple parameters for characterizing delays during communication. Studied technologies include the communication sub-system of the Meiko CS-2, Asynchronous Transfer Mode (ATM), MEMORY CHANNEL, and Scalable Coherent Interface (SCI); all could be considered typical for candidate technologies. The discussion sheds light on the relative benefits and costs associated with different parallel programming models, in general, and with the use of message-passing libraries, such as Message Passing Interface (MPI), in particular. Best observed end-user-to-end-user latencies were ∼ 10 μs, best asymptotic bandwidths were ∼ 70 MByte/s. Typical sub-patterns of communication that have to be applied in the second trigger stage were sustained at ∼ 13 kHz, using today's technologies in realistic embeddings. (author)
Using Partial Reconfiguration and Message Passing to Enable FPGA-Based Generic Computing Platforms

Directory of Open Access Journals (Sweden)

Manuel Saldaña

2012-01-01

Full Text Available Partial reconfiguration (PR is an FPGA feature that allows the modification of certain parts of an FPGA while the rest of the system continues to operate without disruption. This distinctive characteristic of FPGAs has many potential benefits but also challenges. The lack of good CAD tools and the deep hardware knowledge requirement result in a hard-to-use feature. In this paper, the new partition-based Xilinx PR flow is used to incorporate PR within our MPI-based message-passing framework to allow hardware designers to create template bitstreams, which are predesigned, prerouted, generic bitstreams that can be reused for multiple applications. As an example of the generality of this approach, four different applications that use the same template bitstream are run consecutively, with a PR operation performed at the beginning of each application to instantiate the desired application engine. We demonstrate a simplified, reusable, high-level, and portable PR interface for X86-FPGA hybrid machines. PR issues such as local resets of reconfigurable modules and context saving and restoring are addressed in this paper followed by some examples and preliminary PR overhead measurements.
Methods to assess youth engagement in a text messaging supplement to an effective teen pregnancy program.

Science.gov (United States)

Devine, Sharon; Leeds, Caroline; Shlay, Judith C; Leytem, Amber; Beum, Robert; Bull, Sheana

2015-08-01

Youth are prolific users of cell phone minutes and text messaging. Numerous programs using short message service text messaging (SMS) have been employed to help improve health behaviors and health outcomes. However, we lack information on whether and what type of interaction or engagement with SMS program content is required to realize any benefit. We explored youth engagement with an automated SMS program designed to supplement a 25-session youth development program with demonstrated efficacy for reductions in teen pregnancy. Using two years of program data, we report on youth participation in design of message content and response frequency to messages among youth enrolled in the intervention arm of a randomized controlled trial (RCT) as one indicator of engagement. There were 221 youth between the ages of 14-18 enrolled over two years in the intervention arm of the RCT. Just over half (51%) were female; 56% were Hispanic; and 27% African American. Youth were sent 40,006 messages of which 16,501 were considered bi-directional where youth were asked to text a response. Four-fifths (82%) responded at least once to a text. We found variations in response frequency by gender, age, and ethnicity. The most popular types of messages youth responded to include questions and quizzes. The first two months of the program in each year had the highest response frequency. An important next step is to assess whether higher response to SMS results in greater efficacy. This future work can facilitate greater attention to message design and content to ensure messages are engaging for the intended audience. Copyright © 2015 Elsevier Inc. All rights reserved.
Why pass on viral messages? Because they connect emotionally

NARCIS (Netherlands)

Dobele, A.; Lindgreen, A.; Beverland, M.; Vanhamme, J.; Wijk, van R.

2007-01-01

In this article, we identify that successful viral marketing campaigns trigger an emotional response in recipients. Working under this premise, we examine the effects of viral messages containing the six primary emotions (surprise, joy, sadness, anger, fear, and disgust) on recipients' emotional
The Glasgow Parallel Reduction Machine: Programming Shared-memory Many-core Systems using Parallel Task Composition

Directory of Open Access Journals (Sweden)

Ashkan Tousimojarad

2013-12-01

Full Text Available We present the Glasgow Parallel Reduction Machine (GPRM, a novel, flexible framework for parallel task-composition based many-core programming. We allow the programmer to structure programs into task code, written as C++ classes, and communication code, written in a restricted subset of C++ with functional semantics and parallel evaluation. In this paper we discuss the GPRM, the virtual machine framework that enables the parallel task composition approach. We focus the discussion on GPIR, the functional language used as the intermediate representation of the bytecode running on the GPRM. Using examples in this language we show the flexibility and power of our task composition framework. We demonstrate the potential using an implementation of a merge sort algorithm on a 64-core Tilera processor, as well as on a conventional Intel quad-core processor and an AMD 48-core processor system. We also compare our framework with OpenMP tasks in a parallel pointer chasing algorithm running on the Tilera processor. Our results show that the GPRM programs outperform the corresponding OpenMP codes on all test platforms, and can greatly facilitate writing of parallel programs, in particular non-data parallel algorithms such as reductions.
A hybrid parallel framework for the cellular Potts model simulations

Energy Technology Data Exchange (ETDEWEB)

Jiang, Yi [Los Alamos National Laboratory; He, Kejing [SOUTH CHINA UNIV; Dong, Shoubin [SOUTH CHINA UNIV

2009-01-01

The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming POE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the POE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation ({approx}10{sup 8} sites) of complex collective behavior of numerous cells ({approx}10{sup 6}).
Parallelization of the MAAP-A code neutronics/thermal hydraulics coupling

International Nuclear Information System (INIS)

Froehle, P.H.; Wei, T.Y.C.; Weber, D.P.; Henry, R.E.

1998-01-01

A major new feature, one-dimensional space-time kinetics, has been added to a developmental version of the MAAP code through the introduction of the DIF3D-K module. This code is referred to as MAAP-A. To reduce the overall job time required, a capability has been provided to run the MAAP-A code in parallel. The parallel version of MAAP-A utilizes two machines running in parallel, with the DIF3D-K module executing on one machine and the rest of the MAAP-A code executing on the other machine. Timing results obtained during the development of the capability indicate that reductions in time of 30--40% are possible. The parallel version can be run on two SPARC 20 (SUN OS 5.5) workstations connected through the ethernet. MPI (Message Passing Interface standard) needs to be implemented on the machines. If necessary the parallel version can also be run on only one machine. The results obtained running in this one-machine mode identically match the results obtained from the serial version of the code
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

Science.gov (United States)

Bui, Trong T.

1999-01-01

A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
Advanced Messaging Concept Development Basic Safety Message

Data.gov (United States)

Department of Transportation — Contains all Basic Safety Messages (BSMs) collected during the Advanced Messaging Concept Development (AMCD) field testing program. For this project, all of the Part...
Parallel Monte Carlo simulation of aerosol dynamics

KAUST Repository

Zhou, K.

2014-01-01

A highly efficient Monte Carlo (MC) algorithm is developed for the numerical simulation of aerosol dynamics, that is, nucleation, surface growth, and coagulation. Nucleation and surface growth are handled with deterministic means, while coagulation is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI). The parallel computing efficiency is investigated through numerical examples. Near 60% parallel efficiency is achieved for the maximum testing case with 3.7 million MC particles running on 93 parallel computing nodes. The algorithm is verified through simulating various testing cases and comparing the simulation results with available analytical and/or other numerical solutions. Generally, it is found that only small number (hundreds or thousands) of MC particles is necessary to accurately predict the aerosol particle number density, volume fraction, and so forth, that is, low order moments of the Particle Size Distribution (PSD) function. Accurately predicting the high order moments of the PSD needs to dramatically increase the number of MC particles. 2014 Kun Zhou et al.

Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

Science.gov (United States)

Rostrup, Scott; De Sterck, Hans

2010-12-01

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL
Passing crisis and emergency risk communications: the effects of communication channel, information type, and repetition.

Science.gov (United States)

Edworthy, Judy; Hellier, Elizabeth; Newbold, Lex; Titchener, Kirsteen

2015-05-01

Three experiments explore several factors which influence information transmission when warning messages are passed from person to person. In Experiment 1, messages were passed down chains of participants using five different modes of communication. Written communication channels resulted in more accurate message transmission than verbal. In addition, some elements of the message endured further down the chain than others. Experiment 2 largely replicated these effects and also demonstrated that simple repetition of a message eliminated differences between written and spoken communication. In a final field experiment, chains of participants passed information however they wanted to, with the proviso that half of the chains could not use telephones. Here, the lack of ability to use a telephone did not affect accuracy, but did slow down the speed of transmission from the recipient of the message to the last person in the chain. Implications of the findings for crisis and emergency risk communication are discussed. Copyright © 2015 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Relationship Between Physician Assistant Program Length and Physician Assistant National Certifying Examination Pass Rates.

Science.gov (United States)

Colletti, Thomas P; Salisbury, Helen; Hertelendy, Attila J; Tseng, Tina

2016-03-01

This study was conducted to examine the relationship between physician assistant (PA) educational program length and PA programs' 5-year average Physician Assistant National Certifying Examination (PANCE) first-time pass rates. This was a retrospective correlational study that analyzed previously collected data from a nonprobability purposive sample of accredited PA program Web sites. Master's level PA programs (n = 108) in the United States with published average PANCE scores for 5 consecutive classes were included. Provisional and probationary programs were excluded (n = 4). Study data were not normally distributed per the Kolmogorov-Smirnov test, P = .00. There was no relationship between program length and PANCE pass rates, ρ (108) = -0.04, P = .68. Further analyses examining a possible relationship between program phase length (didactic and clinical) and PANCE pass rates also demonstrated no differences (ρ [107] = -0.05, P = .60 and ρ [107] = 0.02, P = .80, respectively). The results of this study suggest that shorter length PA programs perform similarly to longer programs in preparing students to pass the PANCE. In light of rapid expansion of PA educational programs, educators may want to consider these findings when planning the length of study for new and established programs.
Parallel Numerical Simulations of Water Reservoirs

Science.gov (United States)

Torres, Pedro; Mangiavacchi, Norberto

2010-11-01

The study of the water flow and scalar transport in water reservoirs is important for the determination of the water quality during the initial stages of the reservoir filling and during the life of the reservoir. For this scope, a parallel 2D finite element code for solving the incompressible Navier-Stokes equations coupled with scalar transport was implemented using the message-passing programming model, in order to perform simulations of hidropower water reservoirs in a computer cluster environment. The spatial discretization is based on the MINI element that satisfies the Babuska-Brezzi (BB) condition, which provides sufficient conditions for a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as preprocessing, solving and post processing, were implemented using the PETSc library. The resulting linear systems for the velocity and the pressure fields were solved using the projection method, implemented by an approximate block LU factorization. In order to increase the parallel performance in the solution of the linear systems, we employ the static condensation method for solving the intermediate velocity at vertex and centroid nodes separately. We compare performance results of the static condensation method with the approach of solving the complete system. In our tests the static condensation method shows better performance for large problems, at the cost of an increased memory usage. Performance results for other intensive parts of the code in a computer cluster are also presented.
Development of parallel Fokker-Planck code ALLAp

International Nuclear Information System (INIS)

Batishcheva, A.A.; Sigmar, D.J.; Koniges, A.E.

1996-01-01

We report on our ongoing development of the 3D Fokker-Planck code ALLA for a highly collisional scrape-off-layer (SOL) plasma. A SOL with strong gradients of density and temperature in the spatial dimension is modeled. Our method is based on a 3-D adaptive grid (in space, magnitude of the velocity, and cosine of the pitch angle) and a second order conservative scheme. Note that the grid size is typically 100 x 257 x 65 nodes. It was shown in our previous work that only these capabilities make it possible to benchmark a 3D code against a spatially-dependent self-similar solution of a kinetic equation with the Landau collision term. In the present work we show results of a more precise benchmarking against the exact solutions of the kinetic equation using a new parallel code ALLAp with an improved method of parallelization and a modified boundary condition at the plasma edge. We also report first results from the code parallelization using Message Passing Interface for a Massively Parallel CRI T3D platform. We evaluate the ALLAp code performance versus the number of T3D processors used and compare its efficiency against a Work/Data Sharing parallelization scheme and a workstation version
A new parallel algorithm and its simulation on hypercube simulator for low pass digital image filtering using systolic array

International Nuclear Information System (INIS)

Al-Hallaq, A.; Amin, S.

1998-01-01

This paper introduces a new parallel algorithm and its simulation on a hypercube simulator for the low pass digital image filtering using a systolic array. This new algorithm is faster than the old one (Amin, 1988). This is due to the the fact that the old algorithm carries out the addition operations in a sequential mode. But in our new design these addition operations are divided into tow groups, which can be performed in parallel. One group will be performed on one half of the systolic array and the other on the second half, that is, by folding. This parallelism reduces the time required for the whole process by almost quarter the time of the old algorithm.(authors). 18 refs., 3 figs
Feasibility and Acceptability of a Text Message-Based Smoking Cessation Program for Young Adults in Lima, Peru: Pilot Study.

Science.gov (United States)

Blitchtein-Winicki, Dora; Zevallos, Karine; Samolski, M Reuven; Requena, David; Velarde, Chaska; Briceño, Patricia; Piazza, Marina; Ybarra, Michele L

2017-08-04

In Peru's urban communities, tobacco smoking generally starts during adolescence and smoking prevalence is highest among young adults. Each year, many attempt to quit, but access to smoking cessation programs is limited. Evidence-based text messaging smoking cessation programs are an alternative that has been successfully implemented in high-income countries, but not yet in middle- and low-income countries with limited tobacco control policies. The objective was to assess the feasibility and acceptability of an short message service (SMS) text message-based cognitive behavioral smoking cessation program for young adults in Lima, Peru. Recruitment included using flyers and social media ads to direct young adults interested in quitting smoking to a website where interested participants completed a Google Drive survey. Inclusion criteria were being between ages 18 and 25 years, smoking at least four cigarettes per day at least 6 days per week, willing to quit in the next 30 days, owning a mobile phone, using SMS text messaging at least once in past year, and residing in Lima. Participants joined one of three phases: (1) focus groups and in-depth interviews whose feedback was used to develop the SMS text messages, (2) validating the SMS text messages, and (3) a pilot of the SMS text message-based smoking cessation program to test its feasibility and acceptability among young adults in Lima. The outcome measures included adherence to the SMS text message-based program, acceptability of content, and smoking abstinence self-report on days 2, 7, and 30 after quitting. Of 639 participants who completed initial online surveys, 42 met the inclusion criteria and 35 agreed to participate (focus groups and interviews: n=12; validate SMS text messages: n=8; program pilot: n=15). Common quit practices and beliefs emerged from participants in the focus groups and interviews informed the content, tone, and delivery schedule of the messages used in the SMS text message smoking
Parallelizing AT with MatlabMPI

International Nuclear Information System (INIS)

2011-01-01

The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems dealing with computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations, rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, which were developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, which set up the necessary pre-requisites for multithread processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting the efficacy of modern-day parallel computing, we were able to demonstrate incredibly efficient speed increments per processor in AT's beam-tracking functions. Extrapolating from prediction, we can expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well-understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass, cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness. Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while
The kpx, a program analyzer for parallelization

International Nuclear Information System (INIS)

Matsuyama, Yuji; Orii, Shigeo; Ota, Toshiro; Kume, Etsuo; Aikawa, Hiroshi.

1997-03-01

The kpx is a program analyzer, developed as a common technological basis for promoting parallel processing. The kpx consists of three tools. The first is ktool, that shows how much execution time is spent in program segments. The second is ptool, that shows parallelization overhead on the Paragon system. The last is xtool, that shows parallelization overhead on the VPP system. The kpx, designed to work for any FORTRAN cord on any UNIX computer, is confirmed to work well after testing on Paragon, SP2, SR2201, VPP500, VPP300, Monte-4, SX-4 and T90. (author)
Speedup predictions on large scientific parallel programs

International Nuclear Information System (INIS)

Williams, E.; Bobrowicz, F.

1985-01-01

How much speedup can we expect for large scientific parallel programs running on supercomputers. For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N greater than or equal to 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory
Declarative Parallel Programming in Spreadsheet End-User Development

DEFF Research Database (Denmark)

Biermann, Florian

2016-01-01

Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations. In this liter...... can directly apply results from functional array programming to a spreadsheet model of computations.......Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations....... In this literature study, we provide an overview of the publications on spreadsheet end-user programming and declarative array programming to inform further research on parallel programming in spreadsheets. Our results show that there is a clear overlap between spreadsheet programming and array programming and we...
Programming parallel architectures - The BLAZE family of languages

Science.gov (United States)

Mehrotra, Piyush

1989-01-01

This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.
Programming parallel architectures: The BLAZE family of languages

Science.gov (United States)

Mehrotra, Piyush

1988-01-01

Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
Parallel adaptation of a vectorised quantumchemical program system

International Nuclear Information System (INIS)

Van Corler, L.C.H.; Van Lenthe, J.H.

1987-01-01

Supercomputers, like the CRAY 1 or the Cyber 205, have had, and still have, a marked influence on Quantum Chemistry. Vectorization has led to a considerable increase in the performance of Quantum Chemistry programs. However, clockcycle times more than a factor 10 smaller than those of the present supercomputers are not to be expected. Therefore future supercomputers will have to depend on parallel structures. Recently, the first examples of such supercomputers have been installed. To be prepared for this new generation of (parallel) supercomputers one should consider the concepts one wants to use and the kind of problems one will encounter during implementation of existing vectorized programs on those parallel systems. The authors implemented four important parts of a large quantumchemical program system (ATMOL), i.e. integrals, SCF, 4-index and Direct-CI in the parallel environment at ECSEC (Rome, Italy). This system offers simulated parallellism on the host computer (IBM 4381) and real parallellism on at most 10 attached processors (FPS-164). Quantumchemical programs usually handle large amounts of data and very large, often sparse matrices. The transfer of that many data can cause problems concerning communication and overhead, in view of which shared memory and shared disks must be considered. The strategy and the tools that were used to parallellise the programs are shown. Also, some examples are presented to illustrate effectiveness and performance of the system in Rome for these type of calculations
Formative Research regarding Kidney Disease Health Information in a Latino American Sample: Associations among Message Frame, Threat, Efficacy, Message Effectiveness, and Behavioral Intention

Science.gov (United States)

Maguire, Katheryn C.; Gardner, Jay; Sopory, Pradeep; Jian, Guowei; Roach, Marcia; Amschlinger, Joe; Moreno, Marcia; Pettey, Gary; Piccone, Gianfranco

2010-01-01

Using prospect theory and the extended parallel process model, this study examined the effect of gain/loss message framing on perceptions of severity, susceptibility, response efficacy, and self efficacy (derived from the extended parallel process model), as well as perception of message effectiveness and behavioral intention in a community based…
A parallel solver for huge dense linear systems

Science.gov (United States)

Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

2011-11-01

HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems to scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems O(100.000). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending of the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summaryProgram title: Huge Dense System Solver (HDSS) Catalogue identifier: AEHU_v1_1 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 87 062 No. of bytes in distributed program, including test data, etc.: 1 069 110 Distribution format: tar.gz Programming language: Fortran90, C Computer: Parallel architectures: multiprocessors, computer clusters Operating system
Parallel multigrid methods: implementation on message-passing computers and applications to fluid dynamics. A draft

International Nuclear Information System (INIS)

Solchenbach, K.; Thole, C.A.; Trottenberg, U.

1987-01-01

For a wide class of problems in scientific computing, in particular for partial differential equations, the multigrid principle has proved to yield highly efficient numerical methods. However, the principle has to be applied carefully: if the multigrid components are not chosen adequately with respect to the given problem, the efficiency may be much smaller than possible. This has been demonstrated for many practical problems. Unfortunately, the general theories on multigrid convergence do not give much help in constructing really efficient multigrid algorithms. Although some progress has been made in bridging the gap between theory and practice during the last few years, there are still several theoretical approaches which are misleading rather than helpful with respect to the objective of real efficiency. The research in finding highly efficient algorithms for non-model applications therefore is still a sophisticated mixture of theoretical considerations, a transfer of experiences from model to real life problems and systematical experimental work. The emphasis of the practical research activity today lies - among others - in the following fields: - finding efficient multigrid components for really complex problems, - combining the multigrid approach with advanced discretizative techniques: - constructing highly parallel multigrid algorithms. In this paper, we want to deal mainly with the last topic
The BLAZE language - A parallel language for scientific programming

Science.gov (United States)

Mehrotra, Piyush; Van Rosendale, John

1987-01-01

A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
The BLAZE language: A parallel language for scientific programming

Science.gov (United States)

Mehrotra, P.; Vanrosendale, J.

1985-01-01

A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.
Parallel processing of two-dimensional Sn transport calculations

International Nuclear Information System (INIS)

Uematsu, M.

1997-01-01

A parallel processing method for the two-dimensional S n transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation

Regional-scale calculation of the LS factor using parallel processing

Science.gov (United States)

Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

2015-05-01

With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategy are designed according to the algorithm characters including the decomposition method for maintaining the integrity of the results, optimized workflow for reducing the time taken for exporting the unnecessary intermediate data and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
Parallel programming practical aspects, models and current limitations

CERN Document Server

Tarkov, Mikhail S

2014-01-01

Parallel programming is designed for the use of parallel computer systems for solving time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: 1. Processing large data arrays (including processing images and signals in real time)2. Simulation of complex physical processes and chemical reactions For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. Particles-in-cell method and cellular automata are very useful for simulation. Problems of scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...
Towards high-performance symbolic computing using MuPAD as a problem solving environment

CERN Document Server

Sorgatz, A

1999-01-01

This article discusses the approach of developing MuPAD into an open and parallel problem solving environment for mathematical applications. It introduces the key technologies domains and dynamic modules and describes the current $9 state of macro parallelism which covers three fields of parallel programming: message passing, network variables and work groups. First parallel algorithms and examples of using the prototype of the MuPAD problem solving environment $9 are demonstrated. (12 refs).
Behaviour of Parallel Coupled Microstrip Band Pass Filter and Simple Microstripline due to Thin-Film Al2O3 Overlay

Directory of Open Access Journals (Sweden)

S. B. Rane

1996-01-01

Full Text Available The X-band behaviour of a seven-section parallel-coupled microstrip band pass filter and microstripline due to thin-film Al2O3 overlay of different thickness is reported in this paper. This Al2O3 film can give a homogeneous overlay structure. There is a substantial increase in the bandwidth due to the overlay, the pass band extending towards higher frequency side. In most of the cases, an increase in the pass band transmittance of a microstripline also increases due to a thin-film Al2O3 overlay, especially for frequencies less than 9.0 GHz. At higher frequencies, random variations are observed. It is felt that thin-film overlays can be used to modify the microstripline circuit properties, thereby avoiding costly and time consuming elaborate design procedures.
A massively parallel algorithm for the collision probability calculations in the Apollo-II code using the PVM library

International Nuclear Information System (INIS)

Stankovski, Z.

1995-01-01

The collision probability method in neutron transport, as applied to 2D geometries, consume a great amount of computer time, for a typical 2D assembly calculation evaluations. Consequently RZ or 3D calculations became prohibitive. In this paper we present a simple but efficient parallel algorithm based on the message passing host/node programing model. Parallelization was applied to the energy group treatment. Such approach permits parallelization of the existing code, requiring only limited modifications. Sequential/parallel computer portability is preserved, witch is a necessary condition for a industrial code. Sequential performances are also preserved. The algorithm is implemented on a CRAY 90 coupled to a 128 processor T3D computer, a 16 processor IBM SP1 and a network of workstations, using the Public Domain PVM library. The tests were executed for a 2D geometry with the standard 99-group library. All results were very satisfactory, the best ones with IBM SP1. Because of heterogeneity of the workstation network, we did ask high performances for this architecture. The same source code was used for all computers. A more impressive advantage of this algorithm will appear in the calculations of the SAPHYR project (with the future fine multigroup library of about 8000 groups) with a massively parallel computer, using several hundreds of processors. (author). 5 refs., 6 figs., 2 tabs
On the Automatic Parallelization of Sparse and Irregular Fortran Programs

Directory of Open Access Journals (Sweden)

Yuan Lin

1999-01-01

Full Text Available Automatic parallelization is usually believed to be less effective at exploiting implicit parallelism in sparse/irregular programs than in their dense/regular counterparts. However, not much is really known because there have been few research reports on this topic. In this work, we have studied the possibility of using an automatic parallelizing compiler to detect the parallelism in sparse/irregular programs. The study with a collection of sparse/irregular programs led us to some common loop patterns. Based on these patterns new techniques were derived that produced good speedups when manually applied to our benchmark codes. More importantly, these parallelization methods can be implemented in a parallelizing compiler and can be applied automatically.
Survey on present status and trend of parallel programming environments

International Nuclear Information System (INIS)

Takemiya, Hiroshi; Higuchi, Kenji; Honma, Ichiro; Ohta, Hirofumi; Kawasaki, Takuji; Imamura, Toshiyuki; Koide, Hiroshi; Akimoto, Masayuki.

1997-03-01

This report intends to provide useful information on software tools for parallel programming through the survey on parallel programming environments of the following six parallel computers, Fujitsu VPP300/500, NEC SX-4, Hitachi SR2201, Cray T94, IBM SP, and Intel Paragon, all of which are installed at Japan Atomic Energy Research Institute (JAERI), moreover, the present status of R and D's on parallel softwares of parallel languages, compilers, debuggers, performance evaluation tools, and integrated tools is reported. This survey has been made as a part of our project of developing a basic software for parallel programming environment, which is designed on the concept of STA (Seamless Thinking Aid to programmers). (author)
Parallelization for first principles electronic state calculation program

International Nuclear Information System (INIS)

Watanabe, Hiroshi; Oguchi, Tamio.

1997-03-01

In this report we study the parallelization for First principles electronic state calculation program. The target machines are NEC SX-4 for shared memory type parallelization and FUJITSU VPP300 for distributed memory type parallelization. The features of each parallel machine are surveyed, and the parallelization methods suitable for each are proposed. It is shown that 1.60 times acceleration is achieved with 2 CPU parallelization by SX-4 and 4.97 times acceleration is achieved with 12 PE parallelization by VPP 300. (author)
Parallelization and implementation of approximate root isolation for nonlinear system by Monte Carlo

Science.gov (United States)

Khosravi, Ebrahim

1998-12-01

This dissertation solves a fundamental problem of isolating the real roots of nonlinear systems of equations by Monte-Carlo that were published by Bush Jones. This algorithm requires only function values and can be applied readily to complicated systems of transcendental functions. The implementation of this sequential algorithm provides scientists with the means to utilize function analysis in mathematics or other fields of science. The algorithm, however, is so computationally intensive that the system is limited to a very small set of variables, and this will make it unfeasible for large systems of equations. Also a computational technique was needed for investigating a metrology of preventing the algorithm structure from converging to the same root along different paths of computation. The research provides techniques for improving the efficiency and correctness of the algorithm. The sequential algorithm for this technique was corrected and a parallel algorithm is presented. This parallel method has been formally analyzed and is compared with other known methods of root isolation. The effectiveness, efficiency, enhanced overall performance of the parallel processing of the program in comparison to sequential processing is discussed. The message passing model was used for this parallel processing, and it is presented and implemented on Intel/860 MIMD architecture. The parallel processing proposed in this research has been implemented in an ongoing high energy physics experiment: this algorithm has been used to track neutrinoes in a super K detector. This experiment is located in Japan, and data can be processed on-line or off-line locally or remotely.
PENN PASS: a program for graduates of foreign dental schools.

Science.gov (United States)

Berthold, P; Lopez, N

1994-01-01

An increasing number of graduates of foreign dental schools who enroll in advanced standing programs to qualify for licensure calls for dental schools to be prepared to handle not only the curricular demands but also the growing cultural diversity among its student population. The "reeducation" of this student group not only meets the need of foreign dentists for an American degree but may also provide health professionals to service various ethnic populations whose language and culture they are able to understand and identify with. A survey of students and graduates of a two-year Program for Advanced Standing Students (PASS) for graduates of foreign dental schools representing 34 countries aimed to arrive at an understanding of this student group through characterization of the foreign dentists and identification of their attitudes and feelings toward various aspects of the program, the school and faculty and their experience of stress. This report includes description of the distinctive features of the program which cater to specific needs and concerns of this non-traditional group of dental students. PASS students are accepted on the basis of their grades in dental school in home country, scores in the National Dental Board Examination Part I, Test of English as Foreign Language (TOEFL), and ratings in personal interviews. They complete an intensive summer program consisting of didactic and laboratory courses which prepares them for integration with four-year students for the last two years of didactic and clinical curriculum. Cultural diversity seminars, a special English class, PASS class meetings and seminars are unique additions to their program and aim to assist them adjust to the educational, social and cultural systems in an American school. Results of the survey show a majority of the PASS students feel that they are part of the school and that there is someone in the school whom they can approach for problems. An understanding of their ethnic and
An object-oriented programming paradigm for parallelization of computational fluid dynamics

International Nuclear Information System (INIS)

Ohta, Takashi.

1997-03-01

We propose an object-oriented programming paradigm for parallelization of scientific computing programs, and show that the approach can be a very useful strategy. Generally, parallelization of scientific programs tends to be complicated and unportable due to the specific requirements of each parallel computer or compiler. In this paper, we show that the object-oriented programming design, which separates the parallel processing parts from the solver of the applications, can achieve the large improvement in the maintenance of the codes, as well as the high portability. We design the program for the two-dimensional Euler equations according to the paradigm, and evaluate the parallel performance on IBM SP2. (author)
Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications

International Nuclear Information System (INIS)

Hoisie, A.; Lubeck, O.; Wasserman, H.

1998-01-01

The authors develop a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message passing environment. The model, based on a LogGP machine parameterization, combines the separate contributions of computation and communication wavefronts. They validate the model on three important supercomputer systems, on up to 500 processors. They use data from a deterministic particle transport application taken from the ASCI workload, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. They also use the validated model to make estimates of performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to be in existence within the next decade as part of the ASCI program and elsewhere. In this context, they analyze two problem sizes. The model shows that on the largest such problem (1 billion cells), inter-processor communication performance is not the bottleneck. Single-node efficiency is the dominant factor
SBML-PET-MPI: a parallel parameter estimation tool for Systems Biology Markup Language based models.

Science.gov (United States)

Zi, Zhike

2011-04-01

Parameter estimation is crucial for the modeling and dynamic analysis of biological systems. However, implementing parameter estimation is time consuming and computationally demanding. Here, we introduced a parallel parameter estimation tool for Systems Biology Markup Language (SBML)-based models (SBML-PET-MPI). SBML-PET-MPI allows the user to perform parameter estimation and parameter uncertainty analysis by collectively fitting multiple experimental datasets. The tool is developed and parallelized using the message passing interface (MPI) protocol, which provides good scalability with the number of processors. SBML-PET-MPI is freely available for non-commercial use at http://www.bioss.uni-freiburg.de/cms/sbml-pet-mpi.html or http://sites.google.com/site/sbmlpetmpi/.
Automatic Parallelization Tool: Classification of Program Code for Parallel Computing

Directory of Open Access Journals (Sweden)

Mustafa Basthikodi

2016-04-01

Full Text Available Performance growth of single-core processors has come to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks along with Graphical Processing Units empowered to enhance parallelism broadly. Couples of compilers are updated to developing challenges forsynchronization and threading issues. Appropriate program and algorithm classifications will have advantage to a great extent to the group of software engineers to get opportunities for effective parallelization. In present work we investigated current species for classification of algorithms, in that related work on classification is discussed along with the comparison of issues that challenges the classification. The set of algorithms are chosen which matches the structure with different issues and perform given task. We have tested these algorithms utilizing existing automatic species extraction toolsalong with Bones compiler. We have added functionalities to existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user defined types, constants and mathematical functions. With this, we can retain significant data which is not captured by original speciesof algorithms. We executed new theories into the device, empowering automatic characterization of program code.
Communicating with the workforce during emergencies: developing an employee text messaging program in a local public health setting.

Science.gov (United States)

Karasz, Hilary N; Bogan, Sharon; Bosslet, Lindsay

2014-01-01

Short message service (SMS) text messaging can be useful for communicating information to public health employees and improving workforce situational awareness during emergencies. We sought to understand how the 1,500 employees at Public Health--Seattle & King County, Washington, perceived barriers to and benefits of participation in a voluntary, employer-based SMS program. Based on employee feedback, we developed the system, marketed it, and invited employees to opt in. The system was tested during an ice storm in January 2012. Employee concerns about opting into an SMS program included possible work encroachment during non-work time and receiving excessive irrelevant messages. Employees who received messages during the weather event reported high levels of satisfaction and perceived utility from the program. We conclude that text messaging is a feasible form of communication with employees during emergencies. Care should be taken to design and deploy a program that maximizes employee satisfaction.
Algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations with the use of parallel computations

Energy Technology Data Exchange (ETDEWEB)

Moryakov, A. V., E-mail: sailor@orc.ru [National Research Centre Kurchatov Institute (Russian Federation)

2016-12-15

An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computations on supercomputers employing the MPI (Message Passing Interface) standard for the data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is characterized by simplicity and the possibility to solve nonlinear problems with a correction of the operator in accordance with the solution obtained in the previous iterative process.
Text Messaging Based Obesity Prevention Program for Parents of Pre-Adolescent African American Girls

Directory of Open Access Journals (Sweden)

Chishinga Callender

2017-12-01

Full Text Available African American girls are at a greater risk of obesity than their nonminority peers. Parents have the primary control over the home environment and play an important role in the child obesity prevention. Obesity prevention programs to help parents develop an obesity-preventive home environment are needed. The purpose of this study was to collect formative research from parents of 8–10-year old African American girls about perceptions, expectations, and content for a text messaging based program. Mothers (n = 30 participated in surveys and interviews to inform message development and content. A professional expert panel (n = 10 reviewed draft text messages via a survey. All the mothers reported owning a cellphone with an unlimited texting plan, and they used their cellphones for texting (90.0% and accessing the Internet (100.0%. The majority were interested in receiving text messages about healthy eating and physical activity (86.7%. Interviews confirmed survey findings. One hundred and seven text messages promoting an obesity-preventive home environment were developed. The expert panel and parents reported positive reactions to draft text messages. This research provides evidence that mobile health (mHealth interventions appeal to parents of African American girls and they have ready access to the technology with which to support this approach.
Parallel definition of tear film maps on distributed-memory clusters for the support of dry eye diagnosis.

Science.gov (United States)

González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J

2017-02-01

The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of the use of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has been previously successfully employed by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous only-multithreaded approach to less than ten seconds using the hybrid method. The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of the dry eye syndrome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Corporate social marketing: message design to recruit program participants.

Science.gov (United States)

Black, David R; Blue, Carolyn L; Coster, Daniel C; Chrysler, Lisa M

2002-01-01

To identify variables for a corporate social marketing (SM) health message based on the 4 Ps of SM in order to recruit future participants to an existing national, commercial, self-administered weight-loss program. A systematically evaluated, author-developed, 310-response survey was administered to a random sample of 270 respondents. A previously established research plan was used to empirically identify the audience segments and the "marketing mix" appropriate for the total sample and each segment. Tangible product, pertaining to the unique program features, should be emphasized rather than positive core product and outcome expectation related to use of the program.
Experiences in the parallelization of the discrete ordinates method using OpenMP and MPI

Energy Technology Data Exchange (ETDEWEB)

Pautz, A. [TUV Hannover/Sachsen-Anhalt e.V. (Germany); Langenbuch, S. [Gesellschaft fur Anlagen- und Reaktorsicherheit (GRS) mbH (Germany)

2003-07-01

The method of Discrete Ordinates is in principle parallelizable to a high degree, since the transport 'mesh sweeps' are mutually independent for all angular directions. However, in the well-known production code Dort such a type of angular domain decomposition has to be done on a spatial line-byline basis, causing the parallelism in the code to be very fine-grained. The construction of scalar fluxes and moments requires a large effort for inter-thread or inter-process communication. We have implemented two different parallelization approaches in Dort: firstly, we have used a shared-memory model suitable for SMP (Symmetric Multiprocessor) machines based on the standard OpenMP. The second approach uses the well-known Message Passing Interface (MPI) to establish communication between parallel processes running in a distributed-memory environment. We investigate the benefits and drawbacks of both models and show first results on performance and scaling behaviour of the parallel Dort code. (authors)

Experiences in the parallelization of the discrete ordinates method using OpenMP and MPI

International Nuclear Information System (INIS)

Pautz, A.; Langenbuch, S.

2003-01-01

The method of Discrete Ordinates is in principle parallelizable to a high degree, since the transport 'mesh sweeps' are mutually independent for all angular directions. However, in the well-known production code Dort such a type of angular domain decomposition has to be done on a spatial line-byline basis, causing the parallelism in the code to be very fine-grained. The construction of scalar fluxes and moments requires a large effort for inter-thread or inter-process communication. We have implemented two different parallelization approaches in Dort: firstly, we have used a shared-memory model suitable for SMP (Symmetric Multiprocessor) machines based on the standard OpenMP. The second approach uses the well-known Message Passing Interface (MPI) to establish communication between parallel processes running in a distributed-memory environment. We investigate the benefits and drawbacks of both models and show first results on performance and scaling behaviour of the parallel Dort code. (authors)
Relationship of residency program characteristics with pass rate of the American Board of Internal Medicine certifying exam

Directory of Open Access Journals (Sweden)

Amporn Atsawarungruangkit

2015-09-01

Full Text Available Objectives: To evaluate the relationship between the pass rate of the American Board of Internal Medicine (ABIM certifying exam and the characteristics of residency programs. Methods: The study used a retrospective, cross-sectional design with publicly available data from the ABIM and the Fellowship and Residency Electronic Interactive Database. All categorical residency programs with reported pass rates were included. Using univariate and multivariate, linear regression analyses, I analyzed how 69 factors (e.g., location, general information, number of faculty and trainees, work schedule, educational environment are related to the pass rate. Results: Of 371 programs, only one region had a significantly different pass rate from the other regions; however, as no other characteristics were reported in this region, I excluded program location from further analysis. In the multivariate analysis, pass rate was significantly associated with four program characteristics: ratio of full-time equivalent paid faculty to positions, percentage of osteopathic doctors, formal mentoring program, and on-site child care (OCC. Numerous factors were not associated at all, including minimum exam scores, salary, vacation days, and average hours per week. Conclusions: As shown through the ratio of full-time equivalent paid faculty to positions and whether there was a formal mentoring program, a highly supervised training experience was strongly associated with the pass rate. In contrast, percentage of osteopathic doctors was inversely related to the pass rate. Programs with OCC significantly outperformed programs without OCC. This study suggested that enhancing supervision of training programs and offering parental support may help attract and produce competitive residents.
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

Science.gov (United States)

Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2002-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(O) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gains. A implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread level parallelism.
Detecting and Preventing Sybil Attacks in Wireless Sensor Networks Using Message Authentication and Passing Method.

Science.gov (United States)

Dhamodharan, Udaya Suriya Raj Kumar; Vayanaperumal, Rajamani

2015-01-01

Wireless sensor networks are highly indispensable for securing network protection. Highly critical attacks of various kinds have been documented in wireless sensor network till now by many researchers. The Sybil attack is a massive destructive attack against the sensor network where numerous genuine identities with forged identities are used for getting an illegal entry into a network. Discerning the Sybil attack, sinkhole, and wormhole attack while multicasting is a tremendous job in wireless sensor network. Basically a Sybil attack means a node which pretends its identity to other nodes. Communication to an illegal node results in data loss and becomes dangerous in the network. The existing method Random Password Comparison has only a scheme which just verifies the node identities by analyzing the neighbors. A survey was done on a Sybil attack with the objective of resolving this problem. The survey has proposed a combined CAM-PVM (compare and match-position verification method) with MAP (message authentication and passing) for detecting, eliminating, and eventually preventing the entry of Sybil nodes in the network. We propose a scheme of assuring security for wireless sensor network, to deal with attacks of these kinds in unicasting and multicasting.
Detecting and Preventing Sybil Attacks in Wireless Sensor Networks Using Message Authentication and Passing Method

Directory of Open Access Journals (Sweden)

Udaya Suriya Raj Kumar Dhamodharan

2015-01-01

Full Text Available Wireless sensor networks are highly indispensable for securing network protection. Highly critical attacks of various kinds have been documented in wireless sensor network till now by many researchers. The Sybil attack is a massive destructive attack against the sensor network where numerous genuine identities with forged identities are used for getting an illegal entry into a network. Discerning the Sybil attack, sinkhole, and wormhole attack while multicasting is a tremendous job in wireless sensor network. Basically a Sybil attack means a node which pretends its identity to other nodes. Communication to an illegal node results in data loss and becomes dangerous in the network. The existing method Random Password Comparison has only a scheme which just verifies the node identities by analyzing the neighbors. A survey was done on a Sybil attack with the objective of resolving this problem. The survey has proposed a combined CAM-PVM (compare and match-position verification method with MAP (message authentication and passing for detecting, eliminating, and eventually preventing the entry of Sybil nodes in the network. We propose a scheme of assuring security for wireless sensor network, to deal with attacks of these kinds in unicasting and multicasting.
A parallel neural network training algorithm for control of discrete dynamical systems.

Energy Technology Data Exchange (ETDEWEB)

Gordillo, J. L.; Hanebutte, U. R.; Vitela, J. E.

1998-01-20

In this work we present a parallel neural network controller training code, that uses MPI, a portable message passing environment. A comprehensive performance analysis is reported which compares results of a performance model with actual measurements. The analysis is made for three different load assignment schemes: block distribution, strip mining and a sliding average bin packing (best-fit) algorithm. Such analysis is crucial since optimal load balance can not be achieved because the work load information is not available a priori. The speedup results obtained with the above schemes are compared with those corresponding to the bin packing load balance scheme with perfect load prediction based on a priori knowledge of the computing effort. Two multiprocessor platforms: a SGI/Cray Origin 2000 and a IBM SP have been utilized for this study. It is shown that for the best load balance scheme a parallel efficiency of over 50% for the entire computation is achieved by 17 processors of either parallel computers.
Development of massively parallel quantum chemistry program SMASH

International Nuclear Information System (INIS)

Ishimura, Kazuya

2015-01-01

A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C 150 H 30 ) 2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer
A parallel solution for high resolution histological image analysis.

Science.gov (United States)

Bueno, G; González, R; Déniz, O; García-Rojo, M; González-García, J; Fernández-Carrobles, M M; Vállez, N; Salido, J

2012-10-01

This paper describes a general methodology for developing parallel image processing algorithms based on message passing for high resolution images (on the order of several Gigabytes). These algorithms have been applied to histological images and must be executed on massively parallel processing architectures. Advances in new technologies for complete slide digitalization in pathology have been combined with developments in biomedical informatics. However, the efficient use of these digital slide systems is still a challenge. The image processing that these slides are subject to is still limited both in terms of data processed and processing methods. The work presented here focuses on the need to design and develop parallel image processing tools capable of obtaining and analyzing the entire gamut of information included in digital slides. Tools have been developed to assist pathologists in image analysis and diagnosis, and they cover low and high-level image processing methods applied to histological images. Code portability, reusability and scalability have been tested by using the following parallel computing architectures: distributed memory with massive parallel processors and two networks, INFINIBAND and Myrinet, composed of 17 and 1024 nodes respectively. The parallel framework proposed is flexible, high performance solution and it shows that the efficient processing of digital microscopic images is possible and may offer important benefits to pathology laboratories. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Parallel file system performances in fusion data storage

International Nuclear Information System (INIS)

Iannone, F.; Podda, S.; Bracco, G.; Manduchi, G.; Maslennikov, A.; Migliori, S.; Wolkersdorfer, K.

2012-01-01

High I/O flow rates, up to 10 GB/s, are required in large fusion Tokamak experiments like ITER where hundreds of nodes store simultaneously large amounts of data acquired during the plasma discharges. Typical network topologies such as linear arrays (systolic), rings, meshes (2-D arrays), tori (3-D arrays), trees, butterfly, hypercube in combination with high speed data transports like Infiniband or 10G-Ethernet, are the main areas in which the effort to overcome the so-called parallel I/O bottlenecks is most focused. The high I/O flow rates were modelled in an emulated testbed based on the parallel file systems such as Lustre and GPFS, commonly used in High Performance Computing. The test runs on High Performance Computing–For Fusion (8640 cores) and ENEA CRESCO (3392 cores) supercomputers. Message Passing Interface based applications were developed to emulate parallel I/O on Lustre and GPFS using data archival and access solutions like MDSPLUS and Universal Access Layer. These methods of data storage organization are widely diffused in nuclear fusion experiments and are being developed within the EFDA Integrated Tokamak Modelling – Task Force; the authors tried to evaluate their behaviour in a realistic emulation setup.
Parallel file system performances in fusion data storage

Energy Technology Data Exchange (ETDEWEB)

Iannone, F., E-mail: francesco.iannone@enea.it [Associazione EURATOM-ENEA sulla Fusione, C.R.ENEA Frascati, via E.Fermi, 45 - 00044 Frascati, Rome (Italy); Podda, S.; Bracco, G. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Manduchi, G. [Associazione EURATOM-ENEA sulla Fusione, Consorzio RFX, Corso Stati Uniti, 4 - 35127 Padua (Italy); Maslennikov, A. [CASPUR Inter-University Consortium for the Application of Super-Computing for Research, via dei Tizii, 6b - 00185 Rome (Italy); Migliori, S. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Wolkersdorfer, K. [Juelich Supercomputing Centre-FZJ, D-52425 Juelich (Germany)

2012-12-15

High I/O flow rates, up to 10 GB/s, are required in large fusion Tokamak experiments like ITER where hundreds of nodes store simultaneously large amounts of data acquired during the plasma discharges. Typical network topologies such as linear arrays (systolic), rings, meshes (2-D arrays), tori (3-D arrays), trees, butterfly, hypercube in combination with high speed data transports like Infiniband or 10G-Ethernet, are the main areas in which the effort to overcome the so-called parallel I/O bottlenecks is most focused. The high I/O flow rates were modelled in an emulated testbed based on the parallel file systems such as Lustre and GPFS, commonly used in High Performance Computing. The test runs on High Performance Computing-For Fusion (8640 cores) and ENEA CRESCO (3392 cores) supercomputers. Message Passing Interface based applications were developed to emulate parallel I/O on Lustre and GPFS using data archival and access solutions like MDSPLUS and Universal Access Layer. These methods of data storage organization are widely diffused in nuclear fusion experiments and are being developed within the EFDA Integrated Tokamak Modelling - Task Force; the authors tried to evaluate their behaviour in a realistic emulation setup.
Xyce Parallel Electronic Simulator Users' Guide Version 6.8

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2017-10-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase$-$ a message passing parallel implementation $-$ which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Stepwise Development a Text Messaging-Based Bullying Prevention Program for Middle School Students (BullyDown).

Science.gov (United States)

Ybarra, Michele L; Prescott, Tonya L; Espelage, Dorothy L

2016-06-13

Bullying is a significant public health issue among middle school-aged youth. Current prevention programs have only a moderate impact. Cell phone text messaging technology (mHealth) can potentially overcome existing challenges, particularly those that are structural (e.g., limited time that teachers can devote to non-educational topics). To date, the description of the development of empirically-based mHealth-delivered bullying prevention programs are lacking in the literature. To describe the development of BullyDown, a text messaging-based bullying prevention program for middle school students, guided by the Social-Emotional Learning model. We implemented five activities over a 12-month period: (1) national focus groups (n=37 youth) to gather acceptability of program components; (2) development of content; (3) a national Content Advisory Team (n=9 youth) to confirm content tone; and (4) an internal team test of software functionality followed by a beta test (n=22 youth) to confirm the enrollment protocol and the feasibility and acceptability of the program. Recruitment experiences suggested that Facebook advertising was less efficient than using a recruitment firm to recruit youth nationally, and recruiting within schools for the pilot test was feasible. Feedback from the Content Advisory Team suggests a preference for 2-4 brief text messages per day. Beta test findings suggest that BullyDown is both feasible and acceptable: 100% of youth completed the follow-up survey, 86% of whom liked the program. Text messaging appears to be a feasible and acceptable delivery method for bullying prevention programming delivered to middle school students.
Parallelization for X-ray crystal structural analysis program

Energy Technology Data Exchange (ETDEWEB)

Watanabe, Hiroshi [Japan Atomic Energy Research Inst., Tokyo (Japan); Minami, Masayuki; Yamamoto, Akiji

1997-10-01

In this report we study vectorization and parallelization for X-ray crystal structural analysis program. The target machine is NEC SX-4 which is a distributed/shared memory type vector parallel supercomputer. X-ray crystal structural analysis is surveyed, and a new multi-dimensional discrete Fourier transform method is proposed. The new method is designed to have a very long vector length, so that it enables to obtain the 12.0 times higher performance result that the original code. Besides the above-mentioned vectorization, the parallelization by micro-task functions on SX-4 reaches 13.7 times acceleration in the part of multi-dimensional discrete Fourier transform with 14 CPUs, and 3.0 times acceleration in the whole program. Totally 35.9 times acceleration to the original 1CPU scalar version is achieved with vectorization and parallelization on SX-4. (author)
The FORCE: A highly portable parallel programming language

Science.gov (United States)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.
The FORCE - A highly portable parallel programming language

Science.gov (United States)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
Development of massively parallel quantum chemistry program SMASH

Energy Technology Data Exchange (ETDEWEB)

Ishimura, Kazuya [Department of Theoretical and Computational Molecular Science, Institute for Molecular Science 38 Nishigo-Naka, Myodaiji, Okazaki, Aichi 444-8585 (Japan)

2015-12-31

A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C{sub 150}H{sub 30}){sub 2} with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.
A System for Exchanging Control and Status Messages in the NOvA Data Acquisition

International Nuclear Information System (INIS)

Biery, K.A.; Cooper, R.G.; Foulkes, S.C.; Guglielmo, G.M.; Piccoli, L.P.; Votava, M.E.V.; Fermilab

2007-01-01

In preparation for NOvA, a future neutrino experiment at Fermilab, we are developing a system for passing control and status messages in the data acquisition system. The DAQ system will consist of applications running on approximately 450 nodes. The message passing system will use a publish-subscribe model and will provide support for sending messages and receiving the associated replies. Additional features of the system include a layered architecture with custom APIs tailored to the needs of a DAQ system, the use of an open source messaging system for handling the reliable delivery of messages, the ability to send broadcasts to groups of applications, and APIs in Java, C++, and Python. Our choice for the open source system to deliver messages is EPICS. We will discuss the architecture of the system, our experience with EPICS, and preliminary test results
A System for Exchanging Control and Status Messages in the NOvA Data Acquisition

Energy Technology Data Exchange (ETDEWEB)

Biery, K.A.; Cooper, R.G.; Foulkes, S.C.; Guglielmo, G.M.; Piccoli, L.P.; Votava, M.E.V.; /Fermilab

2007-04-01

In preparation for NOvA, a future neutrino experiment at Fermilab, we are developing a system for passing control and status messages in the data acquisition system. The DAQ system will consist of applications running on approximately 450 nodes. The message passing system will use a publish-subscribe model and will provide support for sending messages and receiving the associated replies. Additional features of the system include a layered architecture with custom APIs tailored to the needs of a DAQ system, the use of an open source messaging system for handling the reliable delivery of messages, the ability to send broadcasts to groups of applications, and APIs in Java, C++, and Python. Our choice for the open source system to deliver messages is EPICS. We will discuss the architecture of the system, our experience with EPICS, and preliminary test results.
Three dimensional Burn-up program parallelization using socket programming

International Nuclear Information System (INIS)

Haliyati R, Evi; Su'ud, Zaki

2002-01-01

A computer parallelization process was built with a purpose to decrease execution time of a physics program. In this case, a multi computer system was built to be used to analyze burn-up process of a nuclear reactor. This multi computer system was design need using a protocol communication among sockets, i.e. TCP/IP. This system consists of computer as a server and the rest as clients. The server has a main control to all its clients. The server also divides the reactor core geometrically to in parts in accordance with the number of clients, each computer including the server has a task to conduct burn-up analysis of 1/n part of the total reactor core measure. This burn-up analysis was conducted simultaneously and in a parallel way by all computers, so a faster program execution time was achieved close to 1/n times that of one computer. Then an analysis was carried out and states that in order to calculate the density of atoms in a reactor of 91 cm x 91 cm x 116 cm, the usage of a parallel system of 2 computers has the highest efficiency
Integrated Task And Data Parallel Programming: Language Design

Science.gov (United States)

Grimshaw, Andrew S.; West, Emily A.

1998-01-01

his research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program m. Additional 1995 Activities During the fall I collaborated

Parallel computing techniques for rotorcraft aerodynamics

Science.gov (United States)

Ekici, Kivanc

The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, main modifications to the implicit operator, Lower-Upper Symmetric Gauss Seidel (LU-SGS) originally used in TURNS, is performed. Second, application of an inexact Newton method, coupled with a Krylov subspace iterative method (Newton-Krylov method) is carried out. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods to the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).
Development of a parallel genetic algorithm using MPI and its application in a nuclear reactor core. Design optimization

International Nuclear Information System (INIS)

Waintraub, Marcel; Pereira, Claudio M.N.A.; Baptista, Rafael P.

2005-01-01

This work presents the development of a distributed parallel genetic algorithm applied to a nuclear reactor core design optimization. In the implementation of the parallelism, a 'Message Passing Interface' (MPI) library, standard for parallel computation in distributed memory platforms, has been used. Another important characteristic of MPI is its portability for various architectures. The main objectives of this paper are: validation of the results obtained by the application of this algorithm in a nuclear reactor core optimization problem, through comparisons with previous results presented by Pereira et al.; and performance test of the Brazilian Nuclear Engineering Institute (IEN) cluster in reactors physics optimization problems. The experiments demonstrated that the developed parallel genetic algorithm using the MPI library presented significant gains in the obtained results and an accentuated reduction of the processing time. Such results ratify the use of the parallel genetic algorithms for the solution of nuclear reactor core optimization problems. (author)
Helping Students with Difficult First Year Subjects through the PASS Program

Science.gov (United States)

Sultan, Fauziah K. P. D.; Narayansany, Kannaki S.; Kee, Hooi Ling; Kuan, Chin Hoay; Palaniappa Manickam, M. Kamala; Tee, Meng Yew

2013-01-01

The purpose of this action research was to find out if participants of a pilot PASS program found it to be helpful. The program was implemented for the first time in an institute of higher learning in Malaysia. An action research design guided the study, with surveys, documents, and reflections as primary data sources. The findings were largely…
A lightweight messaging-based distributed processing and workflow execution framework for real-time and big data analysis

Science.gov (United States)

Laban, Shaban; El-Desouky, Aly

2014-05-01

To achieve a rapid, simple and reliable parallel processing of different types of tasks and big data processing on any compute cluster, a lightweight messaging-based distributed applications processing and workflow execution framework model is proposed. The framework is based on Apache ActiveMQ and Simple (or Streaming) Text Oriented Message Protocol (STOMP). ActiveMQ , a popular and powerful open source persistence messaging and integration patterns server with scheduler capabilities, acts as a message broker in the framework. STOMP provides an interoperable wire format that allows framework programs to talk and interact between each other and ActiveMQ easily. In order to efficiently use the message broker a unified message and topic naming pattern is utilized to achieve the required operation. Only three Python programs and simple library, used to unify and simplify the implementation of activeMQ and STOMP protocol, are needed to use the framework. A watchdog program is used to monitor, remove, add, start and stop any machine and/or its different tasks when necessary. For every machine a dedicated one and only one zoo keeper program is used to start different functions or tasks, stompShell program, needed for executing the user required workflow. The stompShell instances are used to execute any workflow jobs based on received message. A well-defined, simple and flexible message structure, based on JavaScript Object Notation (JSON), is used to build any complex workflow systems. Also, JSON format is used in configuration, communication between machines and programs. The framework is platform independent. Although, the framework is built using Python the actual workflow programs or jobs can be implemented by any programming language. The generic framework can be used in small national data centres for processing seismological and radionuclide data received from the International Data Centre (IDC) of the Preparatory Commission for the Comprehensive Nuclear
User's guide of parallel program development environment (PPDE). The 2nd edition

International Nuclear Information System (INIS)

Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio; Ohta, Hirofumi

2000-03-01

The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made through extending the function of the PPDF, Parallel Program Development Environment in the STA basic system. The extended PPDE has the function to make: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) the multi-tools execution which makes the tools on heterogeneous computers to execute with one operation a task on a computer, and 3) the mirror composition to reflect editing results of a file on a computer into all related files on other computers. These additional functions will enhance the work efficiency for program development on some computers. More functions have been added to the PPDE to provide help for parallel program development. New functions were also designed to complement a HPF translator and a parallelizing support tool when working together so that a sequential program is efficiently converted to a parallel program. This report describes the use of extended PPDE. (author)
Xyce parallel electronic simulator users guide, version 6.0.

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

2013-08-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users' guide, Version 6.0.1.

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

2014-01-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.1

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

2014-03-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas; Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiationaware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase-a message passing parallel implementation-which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Load-balancing techniques for a parallel electromagnetic particle-in-cell code

Energy Technology Data Exchange (ETDEWEB)

PLIMPTON,STEVEN J.; SEIDEL,DAVID B.; PASIK,MICHAEL F.; COATS,REBECCA S.

2000-01-01

QUICKSILVER is a 3-d electromagnetic particle-in-cell simulation code developed and used at Sandia to model relativistic charged particle transport. It models the time-response of electromagnetic fields and low-density-plasmas in a self-consistent manner: the fields push the plasma particles and the plasma current modifies the fields. Through an LDRD project a new parallel version of QUICKSILVER was created to enable large-scale plasma simulations to be run on massively-parallel distributed-memory supercomputers with thousands of processors, such as the Intel Tflops and DEC CPlant machines at Sandia. The new parallel code implements nearly all the features of the original serial QUICKSILVER and can be run on any platform which supports the message-passing interface (MPI) standard as well as on single-processor workstations. This report describes basic strategies useful for parallelizing and load-balancing particle-in-cell codes, outlines the parallel algorithms used in this implementation, and provides a summary of the modifications made to QUICKSILVER. It also highlights a series of benchmark simulations which have been run with the new code that illustrate its performance and parallel efficiency. These calculations have up to a billion grid cells and particles and were run on thousands of processors. This report also serves as a user manual for people wishing to run parallel QUICKSILVER.
Load-balancing techniques for a parallel electromagnetic particle-in-cell code

International Nuclear Information System (INIS)

Plimpton, Steven J.; Seidel, David B.; Pasik, Michael F.; Coats, Rebecca S.

2000-01-01

QUICKSILVER is a 3-d electromagnetic particle-in-cell simulation code developed and used at Sandia to model relativistic charged particle transport. It models the time-response of electromagnetic fields and low-density-plasmas in a self-consistent manner: the fields push the plasma particles and the plasma current modifies the fields. Through an LDRD project a new parallel version of QUICKSILVER was created to enable large-scale plasma simulations to be run on massively-parallel distributed-memory supercomputers with thousands of processors, such as the Intel Tflops and DEC CPlant machines at Sandia. The new parallel code implements nearly all the features of the original serial QUICKSILVER and can be run on any platform which supports the message-passing interface (MPI) standard as well as on single-processor workstations. This report describes basic strategies useful for parallelizing and load-balancing particle-in-cell codes, outlines the parallel algorithms used in this implementation, and provides a summary of the modifications made to QUICKSILVER. It also highlights a series of benchmark simulations which have been run with the new code that illustrate its performance and parallel efficiency. These calculations have up to a billion grid cells and particles and were run on thousands of processors. This report also serves as a user manual for people wishing to run parallel QUICKSILVER
Introducing heterogeneity in Monte Carlo models for risk assessments of high-level nuclear waste. A parallel implementation of the MLCRYSTAL code

Energy Technology Data Exchange (ETDEWEB)

Andersson, M.

1996-09-01

We have introduced heterogeneity to an existing model as a special feature and simultaneously extended the model from 1D to 3D. Briefly, the code generates stochastic fractures in a given geosphere. These fractures are connected in series to form one pathway for radionuclide transport from the repository to the biosphere. Rock heterogeneity is realized by simulating physical and chemical properties for each fracture, i.e. these properties vary along the transport pathway (which is an ensemble of all fractures serially connected). In this case, each Monte Carlo simulation involves a set of many thousands of realizations, one for each pathway. Each pathway can be formed by approx. 100 fractures. This means that for a Monte Carlo simulation of 1000 realizations, we need to perform a total of 100,000 simulations. Therefore the introduction of heterogeneity has increased the CPU demands by two orders of magnitude. To overcome the demand for CPU, the program, MLCRYSTAL, has been implemented in a parallel workstation environment using the MPI, Message Passing Interface, and later on ported to an IBM-SP2 parallel supercomputer. The program is presented here and a preliminary set of results is given with the conclusions that can be drawn. 3 refs, 12 figs.
Introducing heterogeneity in Monte Carlo models for risk assessments of high-level nuclear waste. A parallel implementation of the MLCRYSTAL code

International Nuclear Information System (INIS)

Andersson, M.

1996-09-01

We have introduced heterogeneity to an existing model as a special feature and simultaneously extended the model from 1D to 3D. Briefly, the code generates stochastic fractures in a given geosphere. These fractures are connected in series to form one pathway for radionuclide transport from the repository to the biosphere. Rock heterogeneity is realized by simulating physical and chemical properties for each fracture, i.e. these properties vary along the transport pathway (which is an ensemble of all fractures serially connected). In this case, each Monte Carlo simulation involves a set of many thousands of realizations, one for each pathway. Each pathway can be formed by approx. 100 fractures. This means that for a Monte Carlo simulation of 1000 realizations, we need to perform a total of 100,000 simulations. Therefore the introduction of heterogeneity has increased the CPU demands by two orders of magnitude. To overcome the demand for CPU, the program, MLCRYSTAL, has been implemented in a parallel workstation environment using the MPI, Message Passing Interface, and later on ported to an IBM-SP2 parallel supercomputer. The program is presented here and a preliminary set of results is given with the conclusions that can be drawn. 3 refs, 12 figs
Getting Your Message Across: Mobile Phone Text Messaging

Science.gov (United States)

Beecher, Constance C.; Hayungs, Lori

2017-01-01

Want to send a message that 99% of your audience will read? Many Extension professionals are familiar with using social media tools to enhance Extension programming. Extension professionals may be less familiar with the use of mobile phone text-based marketing tools. The purpose of this article is to introduce SMS (short message system) marketing…
A Programming Model for Massive Data Parallelism with Data Dependencies

International Nuclear Information System (INIS)

Cui, Xiaohui; Mueller, Frank; Potok, Thomas E.; Zhang, Yongpeng

2009-01-01

Accelerating processors can often be more cost and energy effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processor units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA s Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that well exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach. We run massively data-parallel applications on GPU clusters. We further propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from micro benchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains
Parallel SN algorithms in shared- and distributed-memory environments

International Nuclear Information System (INIS)

Haghighat, Alireza; Hunter, Melissa A.; Mattis, Ronald E.

1995-01-01

Different 2-D spatial domain partitioning Sn transport theory algorithms have been developed on the basis of the Block-Jacobi iterative scheme. These algorithms have been incorporated into TWOTRAN-II, and tested on a shared-memory CRAY Y-MP C90 and a distributed-memory IBM SP1. For a series of fixed source r-z geometry homogeneous problems, parallel efficiencies in a range of 50-90% are achieved on the C90 with 6 processors, and lower values (20-60%) are obtained on the SP1. It is demonstrated that better performance is attainable if one addresses issues such as convergence rate, load-balancing, and granularity for both architectures, as well as message passing (network bandwidth and latency) for SP1. (author). 17 refs, 4 figs
Actors with Multi-Headed Message Receive Patterns

DEFF Research Database (Denmark)

Sulzmann, Martin; Lam, Edmund Soon Lee; Van Weert, Peter

2008-01-01

style actors with receive clauses containing multi-headed message patterns. Patterns may be non-linear and constrained by guards. We provide a number of examples to show the usefulness of the extension. We also explore the design space for multi-headed message matching semantics, for example first-match......The actor model provides high-level concurrency abstractions to coordinate simultaneous computations by message passing. Languages implementing the actor model such as Erlang commonly only support single-headed pattern matching over received messages. We propose and design an extension of Erlang...... and rule priority-match semantics. The various semantics are inspired by the multi-set constraint matching semantics found in Constraint Handling Rules. This provides us with a formal model to study actors with multi-headed message receive patterns. The system can be implemented efficiently and we have...
Parallel implementation of many-body mean-field equations

International Nuclear Information System (INIS)

Chinn, C.R.; Umar, A.S.; Vallieres, M.; Strayer, M.R.

1994-01-01

We describe the numerical methods used to solve the system of stiff, nonlinear partial differential equations resulting from the Hartree-Fock description of many-particle quantum systems, as applied to the structure of the nucleus. The solutions are performed on a three-dimensional Cartesian lattice. Discretization is achieved through the lattice basis-spline collocation method, in which quantum-state vectors and coordinate-space operators are expressed in terms of basis-spline functions on a spatial lattice. All numerical procedures reduce to a series of matrix-vector multiplications and other elementary operations, which we perform on a number of different computing architectures, including the Intel Paragon and the Intel iPSC/860 hypercube. Parallelization is achieved through a combination of mechanisms employing the Gram-Schmidt procedure, broadcasts, global operations, and domain decomposition of state vectors. We discuss the approach to the problems of limited node memory and node-to-node communication overhead inherent in using distributed-memory, multiple-instruction, multiple-data stream parallel computers. An algorithm was developed to reduce the communication overhead by pipelining some of the message passing procedures
How My Program Passed the Turing Test

Science.gov (United States)

Humphrys, Mark

In 1989, the author put an ELIZA-like chatbot on the Internet. The conversations this program had can be seen - depending on how one defines the rules (and how seriously one takes the idea of the test itself) - as a passing of the Turing Test. This is the first time this event has been properly written. This chatbot succeeded due to profanity, relentless aggression, prurient queries about the user, and implying that they were a liar when they responded. The element of surprise was also crucial. Most chatbots exist in an environment where people expectto find some bots among the humans. Not this one. What was also novel was the onlineelement. This was certainly one of the first AI programs online. It seems to have been the first (a) AI real-time chat program, which (b) had the element of surprise, and (c) was on the Internet. We conclude with some speculation that the future of all of AI is on the Internet, and a description of the "World- Wide-Mind" project that aims to bring this about.
Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

Science.gov (United States)

Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

2017-12-01

We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by the numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown being sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domain of several millions squared kilometers like the Arctic region.
Residency program characteristics that are associated with pass rate of the American Board of Pediatrics certifying exam.

Science.gov (United States)

Atsawarungruangkit, Amporn

2015-01-01

The US is home to almost 200 pediatrics residency programs; despite this, there is little information about the relationship between program characteristics and performance in the American Board of Pediatrics (ABP) certifying exam. To evaluate the relationship between pass rate of the ABP certifying exam with the characteristics of categorical pediatrics residency programs. This retrospective, cross-sectional study used publicly available data from the ABP website and the Fellowship and Residency Electronic Interactive Database. All programs that reported pass rates were included. The analysis, comprising univariate and multivariate linear regression, involved determining how 69 factors (eg, general information, number of faculty and trainees, work schedule, educational environment) related to the pass rate. Of 199 programs, 194 reported pass rates. The univariate analysis revealed 20 program characteristics with P-values program characteristics: ratio of full-time equivalent paid faculty to positions, percentage of US medical graduates, and average hours per week of regularly scheduled lectures or conferences. Unlike in previous studies, location and program size were not significantly associated with the pass rate in this multivariate analysis. The finding regarding the ratio of full-time equivalent paid faculty to positions highlighted the benefits of a well-supervised training environment, while that regarding the percentage of US medical graduates indicated the necessity of high competition in residency programs. Finally, longer hours per week of regularly scheduled lectures or conferences were associated with better academic outcomes, both statistically and intuitively.

An environment for parallel structuring of Fortran programs

International Nuclear Information System (INIS)

Sridharan, K.; McShea, M.; Denton, C.; Eventoff, B.; Browne, J.C.; Newton, P.; Ellis, M.; Grossbard, D.; Wise, T.; Clemmer, D.

1990-01-01

The paper describes and illustrates an environment for interactive support of the detection and implementation of macro-level parallelism in Fortran programs. The approach couples algorithms for dependence analysis with both innovative techniques for complexity management and capabilities for the measurement and analysis of the parallel computation structures generated through use of the environment. The resulting environment is complementary to the more common approach of seeking local parallelism by loop unrolling, either by an automatic compiler or manually. (orig.)
Adapting high-level language programs for parallel processing using data flow

Science.gov (United States)

Standley, Hilda M.

1988-01-01

EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Aadithya, Karthik V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Schiek, Richard L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation

2016-06-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
Compiling Scientific Programs for Scalable Parallel Systems

National Research Council Canada - National Science Library

Kennedy, Ken

2001-01-01

...). The research performed in this project included new techniques for recognizing implicit parallelism in sequential programs, a powerful and precise set-based framework for analysis and transformation...
Energy savings from transit passes : an evaluation of the University at Buffalo NFTA transit pass program for students, faculty, and staff.

Science.gov (United States)

2014-04-01

The University Transportation Research Center Region 2 supported a study entitled Connections Beyond Campus: An Evaluation of the Niagara Frontier Transportation : Authority University at Buffalo Transit Pass Program. Unlimited Access t...
Studies of parallel algorithms for the solution of a Fokker-Planck equation

International Nuclear Information System (INIS)

Deck, D.; Samba, G.

1995-11-01

The study of laser-created plasmas often requires the use of a kinetic model rather than a hydrodynamic one. This model change occurs, for example, in the hot spot formation in an ICF experiment or during the relaxation of colliding plasmas. When the gradients scalelengths or the size of a given system are not small compared to the characteristic mean-free-path, we have to deal with non-equilibrium situations, which can be described by the distribution functions of every species in the system. We present here a numerical method in plane or spherical 1-D geometry, for the solution of a Fokker-Planck equation that describes the evolution of stich functions in the phase space. The size and the time scale of kinetic simulations require the use of Massively Parallel Computers (MPP). We have adopted a message-passing strategy using Parallel Virtual Machine (PVM)
Parallel file system with metadata distributed across partitioned key-value store c

Science.gov (United States)

Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron

2017-09-19

Improved techniques are provided for storing metadata associated with a plurality of sub-files associated with a single shared file in a parallel file system. The shared file is generated by a plurality of applications executing on a plurality of compute nodes. A compute node implements a Parallel Log Structured File System (PLFS) library to store at least one portion of the shared file generated by an application executing on the compute node and metadata for the at least one portion of the shared file on one or more object storage servers. The compute node is also configured to implement a partitioned data store for storing a partition of the metadata for the shared file, wherein the partitioned data store communicates with partitioned data stores on other compute nodes using a message passing interface. The partitioned data store can be implemented, for example, using Multidimensional Data Hashing Indexing Middleware (MDHIM).
User's guide of parallel program development environment (PPDE). The 2nd edition

Energy Technology Data Exchange (ETDEWEB)

Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio [Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Tokyo (Japan); Ohta, Hirofumi [Hitachi Ltd., Tokyo (Japan)

2000-03-01

The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made through extending the function of the PPDF, Parallel Program Development Environment in the STA basic system. The extended PPDE has the function to make: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) the multi-tools execution which makes the tools on heterogeneous computers to execute with one operation a task on a computer, and 3) the mirror composition to reflect editing results of a file on a computer into all related files on other computers. These additional functions will enhance the work efficiency for program development on some computers. More functions have been added to the PPDE to provide help for parallel program development. New functions were also designed to complement a HPF translator and a paralleilizing support tool when working together so that a sequential program is efficiently converted to a parallel program. This report describes the use of extended PPDE. (author)
Mouse myocardial first-pass perfusion MR imaging

NARCIS (Netherlands)

Coolen, Bram F.; Moonen, Rik P. M.; Paulis, Leonie E. M.; Geelen, Tessa; Nicolay, Klaas; Strijkers, Gustav J.

2010-01-01

A first-pass myocardial perfusion sequence for mouse cardiac MRI is presented. A segmented ECG-triggered acquisition combined with parallel imaging acceleration was used to capture the first pass of a Gd-DTPA bolus through the mouse heart with a temporal resolution of 300-400 msec. The method was
A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

Science.gov (United States)

Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

2002-02-01

This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
Public-channel cryptography based on mutual chaos pass filters.

Science.gov (United States)

Klein, Einat; Gross, Noam; Kopelowitz, Evi; Rosenbluh, Michael; Khaykovich, Lev; Kinzel, Wolfgang; Kanter, Ido

2006-10-01

We study the mutual coupling of chaotic lasers and observe both experimentally and in numeric simulations that there exists a regime of parameters for which two mutually coupled chaotic lasers establish isochronal synchronization, while a third laser coupled unidirectionally to one of the pair does not synchronize. We then propose a cryptographic scheme, based on the advantage of mutual coupling over unidirectional coupling, where all the parameters of the system are public knowledge. We numerically demonstrate that in such a scheme the two communicating lasers can add a message signal (compressed binary message) to the transmitted coupling signal and recover the message in both directions with high fidelity by using a mutual chaos pass filter procedure. An attacker, however, fails to recover an errorless message even if he amplifies the coupling signal.
Program For Parallel Discrete-Event Simulation

Science.gov (United States)

Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

1991-01-01

User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
Performance Comparison of a Matrix Solver on a Heterogeneous Network Using Two Implementations of MPI: MPICH and LAM

Science.gov (United States)

Phillips, Jennifer K.

1995-01-01

Two of the current and most popular implementations of the Message-Passing Standard, Message Passing Interface (MPI), were contrasted: MPICH by Argonne National Laboratory, and LAM by the Ohio Supercomputer Center at Ohio State University. A parallel skyline matrix solver was adapted to be run in a heterogeneous environment using MPI. The Message-Passing Interface Forum was held in May 1994 which lead to a specification of library functions that implement the message-passing model of parallel communication. LAM, which creates it's own environment, is more robust in a highly heterogeneous network. MPICH uses the environment native to the machine architecture. While neither of these free-ware implementations provides the performance of native message-passing or vendor's implementations, MPICH begins to approach that performance on the SP-2. The machines used in this study were: IBM RS6000, 3 Sun4, SGI, and the IBM SP-2. Each machine is unique and a few machines required specific modifications during the installation. When installed correctly, both implementations worked well with only minor problems.
Preventing messaging queue deadlocks in a DMA environment

Science.gov (United States)

Blocksome, Michael A; Chen, Dong; Gooding, Thomas; Heidelberger, Philip; Parker, Jeff

2014-01-14

Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate and interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.
Secret Message Decryption: Group Consulting Projects Using Matrices and Linear Programming

Science.gov (United States)

Gurski, Katharine F.

2009-01-01

We describe two short group projects for finite mathematics students that incorporate matrices and linear programming into fictional consulting requests presented as a letter to the students. The students are required to use mathematics to decrypt secret messages in one project involving matrix multiplication and inversion. The second project…
Mouse myocardial first-pass perfusion MR imaging

NARCIS (Netherlands)

Coolen, B.F.; Moonen, R.P.M.; Paulis, L.E.M.; Geelen, T.; Nicolay, K.; Strijkers, G.J.

2010-01-01

A first-pass myocardial perfusion sequence for mouse cardiac MRI is presented. A segmented ECG-triggered acquisition combined with parallel imaging acceleration was used to capture the first pass of a Gd-DTPA bolus through the mouse heart with a temporal resolution of 300–400 msec. The method was
A 3D gyrokinetic particle-in-cell simulation of fusion plasma microturbulence on parallel computers

Science.gov (United States)

Williams, T. J.

1992-12-01

One of the grand challenge problems now supported by HPCC is the Numerical Tokamak Project. A goal of this project is the study of low-frequency micro-instabilities in tokamak plasmas, which are believed to cause energy loss via turbulent thermal transport across the magnetic field lines. An important tool in this study is gyrokinetic particle-in-cell (PIC) simulation. Gyrokinetic, as opposed to fully-kinetic, methods are particularly well suited to the task because they are optimized to study the frequency and wavelength domain of the microinstabilities. Furthermore, many researchers now employ low-noise delta(f) methods to greatly reduce statistical noise by modelling only the perturbation of the gyrokinetic distribution function from a fixed background, not the entire distribution function. In spite of the increased efficiency of these improved algorithms over conventional PIC algorithms, gyrokinetic PIC simulations of tokamak micro-turbulence are still highly demanding of computer power--even fully-vectorized codes on vector supercomputers. For this reason, we have worked for several years to redevelop these codes on massively parallel computers. We have developed 3D gyrokinetic PIC simulation codes for SIMD and MIMD parallel processors, using control-parallel, data-parallel, and domain-decomposition message-passing (DDMP) programming paradigms. This poster summarizes our earlier work on codes for the Connection Machine and BBN TC2000 and our development of a generic DDMP code for distributed-memory parallel machines. We discuss the memory-access issues which are of key importance in writing parallel PIC codes, with special emphasis on issues peculiar to gyrokinetic PIC. We outline the domain decompositions in our new DDMP code and discuss the interplay of different domain decompositions suited for the particle-pushing and field-solution components of the PIC algorithm.
Cell verification of parallel burnup calculation program MCBMPI based on MPI

International Nuclear Information System (INIS)

Yang Wankui; Liu Yaoguang; Ma Jimin; Wang Guanbo; Yang Xin; She Ding

2014-01-01

The parallel burnup calculation program MCBMPI was developed. The program was modularized. The parallel MCNP5 program MCNP5MPI was employed as neutron transport calculation module. And a composite of three solution methods was used to solve burnup equation, i.e. matrix exponential technique, TTA analytical solution, and Gauss Seidel iteration. MPI parallel zone decomposition strategy was concluded in the program. The program system only consists of MCNP5MPI and burnup subroutine. The latter achieves three main functions, i.e. zone decomposition, nuclide transferring and decaying, and data exchanging with MCNP5MPI. Also, the program was verified with the pressurized water reactor (PWR) cell burnup benchmark. The results show that it,s capable to apply the program to burnup calculation of multiple zones, and the computation efficiency could be significantly improved with the development of computer hardware. (authors)
Vdebug: debugging tool for parallel scientific programs. Design report on vdebug

International Nuclear Information System (INIS)

Matsuda, Katsuyuki; Takemiya, Hiroshi

2000-02-01

We report on a debugging tool called vdebug which supports debugging work for parallel scientific simulation programs. It is difficult to debug scientific programs with an existing debugger, because the volume of data generated by the programs is too large for users to check data in characters. Usually, the existing debugger shows data values in characters. To alleviate it, we have developed vdebug which enables to check the validity of large amounts of data by showing these data values visually. Although targets of vdebug have been restricted to sequential programs, we have made it applicable to parallel programs by realizing the function of merging and visualizing data distributed on programs on each computer node. Now, vdebug works on seven kinds of parallel computers. In this report, we describe the design of vdebug. (author)
New parallel SOR method by domain partitioning

Energy Technology Data Exchange (ETDEWEB)

Xie, Dexuan [Courant Inst. of Mathematical Sciences New York Univ., NY (United States)

1996-12-31

In this paper, we propose and analyze a new parallel SOR method, the PSOR method, formulated by using domain partitioning together with an interprocessor data-communication technique. For the 5-point approximation to the Poisson equation on a square, we show that the ordering of the PSOR based on the strip partition leads to a consistently ordered matrix, and hence the PSOR and the SOR using the row-wise ordering have the same convergence rate. However, in general, the ordering used in PSOR may not be {open_quote}consistently ordered{close_quotes}. So, there is a need to analyze the convergence of PSOR directly. In this paper, we present a PSOR theory, and show that the PSOR method can have the same asymptotic rate of convergence as the corresponding sequential SOR method for a wide class of linear systems in which the matrix is {open_quotes}consistently ordered{close_quotes}. Finally, we demonstrate the parallel performance of the PSOR method on four different message passing multiprocessors (a KSR1, the Intel Delta, an Intel Paragon and an IBM SP2), along with a comparison with the point Red-Black and four-color SOR methods.

Non-Blocking Concurrent Imperative Programming with Session Types

Directory of Open Access Journals (Sweden)

Miguel Silva

2017-01-01

Full Text Available Concurrent C0 is an imperative programming language in the C family with session-typed message-passing concurrency. The previously proposed semantics implements asynchronous (non-blocking output; we extend it here with non-blocking input. A key idea is to postpone message reception as much as possible by interpreting receive commands as a request for a message. We implemented our ideas as a translation from a blocking intermediate language to a non-blocking language. Finally, we evaluated our techniques with several benchmark programs and show the results obtained. While the abstract measure of span always decreases (or remains unchanged, only a few of the examples reap a practical benefit.
From sequential to parallel programming with patterns

CERN Document Server

CERN. Geneva

2018-01-01

To increase in both performance and efficiency, our programming models need to adapt to better exploit modern processors. The classic idioms and patterns for programming such as loops, branches or recursion are the pillars of almost every code and are well known among all programmers. These patterns all have in common that they are sequential in nature. Embracing parallel programming patterns, which allow us to program for multi- and many-core hardware in a natural way, greatly simplifies the task of designing a program that scales and performs on modern hardware, independently of the used programming language, and in a generic way.
I spy with my little eye: cognitive processing of framed physical activity messages.

Science.gov (United States)

Bassett-Gunter, Rebecca L; Latimer-Cheung, Amy E; Martin Ginis, Kathleen A; Castelhano, Monica

2014-01-01

The primary purpose was to examine the relative cognitive processing of gain-framed versus loss-framed physical activity messages following exposure to health risk information. Guided by the Extended Parallel Process Model, the secondary purpose was to examine the relation between dwell time, message recall, and message-relevant thoughts, as well as perceived risk, personal relevance, and fear arousal. Baseline measures of perceived risk for inactivity-related disease and health problems were administered to 77 undergraduate students. Participants read population-specific health risk information while wearing a head-mounted eye tracker, which measured dwell time on message content. Perceived risk was then reassessed. Next, participants read PA messages while the eye tracker measured dwell time on message content. Immediately following message exposure, recall, thought-listing, fear arousal, and personal relevance were measured. Dwell time on gain-framed messages was significantly greater than loss-framed messages. However, message recall and thought-listing did not differ by message frame. Dwell time was not significantly related to recall or thought-listing. Consistent with the Extended Parallel Process Model, fear arousal was significantly related to recall, thought-listing, and personal relevance. In conclusion, gain-framed messages may evoke greater dwell time than loss-famed messages. However, dwell time alone may be insufficient for evoking further cognitive processing.
On program restructuring, scheduling, and communication for parallel processor systems

Energy Technology Data Exchange (ETDEWEB)

Polychronopoulos, Constantine D. [Univ. of Illinois, Urbana, IL (United States)

1986-08-01

This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler was used to transform programs in a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.
Managing internode data communications for an uninitialized process in a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

2014-05-20

A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.
Internode data communications in a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Miller, Douglas R.; Parker, Jeffrey J.; Ratterman, Joseph D.; Smith, Brian E.

2013-09-03

Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

Directory of Open Access Journals (Sweden)

Stephen L. Olivier

2013-01-01

Full Text Available Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.
Optimal data replication: A new approach to optimizing parallel EM algorithms on a mesh-connected multiprocessor for 3D PET image reconstruction

International Nuclear Information System (INIS)

Chen, C.M.; Lee, S.Y.

1995-01-01

The EM algorithm promises an estimated image with the maximal likelihood for 3D PET image reconstruction. However, due to its long computation time, the EM algorithm has not been widely used in practice. While several parallel implementations of the EM algorithm have been developed to make the EM algorithm feasible, they do not guarantee an optimal parallelization efficiency. In this paper, the authors propose a new parallel EM algorithm which maximizes the performance by optimizing data replication on a mesh-connected message-passing multiprocessor. To optimize data replication, the authors have formally derived the optimal allocation of shared data, group sizes, integration and broadcasting of replicated data as well as the scheduling of shared data accesses. The proposed parallel EM algorithm has been implemented on an iPSC/860 with 16 PEs. The experimental and theoretical results, which are consistent with each other, have shown that the proposed parallel EM algorithm could improve performance substantially over those using unoptimized data replication
Assessing the National Cancer Institute's SmokefreeMOM Text-Messaging Program for Pregnant Smokers: Pilot Randomized Trial.

Science.gov (United States)

Abroms, Lorien C; Chiang, Shawn; Macherelli, Laura; Leavitt, Leah; Montgomery, Margaret

2017-10-03

Automated text messages on mobile phones have been found to be effective for smoking cessation in adult smokers. This study aims to test the acceptability and feasibility of SmokefreeMOM, a national smoking cessation text-messaging program for pregnant smokers. Participants were recruited from prenatal care and randomized to receive SmokefreeMOM (n=55), an automated smoking cessation text-messaging program, or a control text message quitline referral (n=44). Participants were surveyed by phone at baseline and at 1 month and 3 months after enrollment. Results indicate that the SmokefreeMOM program was highly rated overall and rated more favorably than the control condition in its helpfulness at 3-month follow-up (Pmessaging at both 1-month and 3-month follow-ups (Pmessages, and few participants unsubscribed from the program. There were no significant differences between groups on the use of extra treatment resources or on smoking-related outcomes. However, at the 3-month follow-up, some outcomes favored the intervention group. SmokefreeMOM is acceptable for pregnant smokers. It is recommended that SmokefreeMOM be further refined and evaluated. Clinicaltrials.gov NCT02412956; https://clinicaltrials.gov/ct2/show/NCT02412956 (Archived by WebCite at http://www.webcitation.org/6tcmeRnbC). ©Lorien C Abroms, Shawn Chiang, Laura Macherelli, Leah Leavitt, Margaret Montgomery. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 03.10.2017.
Control rod drop transient analysis with the coupled parallel code pCTF-PARCSv2.7

International Nuclear Information System (INIS)

Ramos, Enrique; Roman, Jose E.; Abarca, Agustín; Miró, Rafael; Bermejo, Juan A.

2016-01-01

Highlights: • An MPI parallel version of the thermal–hydraulic subchannel code COBRA-TF has been developed. • The parallel code has been coupled to the 3D neutron diffusion code PARCSv2.7. • The new codes are validated with a control rod drop transient. - Abstract: In order to reduce the response time when simulating large reactors in detail, a parallel version of the thermal–hydraulic subchannel code COBRA-TF (CTF) has been developed using the standard Message Passing Interface (MPI). The parallelization is oriented to reactor cells, so it is best suited for models consisting of many cells. The generation of the Jacobian matrix is parallelized, in such a way that each processor is in charge of generating the data associated with a subset of cells. Also, the solution of the linear system of equations is done in parallel, using the PETSc toolkit. With the goal of creating a powerful tool to simulate the reactor core behavior during asymmetrical transients, the 3D neutron diffusion code PARCSv2.7 (PARCS) has been coupled with the parallel version of CTF (pCTF) using the Parallel Virtual Machine (PVM) technology. In order to validate the correctness of the parallel coupled code, a control rod drop transient has been simulated comparing the results with the real experimental measures acquired during an NPP real test.
An extension of the extended parallel process model (EPPM) in television health news: the influence of health consciousness on individual message processing and acceptance.

Science.gov (United States)

Hong, Hyehyun

2011-06-01

The purpose of this study is to examine the role of health consciousness in processing TV news that contains potential health threats and preventive recommendations. Based on the extended parallel process model (Witte, 1992), relationships among health consciousness, perceived severity, perceived susceptibility, perceived response efficacy, perceived self-efficacy, and message acceptance/rejection were hypothesized. Responses collected from 175 participants after viewing four TV health news stories were analyzed using the bootstrapping analysis (Preacher & Hayes, 2008). Results confirmed three mediators (i.e., perceived severity, response efficacy, self-efficacy) in the influence of health consciousness on message acceptance. A negative association found between health consciousness and perceived susceptibility is discussed in relation to characteristics of health conscious individuals and optimistic bias of health risks.
Parallel Volunteer Learning during Youth Programs

Science.gov (United States)

Lesmeister, Marilyn K.; Green, Jeremy; Derby, Amy; Bothum, Candi

2012-01-01

Lack of time is a hindrance for volunteers to participate in educational opportunities, yet volunteer success in an organization is tied to the orientation and education they receive. Meeting diverse educational needs of volunteers can be a challenge for program managers. Scheduling a Volunteer Learning Track for chaperones that is parallel to a…
Communications oriented programming of parallel iterative solutions of sparse linear systems

Science.gov (United States)

Patrick, M. L.; Pratt, T. W.

1986-01-01

Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
Massive parallel electromagnetic field simulation program JEMS-FDTD design and implementation on jasmin

International Nuclear Information System (INIS)

Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li

2010-01-01

A large-scale parallel electromagnetic field simulation program JEMS-FDTD(J Electromagnetic Solver-Finite Difference Time Domain) is designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate propagation, radiation, couple of electromagnetic field by solving Maxwell equations on structured mesh explicitly with FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large scale parallel computation. A parallel performance test indicates that a high parallel efficiency is obtained. (authors)
Bistatic scattering from a three-dimensional object above a two-dimensional randomly rough surface modeled with the parallel FDTD approach.

Science.gov (United States)

Guo, L-X; Li, J; Zeng, H

2009-11-01

We present an investigation of the electromagnetic scattering from a three-dimensional (3-D) object above a two-dimensional (2-D) randomly rough surface. A Message Passing Interface-based parallel finite-difference time-domain (FDTD) approach is used, and the uniaxial perfectly matched layer (UPML) medium is adopted for truncation of the FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different number of processors is illustrated for one rough surface realization and shows that the computation time of our parallel FDTD algorithm is dramatically reduced relative to a single-processor implementation. Finally, the composite scattering coefficients versus scattered and azimuthal angle are presented and analyzed for different conditions, including the surface roughness, the dielectric constants, the polarization, and the size of the 3-D object.
The Utility of the Memorable Messages Framework as an Intermediary Evaluation Tool for Fruit and Vegetable Consumption in a Nutrition Education Program.

Science.gov (United States)

Davis, LaShara A; Morgan, Susan E; Mobley, Amy R

2016-06-01

Additional strategies to evaluate the impact of community nutrition education programs on low-income individuals are needed. The objective of this qualitative study was to examine the use of the Memorable Messages Framework as an intermediary nutrition education program evaluation tool to determine what fruit and vegetable messages were reported as memorable and the characteristics of those memorable messages. A convenience sample of low-income, primarily African American adults (N = 58) who previously completed a series of community nutrition education lessons within an urban area of Indiana participated in a focus group (N = 8 focus groups). A lead moderator using a semistructured script conducted the focus groups to determine what information about fruits and vegetables was most memorable from the participants' nutrition lessons and why this information was memorable. All focus group audiotapes were transcribed verbatim and ATLAS.ti software was used to code and identify themes within the data. Participants cited quantity, variety, and the positive nutritional impact of eating fruits and vegetables as most memorable. Information given in the form of recipes was also cited as most memorable. For example, participants referred to the recipe demonstrations as not only fun but also key components of the program that helped with message retention and memorability. Key characteristics of memorable messages included personal relevance and message vividness. These findings indicated that the Memorable Messages Framework may serve as an intermediary program evaluation tool to identify what information and messages are most influential to participants in community nutrition education programs. © 2015 Society for Public Health Education.
Xyce parallel electronic simulator : users' guide.

Energy Technology Data Exchange (ETDEWEB)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

2011-05-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is
Couriers in the Inca Empire: Getting Your Message Across. [Lesson Plan].

Science.gov (United States)

2002

This lesson shows how the Inca communicated across the vast stretches of their mountain realm, the largest empire of the pre-industrial world. The lesson explains how couriers carried messages along mountain-ridge roads, up and down stone steps, and over chasm-spanning footbridges. It states that couriers could pass a message from Quito (Ecuador)…
New adaptive differencing strategy in the PENTRAN 3-d parallel Sn code

International Nuclear Information System (INIS)

Sjoden, G.E.; Haghighat, A.

1996-01-01

It is known that three-dimensional (3-D) discrete ordinates (S n ) transport problems require an immense amount of storage and computational effort to solve. For this reason, parallel codes that offer a capability to completely decompose the angular, energy, and spatial domains among a distributed network of processors are required. One such code recently developed is PENTRAN, which iteratively solves 3-D multi-group, anisotropic S n problems on distributed-memory platforms, such as the IBM-SP2. Because large problems typically contain several different material zones with various properties, available differencing schemes should automatically adapt to the transport physics in each material zone. To minimize the memory and message-passing overhead required for massively parallel S n applications, available differencing schemes in an adaptive strategy should also offer reasonable accuracy and positivity, yet require only the zeroth spatial moment of the transport equation; differencing schemes based on higher spatial moments, in spite of their greater accuracy, require at least twice the amount of storage and communication cost for implementation in a massively parallel transport code. This paper discusses a new adaptive differencing strategy that uses increasingly accurate schemes with low parallel memory and communication overhead. This strategy, implemented in PENTRAN, includes a new scheme, exponential directional averaged (EDA) differencing
Parallel keyed hash function construction based on chaotic maps

International Nuclear Information System (INIS)

Xiao Di; Liao Xiaofeng; Deng Shaojiang

2008-01-01

Recently, a variety of chaos-based hash functions have been proposed. Nevertheless, none of them works efficiently in parallel computing environment. In this Letter, an algorithm for parallel keyed hash function construction is proposed, whose structure can ensure the uniform sensitivity of hash value to the message. By means of the mechanism of both changeable-parameter and self-synchronization, the keystream establishes a close relation with the algorithm key, the content and the order of each message block. The entire message is modulated into the chaotic iteration orbit, and the coarse-graining trajectory is extracted as the hash value. Theoretical analysis and computer simulation indicate that the proposed algorithm can satisfy the performance requirements of hash function. It is simple, efficient, practicable, and reliable. These properties make it a good choice for hash on parallel computing platform

Improving Type Error Messages in OCaml

Directory of Open Access Journals (Sweden)

Arthur Charguéraud

2015-12-01

Full Text Available Cryptic type error messages are a major obstacle to learning OCaml or other ML-based languages. In many cases, error messages cannot be interpreted without a sufficiently-precise model of the type inference algorithm. The problem of improving type error messages in ML has received quite a bit of attention over the past two decades, and many different strategies have been considered. The challenge is not only to produce error messages that are both sufficiently concise and systematically useful to the programmer, but also to handle a full-blown programming language and to cope with large-sized programs efficiently. In this work, we present a modification to the traditional ML type inference algorithm implemented in OCaml that, by significantly reducing the left-to-right bias, allows us to report error messages that are more helpful to the programmer. Our algorithm remains fully predictable and continues to produce fairly concise error messages that always help making some progress towards fixing the code. We implemented our approach as a patch to the OCaml compiler in just a few hundred lines of code. We believe that this patch should benefit not just to beginners, but also to experienced programs developing large-scale OCaml programs.
The simplified spherical harmonics (SPL) methodology with space and moment decomposition in parallel environments

International Nuclear Information System (INIS)

Gianluca, Longoni; Alireza, Haghighat

2003-01-01

In recent years, the SP L (simplified spherical harmonics) equations have received renewed interest for the simulation of nuclear systems. We have derived the SP L equations starting from the even-parity form of the S N equations. The SP L equations form a system of (L+1)/2 second order partial differential equations that can be solved with standard iterative techniques such as the Conjugate Gradient (CG). We discretized the SP L equations with the finite-volume approach in a 3-D Cartesian space. We developed a new 3-D general code, Pensp L (Parallel Environment Neutral-particle SP L ). Pensp L solves both fixed source and criticality eigenvalue problems. In order to optimize the memory management, we implemented a Compressed Diagonal Storage (CDS) to store the SP L matrices. Pensp L includes parallel algorithms for space and moment domain decomposition. The computational load is distributed on different processors, using a mapping function, which maps the 3-D Cartesian space and moments onto processors. The code is written in Fortran 90 using the Message Passing Interface (MPI) libraries for the parallel implementation of the algorithm. The code has been tested on the Pcpen cluster and the parallel performance has been assessed in terms of speed-up and parallel efficiency. (author)
Development and benchmark verification of a parallelized Monte Carlo burnup calculation program MCBMPI

International Nuclear Information System (INIS)

Yang Wankui; Liu Yaoguang; Ma Jimin; Yang Xin; Wang Guanbo

2014-01-01

MCBMPI, a parallelized burnup calculation program, was developed. The program is modularized. Neutron transport calculation module employs the parallelized MCNP5 program MCNP5MPI, and burnup calculation module employs ORIGEN2, with the MPI parallel zone decomposition strategy. The program system only consists of MCNP5MPI and an interface subroutine. The interface subroutine achieves three main functions, i.e. zone decomposition, nuclide transferring and decaying, data exchanging with MCNP5MPI. Also, the program was verified with the Pressurized Water Reactor (PWR) cell burnup benchmark, the results showed that it's capable to apply the program to burnup calculation of multiple zones, and the computation efficiency could be significantly improved with the development of computer hardware. (authors)
Efficacy of Web-Based Collection of Strength-Based Testimonials for Text Message Extension of Youth Suicide Prevention Program: Randomized Controlled Experiment.

Science.gov (United States)

Thiha, Phyo; Pisani, Anthony R; Gurditta, Kunali; Cherry, Erin; Peterson, Derick R; Kautz, Henry; Wyman, Peter A

2016-11-09

Equipping members of a target population to deliver effective public health messaging to peers is an established approach in health promotion. The Sources of Strength program has demonstrated the promise of this approach for "upstream" youth suicide prevention. Text messaging is a well-established medium for promoting behavior change and is the dominant communication medium for youth. In order for peer 'opinion leader' programs like Sources of Strength to use scalable, wide-reaching media such as text messaging to spread peer-to-peer messages, they need techniques for assisting peer opinion leaders in creating effective testimonials to engage peers and match program goals. We developed a Web interface, called Stories of Personal Resilience in Managing Emotions (StoryPRIME), which helps peer opinion leaders write effective, short-form messages that can be delivered to the target population in youth suicide prevention program like Sources of Strength. To determine the efficacy of StoryPRIME, a Web-based interface for remotely eliciting high school peer leaders, and helping them produce high-quality, personal testimonials for use in a text messaging extension of an evidence-based, peer-led suicide prevention program. In a double-blind randomized controlled experiment, 36 high school students wrote testimonials with or without eliciting from the StoryPRIME interface. The interface was created in the context of Sources of Strength-an evidence-based youth suicide prevention program-and 24 ninth graders rated these testimonials on relatability, usefulness/relevance, intrigue, and likability. Testimonials written with the StoryPRIME interface were rated as more relatable, useful/relevant, intriguing, and likable than testimonials written without StoryPRIME, P=.054. StoryPRIME is a promising way to elicit high-quality, personal testimonials from youth for prevention programs that draw on members of a target population to spread public health messages. ©Phyo Thiha, Anthony
Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

Science.gov (United States)

Leamy, Michael J.; Springer, Adam C.

In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straight-forward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to computer clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

Directory of Open Access Journals (Sweden)

Cieślik Marcin

2011-02-01

Full Text Available Abstract Background Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'. A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption. An add-on module ('NuBio' facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures and functionality (e.g., to parse/write standard file formats. Conclusions PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and
Starting Off on the Best Foot: A Review of Message Framing and Message Tailoring, and Recommendations for the Comprehensive Messaging Strategy for Sustained Behavior Change.

Science.gov (United States)

Pope, J Paige; Pelletier, Luc; Guertin, Camille

2018-09-01

Health promotion programs represent a salient means through which physical activity promoters can cultivate positive health behavior change and maintenance. The messages communicated within these programs serve as an essential component as they are often used to convey valuable information, resources, or tools that facilitate health behavior initiation and sustained engagement. Identifying the most effective way to communicate health promotion information is, therefore, of considerable importance to ensuring that people not only attend to these messages, but also connect with and internalize the information conveyed within them. This paper was written to (1) summarize and evaluate the most prominent reviewed research approaches of message framing and tailoring to message design; and (2) offer a comprehensive messaging strategy to promote sustained health behavior change. A review of the literature demonstrated that a messaging strategy that has consistently led to healthy behavior change has yet to be identified. Furthermore, scholars have articulated that a multi-theoretical approach that places emphasis on facilitating motivation and healthy behavior change needs to be employed. Thus, this paper proposes and provides recommendations for employing the Comprehensive Messaging Strategy for Sustained Behavior Change (CMSSBC), which advocates tailoring messages to peoples' stage of change and framing them to focus on self-determined motives and intrinsic goals.
Text messaging data collection for monitoring an infant feeding intervention program in rural China: feasibility study.

Science.gov (United States)

Li, Ye; Wang, Wei; van Velthoven, Michelle Helena; Chen, Li; Car, Josip; Rudan, Igor; Zhang, Yanfeng; Wu, Qiong; Du, Xiaozhen; Scherpbier, Robert W

2013-12-04

An effective data collection method is crucial for high quality monitoring of health interventions. The traditional face-to-face data collection method is labor intensive, expensive, and time consuming. With the rapid increase of mobile phone subscribers, text messaging has the potential to be used for evaluation of population health interventions in rural China. The objective of this study was to explore the feasibility of using text messaging as a data collection tool to monitor an infant feeding intervention program. Participants were caregivers of children aged 0 to 23 months in rural China who participated in an infant feeding health education program. We used the test-retest method. First, we collected data with a text messaging survey and then with a face-to-face survey for 2 periods of 3 days. We compared the response rate, data agreement, costs, and participants' acceptability of the two methods. Also, we interviewed participants to explore their reasons for not responding to the text messages and the reasons for disagreement in the two methods. In addition, we evaluated the most appropriate time during the day for sending text messages. We included 258 participants; 99 (38.4%) participated in the text messaging survey and 177 (68.6%) in the face-to-face survey. Compared with the face-to-face survey, the text messaging survey had much lower response rates to at least one question (38.4% vs 68.6%) and to all 7 questions (27.9% vs 67.4%) with moderate data agreement (most kappa values between .5 and .75, the intraclass correlation coefficients between .53 to .72). Participants who took part in both surveys gave the same acceptability rating for both methods (median 4.0 for both on a 5-point scale, 1=disliked very much and 5=liked very much). The costs per questionnaire for the text messaging method were much lower than the costs for the face-to-face method: ¥19.7 (US $3.13) versus ¥33.9 (US $5.39) for all questionnaires, and ¥27.1 (US $4.31) versus ¥34
User's guide of parallel program development environment (PPDE). The 2nd edition

Energy Technology Data Exchange (ETDEWEB)

Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio [Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Tokyo (Japan); Ohta, Hirofumi [Hitachi Ltd., Tokyo (Japan)

2000-03-01

The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made through extending the function of the PPDF, Parallel Program Development Environment in the STA basic system. The extended PPDE has the function to make: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) the multi-tools execution which makes the tools on heterogeneous computers to execute with one operation a task on a computer, and 3) the mirror composition to reflect editing results of a file on a computer into all related files on other computers. These additional functions will enhance the work efficiency for program development on some computers. More functions have been added to the PPDE to provide help for parallel program development. New functions were also designed to complement a HPF translator and a paralleilizing support tool when working together so that a sequential program is efficiently converted to a parallel program. This report describes the use of extended PPDE. (author)
'Nuclearelectrica' Company messages for a broadly acceptable nuclear power program

International Nuclear Information System (INIS)

Stiopol, Mihaela; Bilegan, C. Iosif

2001-01-01

Romania started the nuclear power program about 20 years ago, by a high level Government decision. After 1989 the former Ministry of Electrical Power was transformed into a state owned company, RENEL, in which nuclear activities were also included. RENEL was a monopoly system responsible for production, transport and distribution of electricity in Romania. The deregulation process in the power sector was many times asked by the World Bank and International Monetary Fund, to split this monopoly system in separately activities: Production, Transport and Distribution. The first step occurred in July 1998, when the nuclear activities were externalized from RENEL. In nuclear sector two new entities were created: SN 'Nuclearelectrica' SA, a state own company that includes three branches: - Nuclear Power Production - Cernavoda NPP Unit 1; - Nuclear Fuel Plant-Pitesti; - Project Development Branch - Cernavoda Units 2-5. The second entity is the so-called Romanian Authority (Autonomous Reggie) for Nuclear Activities (RAAN), including as branches the heavy water fabrication plant 'ROMAG PROD', the Nuclear Research Institute (ICN) Pitesti and the Nuclear Engineering and Design Institute (CITON) Bucharest. The rest of conventional power sector was renamed, namely, CONEL. The organization process continued and in August 2000, by a Government Ordinance the CONEL was split into the following companies: - one for hydropower production 'HIDROELECTRICA'; - one for thermal power production 'TERMOELECTRICA'; - one for transport 'TRANSELECTRICA'; - one for distribution 'ELECTRICA'. The goal of a third step of restructuring process is the privatization in the power field. The steps of Romanian Power Sector Restructuring are presented. Since 1991 a Public Information program has been established. Depending on the evolution of the construction of the first Romanian nuclear power station, during the years, the messages changed. Everybody working in the nuclear field knows how difficult is
Compiling the parallel programming language NestStep to the CELL processor

OpenAIRE

Holm, Magnus

2010-01-01

The goal of this project is to create a source-to-source compiler which will translate NestStep code to C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the BSP model. It adds constructs for parallel programming on top of an imperative programming language. For this project, only constructs extending the C language are relevant. The output code will compil...
An Optimized Parallel FDTD Topology for Challenging Electromagnetic Simulations on Supercomputers

Directory of Open Access Journals (Sweden)

Shugang Jiang

2015-01-01

Full Text Available It may not be a challenge to run a Finite-Difference Time-Domain (FDTD code for electromagnetic simulations on a supercomputer with more than 10 thousands of CPU cores; however, to make FDTD code work with the highest efficiency is a challenge. In this paper, the performance of parallel FDTD is optimized through MPI (message passing interface virtual topology, based on which a communication model is established. The general rules of optimal topology are presented according to the model. The performance of the method is tested and analyzed on three high performance computing platforms with different architectures in China. Simulations including an airplane with a 700-wavelength wingspan, and a complex microstrip antenna array with nearly 2000 elements are performed very efficiently using a maximum of 10240 CPU cores.
A language for data-parallel and task parallel programming dedicated to multi-SIMD computers. Contributions to hydrodynamic simulation with lattice gases

International Nuclear Information System (INIS)

Pic, Marc Michel

1995-01-01

Parallel programming covers task-parallelism and data-parallelism. Many problems need both parallelisms. Multi-SIMD computers allow hierarchical approach of these parallelisms. The T++ language, based on C++, is dedicated to exploit Multi-SIMD computers using a programming paradigm which is an extension of array-programming to tasks managing. Our language introduced array of independent tasks to achieve separately (MIMD), on subsets of processors of identical behaviour (SIMD), in order to translate the hierarchical inclusion of data-parallelism in task-parallelism. To manipulate in a symmetrical way tasks and data we propose meta-operations which have the same behaviour on tasks arrays and on data arrays. We explain how to implement this language on our parallel computer SYMPHONIE in order to profit by the locally-shared memory, by the hardware virtualization, and by the multiplicity of communications networks. We analyse simultaneously a typical application of such architecture. Finite elements scheme for Fluid mechanic needs powerful parallel computers and requires large floating points abilities. Lattice gases is an alternative to such simulations. Boolean lattice bases are simple, stable, modular, need to floating point computation, but include numerical noise. Boltzmann lattice gases present large precision of computation, but needs floating points and are only locally stable. We propose a new scheme, called multi-bit, who keeps the advantages of each boolean model to which it is applied, with large numerical precision and reduced noise. Experiments on viscosity, physical behaviour, noise reduction and spurious invariants are shown and implementation techniques for parallel Multi-SIMD computers detailed. (author) [fr
Messages about appearance, food, weight and exercise in "tween" television.

Science.gov (United States)

Simpson, Courtney C; Kwitowski, Melissa; Boutte, Rachel; Gow, Rachel W; Mazzeo, Suzanne E

2016-12-01

Tweens (children ages ~8-14years) are a relatively recently defined age group, increasingly targeted by marketers. Individuals in this age group are particularly vulnerable to opinions and behaviors presented in media messages, given their level of cognitive and social development. However, little research has examined messages about appearance, food, weight, and exercise in television specifically targeting tweens, despite the popularity of this media type among this age group. This study used a content analytic approach to explore these messages in the five most popular television shows for tweens on the Disney Channel (as of 2015). Using a multiple-pass approach, relevant content in episodes from the most recently completed seasons of each show was coded. Appearance related incidents occurred in every episode; these most frequently mentioned attractiveness/beauty. Food related incidents were also present in every episode; typically, these situations were appearance and weight neutral. Exercise related incidents occurred in 53.3% of episodes; the majority expressed resistance to exercise. Weight related incidents occurred in 40.0% of the episodes; the majority praised the muscular ideal. Women were more likely to initiate appearance incidents, and men were more likely to initiate exercise incidents. These results suggest that programs specifically marketed to tweens reinforce appearance ideals, including stereotypes about female attractiveness and male athleticism, two constructs linked to eating pathology and body dissatisfaction. Given the developmental vulnerability of the target group, these findings are concerning, and highlight potential foci for prevention programming, including media literacy, for tweens. Copyright Â© 2016 Elsevier Ltd. All rights reserved.
Contributions to computational stereology and parallel programming

DEFF Research Database (Denmark)

Rasmusson, Allan

rotator, even without the need for isotropic sections. To meet the need for computational power to perform image restoration of virtual tissue sections, parallel programming on GPUs has also been part of the project. This has lead to a significant change in paradigm for a previously developed surgical...
Attributes of Candidates Passing the ABS Certifying Examination on the First Attempt-Program Directors׳ Perspective.

Science.gov (United States)

Sheikh, Mohd Raashid; Hulme, Michael

2016-01-01

The American Board of Surgery Certifying Examination (CE) is a pivotal event in a surgeon's career development, as it is the last challenge before achieving Board certification. First-time pass rate on the CE is one of the key metrics of surgery residency programs. The overall pass rate on the CE has declined significantly in recent years. The goal of this study was the identification of attributes of general surgery residents that are associated with passing the CE at the first attempt. The modified Delphi process was used to survey general surgery program directors. The study was conducted in 2 rounds in the interest of time available for surgical education research fellowship project. All 259 program directors were contacted in each round of surveys. In all, 49 (19%) responded to the first round and 54 (21%) responded to the second round of survey. The characteristics of a successful resident on CE include confidence, self-motivation, sound knowledge base, strong performance on the Board's training examination (American Board of Surgery In-Training Examination), and mock orals, and good communication skills. Postgraduate years 4 and 5 are the most likely resident levels at which failure could be predicted. Copyright © 2015 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
Our messages for a broadly acceptable national nuclear program

International Nuclear Information System (INIS)

Stiopol, Mihaela; Bilegan, Iosif Constantin

2001-01-01

Full text: Romania started the nuclear power program some 20 years ago by a high level Government decision. During that time no one asked and nobody explained to the people why a NPP is so much required. Before revolution was forbidden to talk or to write about nuclear matters. After the revolution many changes have occurred even in this field. The former ministry of electrical power was transformed into a state owned company 'RENEL' in which were also included the nuclear activities. RENEL was a monopoly responsible for production, transportation and distribution of electricity in Romania. The restructuring process in the energy field was many times asked by the World Bank and International monetary Fund - to split this monopoly system in separately activities: Production, Transportation and Distribution. The first step happened in July 1998, when the nuclear activities were externalised from RENEL, and two new entities were created: 'Nuclearelectrica' National Company - a state own company which includes three branches: - Nuclear Power Production - Cernavoda NPP - Unit 1; - Nuclear Fuel Plant - Pitesti; - Project Development Branch - Cernavoda - Unit 2-5. The second entity is so-called Autonomous Reggie for Nuclear Activities including the Heavy Water production, Nuclear Research Institute and the nuclear engineering activities - CITON. The restructuring process continued and in August 2000, By a Government Ordinance the rest of RENEL was split in more companies: - one for Hydropower production; - one for thermal power production; - one for transport; - one for distribution. The goal of a third step of restructuring process is the privatisation in the power field. Since 1991 a Public Information program has been established and it followed the usual steps. Depending on the evolution of the construction of the first Romanian nuclear power during the years the messages changed. Everybody working in the nuclear field knows now how difficult is to build the
Heterogeneous Multicore Parallel Programming for Graphics Processing Units

Directory of Open Access Journals (Sweden)

Francois Bodin

2009-01-01

Full Text Available Hybrid parallel multicore architectures based on graphics processing units (GPUs can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware display a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in a unintrusive manner while preserving the legacy code.
Intranode data communications in a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

2014-01-07

Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a computer node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.
Intranode data communications in a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

2013-07-23

Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.

Boys to Men: Sports Media. Messages about Masculinity: A National Poll of Children, Focus Groups, and Content Analysis of Sports Programs and Commercials.

Science.gov (United States)

Messner, Mike; Hunt, Darnell; Dunbar, Michele; Chen, Perry; Lapp, Joan; Miller, Patti

Sports programming plays a significant role in the media messages that American boys receive today. To explore the messages that sports programming presents to its audience, this report relates the findings of a study that analyzed a representative selection of sports programs and their accompanying commercials; also presented are findings from a…
Text Messaging Data Collection for Monitoring an Infant Feeding Intervention Program in Rural China: Feasibility Study

Science.gov (United States)

van Velthoven, Michelle Helena; Chen, Li; Car, Josip; Rudan, Igor; Wu, Qiong; Du, Xiaozhen; Scherpbier, Robert W

2013-01-01

Background An effective data collection method is crucial for high quality monitoring of health interventions. The traditional face-to-face data collection method is labor intensive, expensive, and time consuming. With the rapid increase of mobile phone subscribers, text messaging has the potential to be used for evaluation of population health interventions in rural China. Objective The objective of this study was to explore the feasibility of using text messaging as a data collection tool to monitor an infant feeding intervention program. Methods Participants were caregivers of children aged 0 to 23 months in rural China who participated in an infant feeding health education program. We used the test-retest method. First, we collected data with a text messaging survey and then with a face-to-face survey for 2 periods of 3 days. We compared the response rate, data agreement, costs, and participants’ acceptability of the two methods. Also, we interviewed participants to explore their reasons for not responding to the text messages and the reasons for disagreement in the two methods. In addition, we evaluated the most appropriate time during the day for sending text messages. Results We included 258 participants; 99 (38.4%) participated in the text messaging survey and 177 (68.6%) in the face-to-face survey. Compared with the face-to-face survey, the text messaging survey had much lower response rates to at least one question (38.4% vs 68.6%) and to all 7 questions (27.9% vs 67.4%) with moderate data agreement (most kappa values between .5 and .75, the intraclass correlation coefficients between .53 to .72). Participants who took part in both surveys gave the same acceptability rating for both methods (median 4.0 for both on a 5-point scale, 1=disliked very much and 5=liked very much). The costs per questionnaire for the text messaging method were much lower than the costs for the face-to-face method: ¥19.7 (US $3.13) versus ¥33.9 (US $5.39) for all
Parallel computation

International Nuclear Information System (INIS)

Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

1997-01-01

The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code development or improvement works were done on parts simulating low energy physical phenomena like radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program of neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits in a single pass the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refers to simulations for nuclear medicine applications like, for instance, development of biological probes, evaluation and characterization of the gamma cameras (collimators, crystal thickness) as well as the method for dosimetric calculations. Particularly, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to simulation of the electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of environment
The Utility of the Memorable Messages Framework as an Intermediary Evaluation Tool for Fruit and Vegetable Consumption in a Nutrition Education Program

Science.gov (United States)

Davis, LaShara A.; Morgan, Susan E.; Mobley, Amy R.

2016-01-01

Additional strategies to evaluate the impact of community nutrition education programs on low-income individuals are needed. The objective of this qualitative study was to examine the use of the Memorable Messages Framework as an intermediary nutrition education program evaluation tool to determine what fruit and vegetable messages were reported…
Development of Parallel Computing Framework to Enhance Radiation Transport Code Capabilities for Rare Isotope Beam Facility Design

Energy Technology Data Exchange (ETDEWEB)

Kostin, Mikhail [Michigan State Univ., East Lansing, MI (United States); Mokhov, Nikolai [Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States); Niita, Koji [Research Organization for Information Science and Technology, Ibaraki-ken (Japan)

2013-09-25

A parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran77, Fortran 90 or C. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA and MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.
Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

Science.gov (United States)

Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

2013-10-24

Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.
Basic design of parallel computational program for probabilistic structural analysis

International Nuclear Information System (INIS)

Kaji, Yoshiyuki; Arai, Taketoshi; Gu, Wenwei; Nakamura, Hitoshi

1999-06-01

In our laboratory, for 'development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory' (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)
Basic design of parallel computational program for probabilistic structural analysis

Energy Technology Data Exchange (ETDEWEB)

Kaji, Yoshiyuki; Arai, Taketoshi [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Gu, Wenwei; Nakamura, Hitoshi

1999-06-01

In our laboratory, for `development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory` (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)
Baseline Motivation Type as a Predictor of Dropout in a Healthy Eating Text Messaging Program.

Science.gov (United States)

Coa, Kisha; Patrick, Heather

2016-09-29

Growing evidence suggests that text messaging programs are effective in facilitating health behavior change. However, high dropout rates limit the potential effectiveness of these programs. This paper describes patterns of early dropout in the HealthyYou text (HYTxt) program, with a focus on the impact of baseline motivation quality on dropout, as characterized by Self-Determination Theory (SDT). This analysis included 193 users of HYTxt, a diet and physical activity text messaging intervention developed by the US National Cancer Institute. Descriptive statistics were computed, and logistic regression models were run to examine the association between baseline motivation type and early program dropout. Overall, 43.0% (83/193) of users dropped out of the program; of these, 65.1% (54/83; 28.0% of all users) did so within the first 2 weeks. Users with higher autonomous motivation had significantly lower odds of dropping out within the first 2 weeks. A one unit increase in autonomous motivation was associated with lower odds (odds ratio 0.44, 95% CI 0.24-0.81) of early dropout, which persisted after adjusting for level of controlled motivation. Applying SDT-based strategies to enhance autonomous motivation might reduce early dropout rates, which can improve program exposure and effectiveness.
Parallel Fortran-MPI software for numerical inversion of the Laplace transform and its application to oscillatory water levels in groundwater environments

Science.gov (United States)

Zhan, X.

2005-01-01

A parallel Fortran-MPI (Message Passing Interface) software for numerical inversion of the Laplace transform based on a Fourier series method is developed to meet the need of solving intensive computational problems involving oscillatory water level's response to hydraulic tests in a groundwater environment. The software is a parallel version of ACM (The Association for Computing Machinery) Transactions on Mathematical Software (TOMS) Algorithm 796. Running 38 test examples indicated that implementation of MPI techniques with distributed memory architecture speedups the processing and improves the efficiency. Applications to oscillatory water levels in a well during aquifer tests are presented to illustrate how this package can be applied to solve complicated environmental problems involved in differential and integral equations. The package is free and is easy to use for people with little or no previous experience in using MPI but who wish to get off to a quick start in parallel computing. ?? 2004 Elsevier Ltd. All rights reserved.
Parallel SOR methods with a parabolic-diffusion acceleration technique for solving an unstructured-grid Poisson equation on 3D arbitrary geometries

Science.gov (United States)

Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.

2016-05-01

This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
Domain decomposition parallel computing for transient two-phase flow of nuclear reactors

Energy Technology Data Exchange (ETDEWEB)

Lee, Jae Ryong; Yoon, Han Young [KAERI, Daejeon (Korea, Republic of); Choi, Hyoung Gwon [Seoul National University, Seoul (Korea, Republic of)

2016-05-15

KAERI (Korea Atomic Energy Research Institute) has been developing a multi-dimensional two-phase flow code named CUPID for multi-physics and multi-scale thermal hydraulics analysis of Light water reactors (LWRs). The CUPID code has been validated against a set of conceptual problems and experimental data. In this work, the CUPID code has been parallelized based on the domain decomposition method with Message passing interface (MPI) library. For domain decomposition, the CUPID code provides both manual and automatic methods with METIS library. For the effective memory management, the Compressed sparse row (CSR) format is adopted, which is one of the methods to represent the sparse asymmetric matrix. CSR format saves only non-zero value and its position (row and column). By performing the verification for the fundamental problem set, the parallelization of the CUPID has been successfully confirmed. Since the scalability of a parallel simulation is generally known to be better for fine mesh system, three different scales of mesh system are considered: 40000 meshes for coarse mesh system, 320000 meshes for mid-size mesh system, and 2560000 meshes for fine mesh system. In the given geometry, both single- and two-phase calculations were conducted. In addition, two types of preconditioners for a matrix solver were compared: Diagonal and incomplete LU preconditioner. In terms of enhancement of the parallel performance, the OpenMP and MPI hybrid parallel computing for a pressure solver was examined. It is revealed that the scalability of hybrid calculation was enhanced for the multi-core parallel computation.
Reconfigurable multi-DSP parallel computing architecture based on DSM%基于DSM的可重构多DSP并行处理架构

Institute of Scientific and Technical Information of China (English)

程鑫; 吴华春

2012-01-01

提出一种基于DSM的可在线重构多DSP并行处理架构,采用基于自定义内部总线的信息传递服务,在分布式物理内存上实现了统一编址的共享内存模型,减小了DSP之间的数据传递开销；设计基于VME总线的在线重构来实现针对消息传递服务的重定义,增强了并行计算架构的通用性.实验表明,采用此DSM能减小了并行DSP对共享数据同步访问开销,满足多轴精密同步运动控制系统需求.%A design of reconfigurable multi-digital signal processor (DSP) parallel computing architecture based on distributed shared memory (DSM) was proposed. A message-passing communication based on the user-defined internal bus (IB) was designed to implement a shared memory model on physically distributed memory, which decreased the data transmission overhead. Online reconfiguration mechanism was designed to implement message-passing communication reconfiguration, which in-creasd the universality of parallel architecture. The experiment shows that adopting the DSM introduced can reduce simultaneous access overhead to shared data, which satisfies the requirements of ultra-precise multi-axis motion control system.
Modifiable variables in physical therapy education programs associated with first-time and three-year National Physical Therapy Examination pass rates in the United States

Directory of Open Access Journals (Sweden)

Chad Cook

2015-09-01

Full Text Available Purpose: This study aimed to examine the modifiable programmatic characteristics reflected in the Commission on Accreditation in Physical Therapy Education (CAPTE Annual Accreditation Report for all accredited programs that reported pass rates on the National Physical Therapist Examination, and to build a predictive model for first-time and three-year ultimate pass rates. Methods: This observational study analyzed programmatic information from the 185 CAPTE-accredited physical therapy programs in the United States and Puerto Rico out of a total of 193 programs that provided the first-time and three-year ultimate pass rates in 2011. Fourteen predictive variables representing student selection and composition, clinical education length and design, and general program length and design were analyzed against first-time pass rates and ultimate pass rates on the NPTE. Univariate and multivariate multinomial regression analysis for first-time pass rates and logistic regression analysis for three-year ultimate pass rates were performed. Results: The variables associated with the first-time pass rate in the multivariate analysis were the mean undergraduate grade point average (GPA and the average age of the cohort. Multivariate analysis showed that mean undergraduate GPA was associated with the three-year ultimate pass rate. Conclusions: Mean undergraduate GPA was found to be the only modifiable predictor for both first-time and three-year pass rates among CAPTE-accredited physical therapy programs.
On the Performance of the Python Programming Language for Serial and Parallel Scientific Computations

Directory of Open Access Journals (Sweden)

Xing Cai

2005-01-01

Full Text Available This article addresses the performance of scientific applications that use the Python programming language. First, we investigate several techniques for improving the computational efficiency of serial Python codes. Then, we discuss the basic programming techniques in Python for parallelizing serial scientific applications. It is shown that an efficient implementation of the array-related operations is essential for achieving good parallel performance, as for the serial case. Once the array-related operations are efficiently implemented, probably using a mixed-language implementation, good serial and parallel performance become achievable. This is confirmed by a set of numerical experiments. Python is also shown to be well suited for writing high-level parallel programs.
MulticoreBSP for C : A high-performance library for shared-memory parallel programming

NARCIS (Netherlands)

Yzelman, A. N.; Bisseling, R. H.; Roose, D.; Meerbergen, K.

2014-01-01

The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the
A multithreaded parallel implementation of a dynamic programming algorithm for sequence comparison.

Science.gov (United States)

Martins, W S; Del Cuvillo, J B; Useche, F J; Theobald, K B; Gao, G R

2001-01-01

This paper discusses the issues involved in implementing a dynamic programming algorithm for biological sequence comparison on a general-purpose parallel computing platform based on a fine-grain event-driven multithreaded program execution model. Fine-grain multithreading permits efficient parallelism exploitation in this application both by taking advantage of asynchronous point-to-point synchronizations and communication with low overheads and by effectively tolerating latency through the overlapping of computation and communication. We have implemented our scheme on EARTH, a fine-grain event-driven multithreaded execution and architecture model which has been ported to a number of parallel machines with off-the-shelf processors. Our experimental results show that the dynamic programming algorithm can be efficiently implemented on EARTH systems with high performance (e.g., speedup of 90 on 120 nodes), good programmability and reasonable cost.
High performance parallelism pearls 2 multicore and many-core programming approaches

CERN Document Server

Jeffers, Jim

2015-01-01

High Performance Parallelism Pearls Volume 2 offers another set of examples that demonstrate how to leverage parallelism. Similar to Volume 1, the techniques included here explain how to use processors and coprocessors with the same programming - illustrating the most effective ways to combine Xeon Phi coprocessors with Xeon and other multicore processors. The book includes examples of successful programming efforts, drawn from across industries and domains such as biomed, genetics, finance, manufacturing, imaging, and more. Each chapter in this edited work includes detailed explanations of t
Towards Interactive Visual Exploration of Parallel Programs using a Domain-Specific Language

KAUST Repository

Klein, Tobias

2016-04-19

The use of GPUs and the massively parallel computing paradigm have become wide-spread. We describe a framework for the interactive visualization and visual analysis of the run-time behavior of massively parallel programs, especially OpenCL kernels. This facilitates understanding a program\\'s function and structure, finding the causes of possible slowdowns, locating program bugs, and interactively exploring and visually comparing different code variants in order to improve performance and correctness. Our approach enables very specific, user-centered analysis, both in terms of the recording of the run-time behavior and the visualization itself. Instead of having to manually write instrumented code to record data, simple code annotations tell the source-to-source compiler which code instrumentation to generate automatically. The visualization part of our framework then enables the interactive analysis of kernel run-time behavior in a way that can be very specific to a particular problem or optimization goal, such as analyzing the causes of memory bank conflicts or understanding an entire parallel algorithm.
Wideband and flat-gain amplifier based on high concentration erbium-doped fibres in parallel double-pass configuration

International Nuclear Information System (INIS)

Hamida, B A; Cheng, X S; Harun, S W; Naji, A W; Arof, H; Al-Khateeb, W; Khan, S; Ahmad, H

2012-01-01

A wideband and flat gain erbium-doped fibre amplifier (EDFA) is demonstrated using a hybrid gain medium of a zirconiabased erbium-doped fibre (Zr-EDF) and a high concentration erbium-doped fibre (EDF). The amplifier has two stages comprising a 2-m-long ZEDF and 9-m-long EDF optimised for C- and L-band operations, respectively, in a double-pass parallel configuration. A chirp fibre Bragg grating (CFBG) is used in both stages to ensure double propagation of the signal and thus to increase the attainable gain in both C- and L-band regions. At an input signal power of 0 dBm, a flat gain of 15 dB is achieved with a gain variation of less than 0.5 dB within a wide wavelength range from 1530 to 1605 nm. The corresponding noise figure varies from 6.2 to 10.8 dB within this wavelength region.

What motivates consumers to re-tweet brand content? The impact of information, emotion, and traceability on pass-along behavior

NARCIS (Netherlands)

Araujo, T.; Neijens, P.; Vliegenthart, R.

2015-01-01

How do certain cues influence pass-along behavior (re-Tweeting) of brand messages on Twitter? Analyzing 19,343 global brand messages over a three-year period, the authors of this article found that informational cues were predictors of higher levels of re-Tweeting, particularly product details and
Run-Time and Compiler Support for Programming in Adaptive Parallel Environments

Directory of Open Access Journals (Sweden)

Guy Edjlali

1997-01-01

Full Text Available For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of high-performance Fortran (HPF-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.
Three-Dimensional Induced Polarization Parallel Inversion Using Nonlinear Conjugate Gradients Method

Directory of Open Access Journals (Sweden)

Huan Ma

2015-01-01

Full Text Available Four kinds of array of induced polarization (IP methods (surface, borehole-surface, surface-borehole, and borehole-borehole are widely used in resource exploration. However, due to the presence of large amounts of the sources, it will take much time to complete the inversion. In the paper, a new parallel algorithm is described which uses message passing interface (MPI and graphics processing unit (GPU to accelerate 3D inversion of these four methods. The forward finite differential equation is solved by ILU0 preconditioner and the conjugate gradient (CG solver. The inverse problem is solved by nonlinear conjugate gradients (NLCG iteration which is used to calculate one forward and two “pseudo-forward” modelings and update the direction, space, and model in turn. Because each source is independent in forward and “pseudo-forward” modelings, multiprocess modes are opened by calling MPI library. The iterative matrix solver within CULA is called in each process. Some tables and synthetic data examples illustrate that this parallel inversion algorithm is effective. Furthermore, we demonstrate that the joint inversion of surface and borehole data produces resistivity and chargeability results are superior to those obtained from inversions of individual surface data.
Engineered cell-cell communication via DNA messaging

Directory of Open Access Journals (Sweden)

Ortiz Monica E

2012-09-01

Full Text Available Abstract Background Evolution has selected for organisms that benefit from genetically encoded cell-cell communication. Engineers have begun to repurpose elements of natural communication systems to realize programmed pattern formation and coordinate other population-level behaviors. However, existing engineered systems rely on system-specific small molecules to send molecular messages among cells. Thus, the information transmission capacity of current engineered biological communication systems is physically limited by specific biomolecules that are capable of sending only a single message, typically “regulate transcription.” Results We have engineered a cell-cell communication platform using bacteriophage M13 gene products to autonomously package and deliver heterologous DNA messages of varying lengths and encoded functions. We demonstrate the decoupling of messages from a common communication channel via the autonomous transmission of various arbitrary genetic messages. Further, we increase the range of engineered DNA messaging across semisolid media by linking message transmission or receipt to active cellular chemotaxis. Conclusions We demonstrate decoupling of a communication channel from message transmission within engineered biological systems via the autonomous targeted transduction of user-specified heterologous DNA messages. We also demonstrate that bacteriophage M13 particle production and message transduction occurs among chemotactic bacteria. We use chemotaxis to improve the range of DNA messaging, increasing both transmission distance and communication bit rates relative to existing small molecule-based communication systems. We postulate that integration of different engineered cell-cell communication platforms will allow for more complex spatial programming of dynamic cellular consortia.
Xyce parallel electronic simulator : users' guide. Version 5.1.

Energy Technology Data Exchange (ETDEWEB)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-11-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a
Xyce Parallel Electronic Simulator : users' guide, version 4.1.

Energy Technology Data Exchange (ETDEWEB)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-02-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a
Analysis of parallel computing performance of the code MCNP

International Nuclear Information System (INIS)

Wang Lei; Wang Kan; Yu Ganglin

2006-01-01

Parallel computing can reduce the running time of the code MCNP effectively. With the MPI message transmitting software, MCNP5 can achieve its parallel computing on PC cluster with Windows operating system. Parallel computing performance of MCNP is influenced by factors such as the type, the complexity level and the parameter configuration of the computing problem. This paper analyzes the parallel computing performance of MCNP regarding with these factors and gives measures to improve the MCNP parallel computing performance. (authors)
The simplified spherical harmonics (SP{sub L}) methodology with space and moment decomposition in parallel environments

Energy Technology Data Exchange (ETDEWEB)

Gianluca, Longoni; Alireza, Haghighat [Florida University, Nuclear and Radiological Engineering Department, Gainesville, FL (United States)

2003-07-01

In recent years, the SP{sub L} (simplified spherical harmonics) equations have received renewed interest for the simulation of nuclear systems. We have derived the SP{sub L} equations starting from the even-parity form of the S{sub N} equations. The SP{sub L} equations form a system of (L+1)/2 second order partial differential equations that can be solved with standard iterative techniques such as the Conjugate Gradient (CG). We discretized the SP{sub L} equations with the finite-volume approach in a 3-D Cartesian space. We developed a new 3-D general code, Pensp{sub L} (Parallel Environment Neutral-particle SP{sub L}). Pensp{sub L} solves both fixed source and criticality eigenvalue problems. In order to optimize the memory management, we implemented a Compressed Diagonal Storage (CDS) to store the SP{sub L} matrices. Pensp{sub L} includes parallel algorithms for space and moment domain decomposition. The computational load is distributed on different processors, using a mapping function, which maps the 3-D Cartesian space and moments onto processors. The code is written in Fortran 90 using the Message Passing Interface (MPI) libraries for the parallel implementation of the algorithm. The code has been tested on the Pcpen cluster and the parallel performance has been assessed in terms of speed-up and parallel efficiency. (author)
Lessons learned: program messaging in gender-transformative work with men and boys in South Africa

Science.gov (United States)

Viitanen, Amanda P.; Colvin, Christopher J.

2015-01-01

Background Adherence to traditional notions of masculinity has been identified as an important driver in the perpetuation of numerous health and social problems, including gender-based violence and HIV. With the largest generalized HIV epidemic in the world and high rates of violence against women, the need for gender-transformative work in South Africa is broadly accepted in activist circles and at the national and community level. Because of the integral role men play in both of these epidemics, initiatives and strategies that engage men in promoting gender equality have emerged over the last decade and the evidence base supporting the effectiveness of masculinities-based interventions is growing. However, little research exists on men's receptivity to the messages delivered in these programs. Objective This article examines the current practices among a set of gender-transformation initiatives in South Africa to see what lessons can be derived from them. We look at how South African men participating in these programs responded to three thematic messages frequently found in gender-transformative work: 1) the ‘costs of masculinity’ men pay for adherence to harmful gender constructs; 2) multiple forms of masculinity; and 3) the human rights framework and contested rights. Design This article synthesizes qualitative findings from in-depth interviews, focus group discussions, and ethnographic research with men participating in several gender- and health-intervention programs in South Africa. The data were collected between 2007 and 2011 and synthesized using some of the basic principles of meta-ethnography. Results and conclusions Overall, men were receptive to the three thematic messages reviewed; they were able to see them in the context of their own lives and the messages facilitated rich dialog among participants. However, some men were more ambivalent toward shifting gender notions and some even adamantly resisted engaging in discussions over gender
Lessons learned: program messaging in gender-transformative work with men and boys in South Africa.

Science.gov (United States)

Viitanen, Amanda P; Colvin, Christopher J

2015-01-01

Adherence to traditional notions of masculinity has been identified as an important driver in the perpetuation of numerous health and social problems, including gender-based violence and HIV. With the largest generalized HIV epidemic in the world and high rates of violence against women, the need for gender-transformative work in South Africa is broadly accepted in activist circles and at the national and community level. Because of the integral role men play in both of these epidemics, initiatives and strategies that engage men in promoting gender equality have emerged over the last decade and the evidence base supporting the effectiveness of masculinities-based interventions is growing. However, little research exists on men's receptivity to the messages delivered in these programs. This article examines the current practices among a set of gender-transformation initiatives in South Africa to see what lessons can be derived from them. We look at how South African men participating in these programs responded to three thematic messages frequently found in gender-transformative work: 1) the 'costs of masculinity' men pay for adherence to harmful gender constructs; 2) multiple forms of masculinity; and 3) the human rights framework and contested rights. This article synthesizes qualitative findings from in-depth interviews, focus group discussions, and ethnographic research with men participating in several gender- and health-intervention programs in South Africa. The data were collected between 2007 and 2011 and synthesized using some of the basic principles of meta-ethnography. Overall, men were receptive to the three thematic messages reviewed; they were able to see them in the context of their own lives and the messages facilitated rich dialog among participants. However, some men were more ambivalent toward shifting gender notions and some even adamantly resisted engaging in discussions over gender equality. More research is needed to gauge the long-term impact
Scientific programming on massively parallel processor CP-PACS

International Nuclear Information System (INIS)

Boku, Taisuke

1998-01-01

The massively parallel processor CP-PACS takes various problems of calculation physics as the object, and it has been designed so that its architecture has been devised to do various numerical processings. In this report, the outline of the CP-PACS and the example of programming in the Kernel CG benchmark in NAS Parallel Benchmarks, version 1, are shown, and the pseudo vector processing mechanism and the parallel processing tuning of scientific and technical computation utilizing the three-dimensional hyper crossbar net, which are two great features of the architecture of the CP-PACS are described. As for the CP-PACS, the PUs based on RISC processor and added with pseudo vector processor are used. Pseudo vector processing is realized as the loop processing by scalar command. The features of the connection net of PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The part that takes the time for processing most in the main loop is the product of matrix and vector (matvec), and the parallel processing of the matvec is explained. The time for the computation by the CPU is determined. As the evaluation of the performance, the evaluation of the time for execution, the short vector processing of pseudo vector processor based on slide window, and the comparison with other parallel computers are reported. (K.I.)
F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

Science.gov (United States)

DiNucci, David C.; Saini, Subhash (Technical Monitor)

1998-01-01

Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).
A Parallel Supercomputer Implementation of a Biological Inspired Neural Network and its use for Pattern Recognition

International Nuclear Information System (INIS)

De Ladurantaye, Vincent; Lavoie, Jean; Bergeron, Jocelyn; Parenteau, Maxime; Lu Huizhong; Pichevar, Ramin; Rouat, Jean

2012-01-01

A parallel implementation of a large spiking neural network is proposed and evaluated. The neural network implements the binding by synchrony process using the Oscillatory Dynamic Link Matcher (ODLM). Scalability, speed and performance are compared for 2 implementations: Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA) running on clusters of multicore supercomputers and NVIDIA graphical processing units respectively. A global spiking list that represents at each instant the state of the neural network is described. This list indexes each neuron that fires during the current simulation time so that the influence of their spikes are simultaneously processed on all computing units. Our implementation shows a good scalability for very large networks. A complex and large spiking neural network has been implemented in parallel with success, thus paving the road towards real-life applications based on networks of spiking neurons. MPI offers a better scalability than CUDA, while the CUDA implementation on a GeForce GTX 285 gives the best cost to performance ratio. When running the neural network on the GTX 285, the processing speed is comparable to the MPI implementation on RQCHP's Mammouth parallel with 64 notes (128 cores).
Evolution of a minimal parallel programming model

International Nuclear Information System (INIS)

Lusk, Ewing; Butler, Ralph; Pieper, Steven C.

2017-01-01

Here, we take a historical approach to our presentation of self-scheduled task parallelism, a programming model with its origins in early irregular and nondeterministic computations encountered in automated theorem proving and logic programming. We show how an extremely simple task model has evolved into a system, asynchronous dynamic load balancing (ADLB), and a scalable implementation capable of supporting sophisticated applications on today’s (and tomorrow’s) largest supercomputers; and we illustrate the use of ADLB with a Green’s function Monte Carlo application, a modern, mature nuclear physics code in production use. Our lesson is that by surrendering a certain amount of generality and thus applicability, a minimal programming model (in terms of its basic concepts and the size of its application programmer interface) can achieve extreme scalability without introducing complexity.
An Approach to Ad-hoc Messaging Networks Using Time Shifted Propagation

Directory of Open Access Journals (Sweden)

Christoph Fuchß

2007-10-01

Full Text Available Many communication devices, like mobile phones and PDAs, are enabled for near field communication by using Bluetooth. Many approaches dealt so far with the attempt to transfer mobile ad-hoc networks (MANET to the mechanism of the “fixed internet” to mobile networks. In order to achieve liability and robustness of common TCP connections routing algorithm in near field communication based networks become more sophisticated and complex. These mechanisms often do not reflect on the application’s particularities.Our approach of an ad-hoc messaging network (AMNET uses simple store-and-forward message passing to spread data asynchronously. We do not aim at the reliability of common internet networks but focus on application specific needs that can be covered by simple message passing mechanism. In this paper we will portray a powerful network by using simple devices and communication protocols on the basis of AMNETs. Simulation results of our AMNET approach provide insights towards speeding up the network setup process and to enable the use of AMNETs even with few participants by introducing a hybrid structure of infrastructure and mobile nodes.
Distributed parallel cooperative coevolutionary multi-objective large-scale immune algorithm for deployment of wireless sensor networks

DEFF Research Database (Denmark)

Cao, Bin; Zhao, Jianwei; Yang, Po

2018-01-01

-objective evolutionary algorithms the Cooperative Coevolutionary Generalized Differential Evolution 3, the Cooperative Multi-objective Differential Evolution and the Nondominated Sorting Genetic Algorithm III, the proposed algorithm addresses the deployment optimization problem efficiently and effectively.......Using immune algorithms is generally a time-intensive process especially for problems with a large number of variables. In this paper, we propose a distributed parallel cooperative coevolutionary multi-objective large-scale immune algorithm that is implemented using the message passing interface...... (MPI). The proposed algorithm is composed of three layers: objective, group and individual layers. First, for each objective in the multi-objective problem to be addressed, a subpopulation is used for optimization, and an archive population is used to optimize all the objectives. Second, the large...
Parallelization characteristics of the DeCART code

International Nuclear Information System (INIS)

Cho, J. Y.; Joo, H. G.; Kim, H. Y.; Lee, C. C.; Chang, M. H.; Zee, S. Q.

2003-12-01

This report is to describe the parallelization characteristics of the DeCART code and also examine its parallel performance. Parallel computing algorithms are implemented to DeCART to reduce the tremendous computational burden and memory requirement involved in the three-dimensional whole core transport calculation. In the parallelization of the DeCART code, the axial domain decomposition is first realized by using MPI (Message Passing Interface), and then the azimuthal angle domain decomposition by using either MPI or OpenMP. When using the MPI for both the axial and the angle domain decomposition, the concept of MPI grouping is employed for convenient communication in each communication world. For the parallel computation, most of all the computing modules except for the thermal hydraulic module are parallelized. These parallelized computing modules include the MOC ray tracing, CMFD, NEM, region-wise cross section preparation and cell homogenization modules. For the distributed allocation, most of all the MOC and CMFD/NEM variables are allocated only for the assigned planes, which reduces the required memory by a ratio of the number of the assigned planes to the number of all planes. The parallel performance of the DeCART code is evaluated by solving two problems, a rodded variation of the C5G7 MOX three-dimensional benchmark problem and a simplified three-dimensional SMART PWR core problem. In the aspect of parallel performance, the DeCART code shows a good speedup of about 40.1 and 22.4 in the ray tracing module and about 37.3 and 20.2 in the total computing time when using 48 CPUs on the IBM Regatta and 24 CPUs on the LINUX cluster, respectively. In the comparison between the MPI and OpenMP, OpenMP shows a somewhat better performance than MPI. Therefore, it is concluded that the first priority in the parallel computation of the DeCART code is in the axial domain decomposition by using MPI, and then in the angular domain using OpenMP, and finally the angular
Ordering schemes for sparse matrices using modern programming paradigms

International Nuclear Information System (INIS)

Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak

2000-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. In previous work, we investigated the effects of various ordering and partitioning strategies on the performance of CG using different programming paradigms and architectures. This paper makes several extensions to our prior research. First, we present a hybrid(MPI+OpenMP) implementation of the CG algorithm on the IBM SP and show that the hybrid paradigm increases programming complexity with little performance gains compared to a pure MPI implementation. For ill-conditioned linear systems, it is often necessary to use a preconditioning technique. We present MPI results for ILU(0) preconditioned CG (PCG) using the BlockSolve95 library, and show that the initial ordering of the input matrix dramatically affect PCG's performance. Finally, a multithreaded version of the PCG is developed on the Cray (Tera) MTA. Unlike the message-passing version, this implementation did not require the complexities of special orderings or graph dependency analysis. However, only limited scalability was achieved due to the lack of available thread level parallelism
Increasing available FIFO space to prevent messaging queue deadlocks in a DMA environment

Science.gov (United States)

Blocksome, Michael A [Rochester, MN; Chen, Dong [Croton On Hudson, NY; Gooding, Thomas [Rochester, MN; Heidelberger, Philip [Cortlandt Manor, NY; Parker, Jeff [Rochester, MN

2012-02-07

Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.
Iterated Process Analysis over Lattice-Valued Regular Expressions

DEFF Research Database (Denmark)

Midtgaard, Jan; Nielson, Flemming; Nielson, Hanne Riis

2016-01-01

We present an iterated approach to statically analyze programs of two processes communicating by message passing. Our analysis operates over a domain of lattice-valued regular expressions, and computes increasingly better approximations of each process's communication behavior. Overall the work e...... extends traditional semantics-based program analysis techniques to automatically reason about message passing in a manner that can simultaneously analyze both values of variables as well as message order, message content, and their interdependencies.......We present an iterated approach to statically analyze programs of two processes communicating by message passing. Our analysis operates over a domain of lattice-valued regular expressions, and computes increasingly better approximations of each process's communication behavior. Overall the work...

Computationally efficient implementation of combustion chemistry in parallel PDF calculations

International Nuclear Information System (INIS)

Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.

2009-01-01

In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f m pi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
Effectiveness of mobile phone messaging in prevention of type 2 diabetes by lifestyle modification in men in India: a prospective, parallel-group, randomised controlled trial.

Science.gov (United States)

Ramachandran, Ambady; Snehalatha, Chamukuttan; Ram, Jagannathan; Selvam, Sundaram; Simon, Mary; Nanditha, Arun; Shetty, Ananth Samith; Godsland, Ian F; Chaturvedi, Nish; Majeed, Azeem; Oliver, Nick; Toumazou, Christofer; Alberti, K George; Johnston, Desmond G

2013-11-01

Type 2 diabetes can often be prevented by lifestyle modification; however, successful lifestyle intervention programmes are labour intensive. Mobile phone messaging is an inexpensive alternative way to deliver educational and motivational advice about lifestyle modification. We aimed to assess whether mobile phone messaging that encouraged lifestyle change could reduce incident type 2 diabetes in Indian Asian men with impaired glucose tolerance. We did a prospective, parallel-group, randomised controlled trial between Aug 10, 2009, and Nov 30, 2012, at ten sites in southeast India. Working Indian men (aged 35-55 years) with impaired glucose tolerance were randomly assigned (1:1) with a computer-generated randomisation sequence to a mobile phone messaging intervention or standard care (control group). Participants in the intervention group received frequent mobile phone messages compared with controls who received standard lifestyle modification advice at baseline only. Field staff and participants were, by necessity, not masked to study group assignment, but allocation was concealed from laboratory personnel as well as principal and co-investigators. The primary outcome was incidence of type 2 diabetes, analysed by intention to treat. This trial is registered with ClinicalTrials.gov, number NCT00819455. We assessed 8741 participants for eligibility. 537 patients were randomly assigned to either the mobile phone messaging intervention (n=271) or standard care (n=266). The cumulative incidence of type 2 diabetes was lower in those who received mobile phone messages than in controls: 50 (18%) participants in the intervention group developed type 2 diabetes compared with 73 (27%) in the control group (hazard ratio 0·64, 95% CI 0·45-0·92; p=0·015). The number needed to treat to prevent one case of type 2 diabetes was 11 (95% CI 6-55). One patient in the control group died suddenly at the end of the first year. We recorded no other serious adverse events. Mobile
Towards Interactive Visual Exploration of Parallel Programs using a Domain-Specific Language

KAUST Repository

Klein, Tobias; Bruckner, Stefan; Grö ller, M. Eduard; Hadwiger, Markus; Rautek, Peter

2016-01-01

The use of GPUs and the massively parallel computing paradigm have become wide-spread. We describe a framework for the interactive visualization and visual analysis of the run-time behavior of massively parallel programs, especially OpenCL kernels. This facilitates understanding a program's function and structure, finding the causes of possible slowdowns, locating program bugs, and interactively exploring and visually comparing different code variants in order to improve performance and correctness. Our approach enables very specific, user-centered analysis, both in terms of the recording of the run-time behavior and the visualization itself. Instead of having to manually write instrumented code to record data, simple code annotations tell the source-to-source compiler which code instrumentation to generate automatically. The visualization part of our framework then enables the interactive analysis of kernel run-time behavior in a way that can be very specific to a particular problem or optimization goal, such as analyzing the causes of memory bank conflicts or understanding an entire parallel algorithm.
A backtracking algorithm for the stream AND-parallel execution of logic programs

Energy Technology Data Exchange (ETDEWEB)

Somogyi, Z.; Ramamohanarao, K.; Vaghani, J. (Univ. of Melbourne, Parkville (Australia))

1988-06-01

The authors present the first backtracking algorithm for stream AND-parallel logic programs. It relies on compile-time knowledge of the data flow graph of each clause to let it figure out efficiently which goals to kill or restart when a goal fails. This crucial information, which they derive from mode declarations, was not available at compile-time in any previous stream AND-parallel system. They show that modes can increase the precision of the backtracking algorithm, though their algorithm allows this precision to be traded off against overhead on a procedure-by-procedure and call-by-call basis. The modes also allow their algorithm to handle efficiently programs that manipulate partially instantiated data structures and an important class of programs with circular dependency graphs. On code that does not need backtracking, the efficiency of their algorithm approaches that of the committed-choice languages; on code that does need backtracking its overhead is comparable to that of the independent AND-parallel backtracking algorithms.
MINUIT package parallelization and applications using the RooFit package

International Nuclear Information System (INIS)

Lazzaro, Alfio; Moneta, Lorenzo

2010-01-01

The fitting procedures are based on numerical minimization of functions. The MINUIT package is the most common package used for such procedures in High Energy Physics community. The main algorithm in this package, MIGRAD, searches the minimum of a function using the gradient information. For each minimization iteration, MIGRAD requires the calculation of the derivative for each free parameter of the function to be minimized. Minimization is required for data analysis problems based on the maximum likelihood technique. The calculation of complex likelihood functions, with several free parameters, many independent variables and large data samples, can be very CPU-time consuming. Then, the minimization process requires the calculation of the likelihood functions several times for each minimization iteration. In this paper we will show how MIGRAD algorithm and the likelihood function calculation can be easily parallelized using Message Passing Interface techniques. We will present the speed-up improvements obtained in typical physics applications such as complex maximum likelihood fits using the RooFit package.
The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

Science.gov (United States)

Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

2001-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies, as a result the performance of parallel programs with compiler directives has also made improvements. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich), to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
Lemon : An MPI parallel I/O library for data encapsulation using LIME

NARCIS (Netherlands)

Deuzeman, Albert; Reker, Siebren; Urbach, Carsten

We introduce Lemon, an MPI parallel I/O library that provides efficient parallel I/O of both binary and metadata on massively parallel architectures. Motivated by the demands of the lattice Quantum Chromodynamics community, the data is stored in the SciDAC Lattice QCD Interchange Message
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

2014-11-11

Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Public's Cognitive and Emotional Responses to Nuclear Messages: Implications for Effective Nuclear Communication Programs

International Nuclear Information System (INIS)

Kim, Hyo Jung

2017-01-01

The public debate over the use of nuclear energy is not limited to the area of technology, and has become subject to the public's subjective perceptions and emotions regarding the issue. This study empirically demonstrated the advantage of loss framing in improving public's favorable responses toward nuclear energy messages. Such framing effect was found to be moderated by individuals' daily use of online news. The findings of this study suggest that public's cognitive and emotional responses toward nuclear messages should be carefully considered when planning effective nuclear communication program.
A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS

International Nuclear Information System (INIS)

Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.

2013-01-01

We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ∼ 10 7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10 5 to 10 7 . We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within ∼ 5 , 128 for N = 10 6 and 256 for N = 10 7 . The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.
Feedback Driven Annotation and Refactoring of Parallel Programs

DEFF Research Database (Denmark)

Larsen, Per

and communication in embedded programs. Runtime checks are developed to ensure that annotations correctly describe observable program behavior. The performance impact of runtime checking is evaluated on several benchmark kernels and is negligible in all cases. The second aspect is compilation feedback. Annotations...... are not effective unless programmers are told how and when they are benecial. A prototype compilation feedback system was developed in collaboration with IBM Haifa Research Labs. It reports issues that prevent further analysis to the programmer. Performance evaluation shows that three programs performes signicantly......This thesis combines programmer knowledge and feedback to improve modeling and optimization of software. The research is motivated by two observations. First, there is a great need for automatic analysis of software for embedded systems - to expose and model parallelism inherent in programs. Second...
Comparing tailored and untailored text messages for smoking cessation

DEFF Research Database (Denmark)

Skov-Ettrup, L S; Ringgaard, L W; Dalum, P

2014-01-01

The aim was to compare the effectiveness of untailored text messages for smoking cessation to tailored text messages delivered at a higher frequency. From February 2007 to August 2009, 2030 users of an internet-based smoking cessation program with optional text message support aged 15-25 years were...... of text messages increases quit rates among young smokers....
Chaos-pass filtering in injection-locked semiconductor lasers

International Nuclear Information System (INIS)

Murakami, Atsushi; Shore, K. Alan

2005-01-01

Chaos-pass filtering (CPF) of semiconductor lasers has been studied theoretically. CPF is a phenomenon which occurs in laser chaos synchronization by injection locking and is a fundamental technique for the extraction of messages at the receiver laser in chaotic communications systems. We employ a simple theory based on driven damped oscillators to clarify the physical background of CPF. The receiver laser is optically driven by injection from the transmitter laser. We have numerically investigated the response characteristics of the receiver when it is driven by periodic (message) and chaotic (carrier) signals. It is thereby revealed that the response of the receiver laser in the two cases is quite different. For the periodic drive, the receiver exhibits a response depending on the signal frequency, while the chaotic drive provides a frequency-independent synchronous response to the receiver laser. We verify that the periodic and chaotic drives occur independently in the CPF response, and, consequently, CPF can be clearly understood in the difference of the two drives. Message extraction using CPF is also examined, and the validity of our theoretical explanation for the physical mechanism underlying CPF is thus verified
Increasing the Operational Value of Event Messages

Science.gov (United States)

Li, Zhenping; Savkli, Cetin; Smith, Dan

2003-01-01

Assessing the health of a space mission has traditionally been performed using telemetry analysis tools. Parameter values are compared to known operational limits and are plotted over various time periods. This presentation begins with the notion that there is an incredible amount of untapped information contained within the mission s event message logs. Through creative advancements in message handling tools, the event message logs can be used to better assess spacecraft and ground system status and to highlight and report on conditions not readily apparent when messages are evaluated one-at-a-time during a real-time pass. Work in this area is being funded as part of a larger NASA effort at the Goddard Space Flight Center to create component-based, middleware-based, standards-based general purpose ground system architecture referred to as GMSEC - the GSFC Mission Services Evolution Center. The new capabilities and operational concepts for event display, event data analyses and data mining are being developed by Lockheed Martin and the new subsystem has been named GREAT - the GMSEC Reusable Event Analysis Toolkit. Planned for use on existing and future missions, GREAT has the potential to increase operational efficiency in areas of problem detection and analysis, general status reporting, and real-time situational awareness.
An iterative algorithm for solving the multidimensional neutron diffusion nodal method equations on parallel computers

International Nuclear Information System (INIS)

Kirk, B.L.; Azmy, Y.Y.

1992-01-01

In this paper the one-group, steady-state neutron diffusion equation in two-dimensional Cartesian geometry is solved using the nodal integral method. The discrete variable equations comprise loosely coupled sets of equations representing the nodal balance of neutrons, as well as neutron current continuity along rows or columns of computational cells. An iterative algorithm that is more suitable for solving large problems concurrently is derived based on the decomposition of the spatial domain and is accelerated using successive overrelaxation. This algorithm is very well suited for parallel computers, especially since the spatial domain decomposition occurs naturally, so that the number of iterations required for convergence does not depend on the number of processors participating in the calculation. Implementation of the authors' algorithm on the Intel iPSC/2 hypercube and Sequent Balance 8000 parallel computer is presented, and measured speedup and efficiency for test problems are reported. The results suggest that the efficiency of the hypercube quickly deteriorates when many processors are used, while the Sequent Balance retains very high efficiency for a comparable number of participating processors. This leads to the conjecture that message-passing parallel computers are not as well suited for this algorithm as shared-memory machines
A massively parallel algorithm for the collision probability calculations in the Apollo-II code using the PVM library

International Nuclear Information System (INIS)

Stankovski, Z.

1995-01-01

The collision probability method in neutron transport, as applied to 2D geometries, consume a great amount of computer time, for a typical 2D assembly calculation about 90% of the computing time is consumed in the collision probability evaluations. Consequently RZ or 3D calculations became prohibitive. In this paper the author presents a simple but efficient parallel algorithm based on the message passing host/node programmation model. Parallelization was applied to the energy group treatment. Such approach permits parallelization of the existing code, requiring only limited modifications. Sequential/parallel computer portability is preserved, which is a necessary condition for a industrial code. Sequential performances are also preserved. The algorithm is implemented on a CRAY 90 coupled to a 128 processor T3D computer, a 16 processor IBM SPI and a network of workstations, using the Public Domain PVM library. The tests were executed for a 2D geometry with the standard 99-group library. All results were very satisfactory, the best ones with IBM SPI. Because of heterogeneity of the workstation network, the author did not ask high performances for this architecture. The same source code was used for all computers. A more impressive advantage of this algorithm will appear in the calculations of the SAPHYR project (with the future fine multigroup library of about 8000 groups) with a massively parallel computer, using several hundreds of processors
Process-Oriented Parallel Programming with an Application to Data-Intensive Computing

OpenAIRE

Givelberg, Edward

2014-01-01

We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a h...
The Relation of Source Credibility and Message Frequency to Program Evaluation and Self-Confidence of Students in a Job Shadowing Program

Science.gov (United States)

Linnehan, Frank

2004-01-01

Using a pre- and post-test design, this study examined the relation of an adult's credibility and message frequency to the beliefs of female high school students participating in a job-shadowing program. Hypotheses were based on the Elaboration Likelihood Model of attitude formation and change. Findings indicate that credibility of the adult…
Efficient parallel implicit methods for rotary-wing aerodynamics calculations

Science.gov (United States)

Wissink, Andrew M.

Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It experiences a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good
Study of MPI based on parallel MOM on PC clusters for EM-beam scattering by 2-D PEC rough surfaces

International Nuclear Information System (INIS)

Jun, Ma; Li-Xin, Guo; An-Qi, Wang

2009-01-01

This paper firstly applies the finite impulse response filter (FIR) theory combined with the fast Fourier transform (FFT) method to generate two-dimensional Gaussian rough surface. Using the electric field integral equation (EFIE), it introduces the method of moment (MOM) with RWG vector basis function and Galerkin's method to investigate the electromagnetic beam scattering by a two-dimensional PEC Gaussian rough surface on personal computer (PC) clusters. The details of the parallel conjugate gradient method (CGM) for solving the matrix equation are also presented and the numerical simulations are obtained through the message passing interface (MPI) platform on the PC clusters. It finds significantly that the parallel MOM supplies a novel technique for solving a two-dimensional rough surface electromagnetic-scattering problem. The influences of the root-mean-square height, the correlation length and the polarization on the beam scattering characteristics by two-dimensional PEC Gaussian rough surfaces are finally discussed. (classical areas of phenomenology)

Effect of Message Format and Content on Attitude Accessibility Regarding Sexually Transmitted Infections.

Science.gov (United States)

Jain, Parul; Hoffman, Eric; Beam, Michael; Xu, Shan Susan

2017-11-01

Sexually transmitted infections (STIs) are widespread in the United States among people ages 15-24 years and cost almost $16 billion yearly. It is therefore important to understand message design strategies that could help reduce these numbers. Guided by exemplification theory and the extended parallel process model (EPPM), this study examines the influence of message format and the presence versus absence of a graphic image on recipients' accessibility of STI attitudes regarding safe sex. Results of the experiment indicate a significant effect from testimonial messages on increased attitude accessibility regarding STIs compared to statistical messages. Results also indicate a conditional indirect effect of testimonial messages on STI attitude accessibility, though threat is greater when a graphic image is included. Implications and directions for future research are discussed.
Stepwise Development of a Text Messaging-Based Bullying Prevention Program for Middle School Students (BullyDown)

OpenAIRE

Ybarra, Michele L; Prescott, Tonya L; Espelage, Dorothy L

2016-01-01

Background Bullying is a significant public health issue among middle school-aged youth. Current prevention programs have only a moderate impact. Cell phone text messaging technology (mHealth) can potentially overcome existing challenges, particularly those that are structural (e.g., limited time that teachers can devote to non-educational topics). To date, the description of the development of empirically-based mHealth-delivered bullying prevention programs are lacking in the literature. Obj...
Academic training: From Evolution Theory to Parallel and Distributed Genetic Programming

CERN Multimedia

2007-01-01

2006-2007 ACADEMIC TRAINING PROGRAMME LECTURE SERIES 15, 16 March From 11:00 to 12:00 - Main Auditorium, bldg. 500 From Evolution Theory to Parallel and Distributed Genetic Programming F. FERNANDEZ DE VEGA / Univ. of Extremadura, SP Lecture No. 1: From Evolution Theory to Evolutionary Computation Evolutionary computation is a subfield of artificial intelligence (more particularly computational intelligence) involving combinatorial optimization problems, which are based to some degree on the evolution of biological life in the natural world. In this tutorial we will review the source of inspiration for this metaheuristic and its capability for solving problems. We will show the main flavours within the field, and different problems that have been successfully solved employing this kind of techniques. Lecture No. 2: Parallel and Distributed Genetic Programming The successful application of Genetic Programming (GP, one of the available Evolutionary Algorithms) to optimization problems has encouraged an ...
Many-core technologies: The move to energy-efficient, high-throughput x86 computing (TFLOPS on a chip)

CERN Multimedia

CERN. Geneva

2012-01-01

With Moore's Law alive and well, more and more parallelism is introduced into all computing platforms at all levels of integration and programming to achieve higher performance and energy efficiency. Especially in the area of High-Performance Computing (HPC) users can entertain a combination of different hardware and software parallel architectures and programming environments. Those technologies range from vectorization and SIMD computation over shared memory multi-threading (e.g. OpenMP) to distributed memory message passing (e.g. MPI) on cluster systems. We will discuss HPC industry trends and Intel's approach to it from processor/system architectures and research activities to hardware and software tools technologies. This includes the recently announced new Intel(r) Many Integrated Core (MIC) architecture for highly-parallel workloads and general purpose, energy efficient TFLOPS performance, some of its architectural features and its programming environment. At the end we will have a br...
OpenSWPC: an open-source integrated parallel simulation code for modeling seismic wave propagation in 3D heterogeneous viscoelastic media

Science.gov (United States)

Maeda, Takuto; Takemura, Shunsuke; Furumura, Takashi

2017-07-01

We have developed an open-source software package, Open-source Seismic Wave Propagation Code (OpenSWPC), for parallel numerical simulations of seismic wave propagation in 3D and 2D (P-SV and SH) viscoelastic media based on the finite difference method in local-to-regional scales. This code is equipped with a frequency-independent attenuation model based on the generalized Zener body and an efficient perfectly matched layer for absorbing boundary condition. A hybrid-style programming using OpenMP and the Message Passing Interface (MPI) is adopted for efficient parallel computation. OpenSWPC has wide applicability for seismological studies and great portability to allowing excellent performance from PC clusters to supercomputers. Without modifying the code, users can conduct seismic wave propagation simulations using their own velocity structure models and the necessary source representations by specifying them in an input parameter file. The code has various modes for different types of velocity structure model input and different source representations such as single force, moment tensor and plane-wave incidence, which can easily be selected via the input parameters. Widely used binary data formats, the Network Common Data Form (NetCDF) and the Seismic Analysis Code (SAC) are adopted for the input of the heterogeneous structure model and the outputs of the simulation results, so users can easily handle the input/output datasets. All codes are written in Fortran 2003 and are available with detailed documents in a public repository.[Figure not available: see fulltext.
Low-sensitivity active filter realization using a complex all-pass filter

Science.gov (United States)

Regalia, Phillip A.; Mitra, Sanjit K.

1987-04-01

A wide class of continuous-time transfer functions may be implemented as the parallel combination of two all-pass filters, including Butterworth, Chebyshev, and elliptic low-pass approximations of odd order. Here, the realization of even-order low-pass classical approximations is considered, and it is shown that they may be decomposed in terms of complex all-pass functions. A systematic realization approach, based on scattering domain simulation (i.e., wave active filters), allows for a low-sensitivity active filter implementation. Further insight into the low-sensitivity property is gained by connecting the insertion loss of doubly terminated antimetric networks with the imaginary return loss of complex lossless networks.
Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization

KAUST Repository

Belli, Roberto; Hoefler, Torsten

2015-01-01

Remote Memory Access (RMA) programming enables direct access to low-level hardware features to achieve high performance for distributed-memory programs. However, the design of RMA programming schemes focuses on the memory access and less on the synchronization. For example, in contemporary RMA programming systems, the widely used producer-consumer pattern can only be implemented inefficiently, incurring in an overhead of an additional round-trip message. We propose Notified Access, a scheme where the target process of an access can receive a completion notification. This scheme enables direct and efficient synchronization with a minimum number of messages. We implement our scheme in an open source MPI-3 RMA library and demonstrate lower overheads (two cache misses) than other point-to-point synchronization mechanisms for each notification. We also evaluate our implementation on three real-world benchmarks, a stencil computation, a tree computation, and a Colicky factorization implemented with tasks. Our scheme always performs better than traditional message passing and other existing RMA synchronization schemes, providing up to 50% speedup on small messages. Our analysis shows that Notified Access is a valuable primitive for any RMA system. Furthermore, we provide guidance for the design of low-level network interfaces to support Notified Access efficiently.
Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization

KAUST Repository

Belli, Roberto

2015-05-01

Remote Memory Access (RMA) programming enables direct access to low-level hardware features to achieve high performance for distributed-memory programs. However, the design of RMA programming schemes focuses on the memory access and less on the synchronization. For example, in contemporary RMA programming systems, the widely used producer-consumer pattern can only be implemented inefficiently, incurring in an overhead of an additional round-trip message. We propose Notified Access, a scheme where the target process of an access can receive a completion notification. This scheme enables direct and efficient synchronization with a minimum number of messages. We implement our scheme in an open source MPI-3 RMA library and demonstrate lower overheads (two cache misses) than other point-to-point synchronization mechanisms for each notification. We also evaluate our implementation on three real-world benchmarks, a stencil computation, a tree computation, and a Colicky factorization implemented with tasks. Our scheme always performs better than traditional message passing and other existing RMA synchronization schemes, providing up to 50% speedup on small messages. Our analysis shows that Notified Access is a valuable primitive for any RMA system. Furthermore, we provide guidance for the design of low-level network interfaces to support Notified Access efficiently.
Roofline model toolkit: A practical tool for architectural and program analysis

Energy Technology Data Exchange (ETDEWEB)

Lo, Yu Jung [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Van Straalen, Brian [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ligocki, Terry J. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Cordery, Matthew J. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Wright, Nicholas J. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Hall, Mary W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

2015-04-18

We present preliminary results of the Roofline Toolkit for multicore, many core, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
Final Report: Center for Programming Models for Scalable Parallel Computing

Energy Technology Data Exchange (ETDEWEB)

Mellor-Crummey, John [William Marsh Rice University

2011-09-13

As part of the Center for Programming Models for Scalable Parallel Computing, Rice University collaborated with project partners in the design, development and deployment of language, compiler, and runtime support for parallel programming models to support application development for the “leadership-class” computer systems at DOE national laboratories. Work over the course of this project has focused on the design, implementation, and evaluation of a second-generation version of Coarray Fortran. Research and development efforts of the project have focused on the CAF 2.0 language, compiler, runtime system, and supporting infrastructure. This has involved working with the teams that provide infrastructure for CAF that we rely on, implementing new language and runtime features, producing an open source compiler that enabled us to evaluate our ideas, and evaluating our design and implementation through the use of benchmarks. The report details the research, development, findings, and conclusions from this work.
Scalable High Performance Message Passing over InfiniBand for Open MPI

Energy Technology Data Exchange (ETDEWEB)

Friedley, A; Hoefler, T; Leininger, M L; Lumsdaine, A

2007-10-24

InfiniBand (IB) is a popular network technology for modern high-performance computing systems. MPI implementations traditionally support IB using a reliable, connection-oriented (RC) transport. However, per-process resource usage that grows linearly with the number of processes, makes this approach prohibitive for large-scale systems. IB provides an alternative in the form of a connectionless unreliable datagram transport (UD), which allows for near-constant resource usage and initialization overhead as the process count increases. This paper describes a UD-based implementation for IB in Open MPI as a scalable alternative to existing RC-based schemes. We use the software reliability capabilities of Open MPI to provide the guaranteed delivery semantics required by MPI. Results show that UD not only requires fewer resources at scale, but also allows for shorter MPI startup times. A connectionless model also improves performance for applications that tend to send small messages to many different processes.
De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers

Energy Technology Data Exchange (ETDEWEB)

Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H

2006-09-04

We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide an excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. We have also achieved an automated execution of hierarchical QM
76 FR 62808 - Pilot Program for Parallel Review of Medical Products

Science.gov (United States)

2011-10-11

... voluntary participation in the pilot program, as well as the guiding principles the Agencies intend to... 57045), parallel review is intended to reduce the time between FDA marketing approval and CMS national...
Performance standards for pass-fail determinations in the national air traffic flight service station training program.

Science.gov (United States)

1979-07-01

This report describes and documents Pass-Fail procedures for the new FSS Training Program. New types of measures and sets of norms are used to create standards of performance that students must meet to become eligible for acceptance into the operatio...
Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

Science.gov (United States)

Bellucci, Michael A; Coker, David F

2011-07-28

We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency is tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent. © 2011 American Institute of Physics
Multicore Challenges and Benefits for High Performance Scientific Computing

Directory of Open Access Journals (Sweden)

Ida M.B. Nielsen

2008-01-01

Full Text Available Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexity of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.
Practical Formal Verification of MPI and Thread Programs

Science.gov (United States)

Gopalakrishnan, Ganesh; Kirby, Robert M.

Large-scale simulation codes in science and engineering are written using the Message Passing Interface (MPI). Shared memory threads are widely used directly, or to implement higher level programming abstractions. Traditional debugging methods for MPI or thread programs are incapable of providing useful formal guarantees about coverage. They get bogged down in the sheer number of interleavings (schedules), often missing shallow bugs. In this tutorial we will introduce two practical formal verification tools: ISP (for MPI C programs) and Inspect (for Pthread C programs). Unlike other formal verification tools, ISP and Inspect run directly on user source codes (much like a debugger). They pursue only the relevant set of process interleavings, using our own customized Dynamic Partial Order Reduction algorithms. For a given test harness, DPOR allows these tools to guarantee the absence of deadlocks, instrumented MPI object leaks and communication races (using ISP), and shared memory races (using Inspect). ISP and Inspect have been used to verify large pieces of code: in excess of 10,000 lines of MPI/C for ISP in under 5 seconds, and about 5,000 lines of Pthread/C code in a few hours (and much faster with the use of a cluster or by exploiting special cases such as symmetry) for Inspect. We will also demonstrate the Microsoft Visual Studio and Eclipse Parallel Tools Platform integrations of ISP (these will be available on the LiveCD).
Energy and exergy analysis in double-pass solar air heater

Indian Academy of Sciences (India)

P VELMURUGAN

mesh) in the second pass, and also by mounting longitudinal fins in the back side of the absorber plate ( ... energy sources. ... indoor solar simulator test facility photographically shown ..... El-khawajah et al [19] who employed multiple parallel.
Comparison of 250 MHz R10K Origin 2000 and 400 MHz Origin 2000 Using NAS Parallel Benchmarks

Science.gov (United States)

Turney, Raymond D.; Thigpen, William W. (Technical Monitor)

2001-01-01

This report describes results of benchmark tests on Steger, a 250 MHz Origin 2000 system with R10K processors, currently installed at the NASA Ames National Advanced Supercomputing (NAS) facility. For comparison purposes, the tests were also run on Lomax, a 400 MHz Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to measure system performance. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
Synchronous Parallel System for Emulation and Discrete Event Simulation

Science.gov (United States)

Steinman, Jeffrey S. (Inventor)

2001-01-01

A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to the state variables of the simulation object attributable to the event object and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.

76 FR 66309 - Pilot Program for Parallel Review of Medical Products; Correction

Science.gov (United States)

2011-10-26

... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Medicare and Medicaid Services [CMS-3180-N2] Food and Drug Administration [Docket No. FDA-2010-N-0308] Pilot Program for Parallel Review of Medical... 11, 2011 (76 FR 62808). The document announced a pilot program for sponsors of innovative device...
DMA engine for repeating communication patterns

Science.gov (United States)

Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Steinmacher-Burow, Burkhard; Vranas, Pavlos

2010-09-21

A parallel computer system is constructed as a network of interconnected compute nodes to operate a global message-passing application for performing communications across the network. Each of the compute nodes includes one or more individual processors with memories which run local instances of the global message-passing application operating at each compute node to carry out local processing operations independent of processing operations carried out at other compute nodes. Each compute node also includes a DMA engine constructed to interact with the application via Injection FIFO Metadata describing multiple Injection FIFOs where each Injection FIFO may containing an arbitrary number of message descriptors in order to process messages with a fixed processing overhead irrespective of the number of message descriptors included in the Injection FIFO.
Identifying effective components of alcohol abuse prevention programs: effects of fear appeals, message style, and source expertise.

Science.gov (United States)

Stainback, R D; Rogers, R W

1983-04-01

Despite the importance of alcohol abuse prevention programs, the effectiveness of many components of these programs has not been demonstrated empirically. An experiment tested the efficacy of three components of many prevention programs: fear appeals, one- versus two-sided message style, and the expertise of the source. The persuasive impact of this information was examined on 113 ninth-grade students' intentions to abstain from drinking alcohol while they are teenagers. The results reveal that fear appeals are successful in strengthening students' intentions to refrain from drinking. Implications are discussed for implementing these principles and for designing future investigations of alcohol abuse prevention programs.
An Examination of Adolescent Recall of Anti-Smoking Messages: Attitudes, Message Type, and Message Perceptions.

Science.gov (United States)

Bigsby, Elisabeth; Monahan, Jennifer L; Ewoldsen, David R

2017-04-01

Delayed message recall may be influenced by currently held accessible attitudes, the nature of the message, and message perceptions (perception of bias and message elaboration). This study examined the potential of message perceptions to mediate the influence of valenced attitude accessibility and message type on unaided recall of anti-smoking Public Service Announcements (PSAs). In a field experiment, ninth grade students (N = 244) watched three PSAs and responded to items on laptop computers. Twelve weeks later, follow-up telephone surveys were conducted to assess unaided recall. Both valenced attitude accessibility and message type were associated with message perceptions. However, only perception of message bias partially mediated the relationship between message type and unaided recall.
Massively Parallel Finite Element Programming

KAUST Repository

Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

2010-01-01

Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
Massively Parallel Finite Element Programming

KAUST Repository

Heister, Timo

2010-01-01

Today\\'s large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
Integrating marker passing and problem solving a spreading activation approach to improved choice in planning

CERN Document Server

Hendler, James A

2014-01-01

A recent area of interest in the Artificial Intelligence community has been the application of massively parallel algorithms to enhance the choice mechanism in traditional AI problems. This volume provides a detailed description of how marker-passing -- a parallel, non-deductive, spreading activation algorithm -- is a powerful approach to refining the choice mechanisms in an AI problem-solving system. The author scrutinizes the design of both the algorithm and the system, and then reviews the current literature and research in planning and marker passing. Also included: a comparison of this
ZeroMQ messaging for many applications

CERN Document Server

Hintjens, Pieter

2013-01-01

Dive into ØMQ (aka ZeroMQ), the smart socket library that gives you fast, easy, message-based concurrency for your applications. With this quick-paced guide, you’ll learn hands-on how to use this scalable, lightweight, and highly flexible networking tool for exchanging messages among clusters, the cloud, and other multi-system environments. ØMQ maintainer Pieter Hintjens takes you on a tour of real-world applications, using extended examples in C to help you work with ØMQ’s API, sockets, and patterns. Learn how to use specific ØMQ programming techniques, build multithreaded applications, and create your own messaging architectures. You’ll discover how ØMQ works with several programming languages and most operating systems—with little or no cost. Learn ØMQ’s main patterns: request-reply, publish-subscribe, and pipeline Work with ØMQ sockets and patterns by building several small applications Explore advanced uses of ØMQ’s request-reply pattern through working examples Build reliable request...
An Efficient Parallel Multi-Scale Segmentation Method for Remote Sensing Imagery

Directory of Open Access Journals (Sweden)

Haiyan Gu

2018-04-01

Full Text Available Remote sensing (RS image segmentation is an essential step in geographic object-based image analysis (GEOBIA to ultimately derive “meaningful objects”. While many segmentation methods exist, most of them are not efficient for large data sets. Thus, the goal of this research is to develop an efficient parallel multi-scale segmentation method for RS imagery by combining graph theory and the fractal net evolution approach (FNEA. Specifically, a minimum spanning tree (MST algorithm in graph theory is proposed to be combined with a minimum heterogeneity rule (MHR algorithm that is used in FNEA. The MST algorithm is used for the initial segmentation while the MHR algorithm is used for object merging. An efficient implementation of the segmentation strategy is presented using data partition and the “reverse searching-forward processing” chain based on message passing interface (MPI parallel technology. Segmentation results of the proposed method using images from multiple sensors (airborne, SPECIM AISA EAGLE II, WorldView-2, RADARSAT-2 and different selected landscapes (residential/industrial, residential/agriculture covering four test sites indicated its efficiency in accuracy and speed. We conclude that the proposed method is applicable and efficient for the segmentation of a variety of RS imagery (airborne optical, satellite optical, SAR, high-spectral, while the accuracy is comparable with that of the FNEA method.
Modeling radiative transport in ICF plasmas on an IBM SP2 supercomputer

International Nuclear Information System (INIS)

Johansen, J.A.; MacFarlane, J.J.; Moses, G.A.

1995-01-01

At the University of Wisconsin-Madison the authors have integrated a collisional-radiative-equilibrium model into their CONRAD radiation-hydrodynamics code. This integrated package allows them to accurately simulate the transport processes involved in ICF plasmas; including the important effects of self-absorption of line-radiation. However, as they increase the amount of atomic structure utilized in their transport models, the computational demands increase nonlinearly. In an attempt to meet this increased computational demand, they have recently embarked on a mission to parallelize the CONRAD program. The parallel CONRAD development is being performed on an IBM SP2 supercomputer. The parallelism is based on a message passing paradigm, and is being implemented using PVM. At the present time they have determined that approximately 70% of the sequential program can be executed in parallel. Accordingly, they expect that the parallel version will yield a speedup on the order of three times that of the sequential version. This translates into only 10 hours of execution time for the parallel version, whereas the sequential version required 30 hours
Xyce Parallel Electronic Simulator Users' Guide Version 6.7.

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2017-05-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright c 2002-2017 Sandia Corporation. All rights reserved. Trademarks Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of
Programming in Fortran M

Energy Technology Data Exchange (ETDEWEB)

Foster, I.; Olson, R.; Tuecke, S.

1993-08-01

Fortran M is a small set of extensions to Fortran that supports a modular approach to the construction of sequential and parallel programs. Fortran M programs use channels to plug together processes which may be written in Fortran M or Fortran 77. Processes communicate by sending and receiving messages on channels. Channels and processes can be created dynamically, but programs remain deterministic unless specialized nondeterministic constructs are used. Fortran M programs can execute on a range of sequential, parallel, and networked computers. This report incorporates both a tutorial introduction to Fortran M and a users guide for the Fortran M compiler developed at Argonne National Laboratory. The Fortran M compiler, supporting software, and documentation are made available free of charge by Argonne National Laboratory, but are protected by a copyright which places certain restrictions on how they may be redistributed. See the software for details. The latest version of both the compiler and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/fortran-m at info.mcs.anl.gov.
Feasibility studies for a high energy physics MC program on massive parallel platforms

International Nuclear Information System (INIS)

Bertolotto, L.M.; Peach, K.J.; Apostolakis, J.; Bruschini, C.E.; Calafiura, P.; Gagliardi, F.; Metcalf, M.; Norton, A.; Panzer-Steindel, B.

1994-01-01

The parallelization of a Monte Carlo program for the NA48 experiment is presented. As a first step, a task farming structure was realized. Based on this, a further step, making use of a distributed database for showers in the electro-magnetic calorimeter, was implemented. Further possibilities for using parallel processing for a quasi-real time calibration of the calorimeter are described
Parallel programming with Python

CERN Document Server

Palach, Jan

2014-01-01

A fast, easy-to-follow and clear tutorial to help you develop Parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most of this book.
Line-plane broadcasting in a data communications network of a parallel computer

Science.gov (United States)

Archer, Charles J.; Berg, Jeremy E.; Blocksome, Michael A.; Smith, Brian E.

2010-06-08

Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.
A hybrid parallel architecture for electrostatic interactions in the simulation of dissipative particle dynamics

Science.gov (United States)

Yang, Sheng-Chun; Lu, Zhong-Yuan; Qian, Hu-Jun; Wang, Yong-Lei; Han, Jie-Ping

2017-11-01

, which approximately take up most of the total simulation time. Although the parallel method CU-ENUF (Yang et al., 2016) based on GPU has achieved a qualitative leap compared with previous methods in electrostatic interactions computation, the computation capability is limited to the throughput capacity of a single GPU for super-scale simulation system. Therefore, we should look for an effective method to handle the calculation of electrostatic interactions efficiently for a simulation system with super-scale size. Solution method: We constructed a hybrid parallel architecture, in which CPU and GPU are combined to accelerate the electrostatic computation effectively. Firstly, the simulation system is divided into many subtasks via domain-decomposition method. Then MPI (Message Passing Interface) is used to implement the CPU-parallel computation with each computer node corresponding to a particular subtask, and furthermore each subtask in one computer node will be executed in GPU in parallel efficiently. In this hybrid parallel method, the most critical technical problem is how to parallelize a CUNFFT (nonequispaced fast Fourier transform based on CUDA) in the parallel strategy, which is conquered effectively by deep-seated research of basic principles and some algorithm skills. Restrictions: The HP-ENUF is mainly oriented to super-scale system simulations, in which the performance superiority is shown adequately. However, for a small simulation system containing less than 106 particles, the mode of multiple computer nodes has no apparent efficiency advantage or even lower efficiency due to the serious network delay among computer nodes, than the mode of single computer node. References: (1) S.-C. Yang, H.-J. Qian, Z.-Y. Lu, Appl. Comput. Harmon. Anal. 2016, http://dx.doi.org/10.1016/j.acha.2016.04.009. (2) S.-C. Yang, Y.-L. Wang, G.-S. Jiao, H.-J. Qian, Z.-Y. Lu, J. Comput. Chem. 37 (2016) 378. (3) S.-C. Yang, Y.-L. Zhu, H.-J. Qian, Z.-Y. Lu, Appl. Chem. Res. Chin. Univ
Enabling Requirements-Based Programming for Highly-Dependable Complex Parallel and Distributed Systems

Science.gov (United States)

Hinchey, Michael G.; Rash, James L.; Rouff, Christopher A.

2005-01-01

The manual application of formal methods in system specification has produced successes, but in the end, despite any claims and assertions by practitioners, there is no provable relationship between a manually derived system specification or formal model and the customer's original requirements. Complex parallel and distributed system present the worst case implications for today s dearth of viable approaches for achieving system dependability. No avenue other than formal methods constitutes a serious contender for resolving the problem, and so recognition of requirements-based programming has come at a critical juncture. We describe a new, NASA-developed automated requirement-based programming method that can be applied to certain classes of systems, including complex parallel and distributed systems, to achieve a high degree of dependability.
A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS

Energy Technology Data Exchange (ETDEWEB)

Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A., E-mail: bharath@u.northwestern.edu [Center for Interdisciplinary Exploration and Research in Astrophysics, Northwestern University, Evanston, IL (United States)

2013-02-15

We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N {approx} 10{sup 7} particles. Our code is based on the Henon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10{sup 5} to 10{sup 7}. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within {approx}< 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10{sup 5}, 128 for N = 10{sup 6} and 256 for N = 10{sup 7}. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60 Multiplication-Sign , 100 Multiplication-Sign , and 220 Multiplication-Sign , respectively.
Double-pass quantum volume hologram

International Nuclear Information System (INIS)

Vasilyev, Denis V.; Sokolov, Ivan V.

2011-01-01

We propose a scheme for parallel, spatially multimode quantum memory for light. The scheme is based on the propagation in different directions of a quantum signal wave and strong classical reference wave, like in a classical volume hologram and the previously proposed quantum volume hologram [D. V. Vasilyev et al., Phys. Rev. A 81, 020302(R) (2010)]. The medium for the hologram consists of a spatially extended ensemble of cold spin-polarized atoms. In the absence of the collective spin rotation during the interaction, two passes of light for both storage and retrieval are required, and therefore the present scheme can be called a double-pass quantum volume hologram. The scheme is less sensitive to diffraction and therefore is capable of achieving a higher density of storage of spatial modes as compared to the previously proposed thin quantum hologram [D. V. Vasilyev et al., Phys. Rev. A 77, 020302(R) (2008)], which also requires two passes of light for both storage and retrieval. However, the present scheme allows one to achieve a good memory performance with a lower optical depth of the atomic sample as compared to the quantum volume hologram. A quantum hologram capable of storing entangled images can become an important ingredient in quantum information processing and quantum imaging.
Compiler Technology for Parallel Scientific Computation

Directory of Open Access Journals (Sweden)

Can Özturan

1994-01-01

Full Text Available There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with equational programming language (EPL. Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.

Comparative analysis of the serial/parallel numerical calculation of boiling channels thermohydraulics; Analisis comparativo del calculo numerico serie/paralelo de la termohidraulica de canales con ebullicion

Energy Technology Data Exchange (ETDEWEB)

Cecenas F, M., E-mail: mcf@iie.org.mx [Instituto Nacional de Electricidad y Energias Limpias, Reforma 113, Col. Palmira, 62490 Cuernavaca, Morelos (Mexico)

2017-09-15

A parallel channel model with boiling and punctual neutron kinetics is used to compare the implementation of its programming in C language through a conventional scheme and through a parallel programming scheme. In both cases the subroutines written in C are practically the same, but they vary in the way of controlling the execution of the tasks that calculate the different channels. Parallel Virtual Machine is used for the parallel solution, which allows the passage of messages between tasks to control convergence and transfer the variables of interest between the tasks that run simultaneously on a platform equipped with a multi-core microprocessor. For some problems defined as a study case, such as the one presented in this paper, a computer with two cores can reduce the computation time to 54-56% of the time required by the same program in its conventional sequential version. Similarly, a processor with four cores can reduce the time to 22-33% of execution time of the conventional serial version. These results of substantially reducing the computation time are very motivating of all those applications that can be prepared to be parallelized and whose execution time is an important factor. (Author)
I/O Parallelization for the Goddard Earth Observing System Data Assimilation System (GEOS DAS)

Science.gov (United States)

Lucchesi, Rob; Sawyer, W.; Takacs, L. L.; Lyster, P.; Zero, J.

1998-01-01

The National Aeronautics and Space Administration (NASA) Data Assimilation Office (DAO) at the Goddard Space Flight Center (GSFC) has developed the GEOS DAS, a data assimilation system that provides production support for NASA missions and will support NASA's Earth Observing System (EOS) in the coming years. The GEOS DAS will be used to provide background fields of meteorological quantities to EOS satellite instrument teams for use in their data algorithms as well as providing assimilated data sets for climate studies on decadal time scales. The DAO has been involved in prototyping parallel implementations of the GEOS DAS for a number of years and is now embarking on an effort to convert the production version from shared-memory parallelism to distributed-memory parallelism using the portable Message-Passing Interface (MPI). The GEOS DAS consists of two main components, an atmospheric General Circulation Model (GCM) and a Physical-space Statistical Analysis System (PSAS). The GCM operates on data that are stored on a regular grid while PSAS works with observational data that are scattered irregularly throughout the atmosphere. As a result, the two components have different data decompositions. The GCM is decomposed horizontally as a checkerboard with all vertical levels of each box existing on the same processing element(PE). The dynamical core of the GCM can also operate on a rotated grid, which requires communication-intensive grid transformations during GCM integration. PSAS groups observations on PEs in a more irregular and dynamic fashion.
Testing the effect of text messaging cues to promote physical activity habits: a worksite-based exploratory intervention.

Science.gov (United States)

Fournier, M; d'Arripe-Longueville, F; Radel, R

2017-10-01

This study aims to test the efficacy of text messaging cues (SMS) to promote physical activity (PA) habit formation in the workplace. Employees (N = 49; 28 females and 21 males, Mage = 47.5 ± 8.29 years) were randomized into two parallel groups: a PA group enrolled in a 28-week supervised PA program and a PA+SMS group enrolled in the same PA program with text messaging cues received before their PA sessions. The exercise habit was assessed every week from self-reports on an online application. PA maintenance and several physical fitness measures were also assessed prior to and after the intervention to evaluate its general impact. Mixed model analysis of the 603 observations indicated a small but significant effect of the SMS cues on the speed at which participants engaged in PA behaviors, as the significant interaction effect revealed that the slope of the exercise habit over time was slightly steeper in the PA+SMS group (B = 0.0462, P = 0.0001) than in the PA group (B = 0.0216, P = 0.01). SMS delivery had a marginal effect on the maintenance of PA behaviors 1 year after the intervention. The results suggest that text messaging can help to form PA habits at the workplace and might facilitate long-term maintenance of PA behaviors. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Practical parallel programming

CERN Document Server

Bauer, Barr E

2014-01-01

This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.
The FORCE: A portable parallel programming language supporting computational structural mechanics

Science.gov (United States)

Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

1989-01-01

This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be imnplemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.
Xyce Parallel Electronic Simulator - User's Guide, Version 1.0

Energy Technology Data Exchange (ETDEWEB)

HUTCHINSON, SCOTT A; KEITER, ERIC R.; HOEKSTRA, ROBERT J.; WATERS, LON J.; RUSSO, THOMAS V.; RANKIN, ERIC LAMONT; WIX, STEVEN D.

2002-11-01

This manual describes the use of the Xyce Parallel Electronic Simulator code for simulating electrical circuits at a variety of abstraction levels. The Xyce Parallel Electronic Simulator has been written to support,in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on improving the capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (4) Object-oriented code design and implementation using modern coding-practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows. Another feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models
Managing Gradual Typing with Message-Safety in Dart

DEFF Research Database (Denmark)

Ernst, Erik; Møller, Anders; Schwarz, Mathias Romme

. This supports evolution from a dynamically typed programto a strictly statically typed form. We present a formal model of Dart that elucidates how a core of the language and its standard type system works. This allows us to characterize message-safe programs and present a theorem stating that such programs......This paper establishes a notion of message-safe programs as a natural intermediate point between dynamically typed and statically typed Dart programs. Unlike traditional static type checking, the type system in the Dart programming language is unsound by design. The rationale has been...... that this allows compile-time detection of likely errors and enables code completion in integrated development environments, without being restrictive on programmers. We show that, despite unsoundness, judicious use of type annotations can ensure useful properties of the runtime behavior of Dart programs...
Message from the Program Chairs

DEFF Research Database (Denmark)

Sheng, Quan Z.; Wang, Guoren; Jensen, Christian S.

2012-01-01

. The papers cover contemporary topics in the fields of Web management and World Wide Web related research and applications, such as advanced application of databases, cloud computing, content management, data mining and knowledge discovery, distributed and parallel processing, grid computing, internet...
Parallelization and checkpointing of GPU applications through program transformation

Energy Technology Data Exchange (ETDEWEB)

Solano-Quinde, Lizandro Damian [Iowa State Univ., Ames, IA (United States)

2012-01-01

GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and
Modeling drivers' passing duration and distance in a virtual environment

Directory of Open Access Journals (Sweden)

Haneen Farah

2013-07-01

The main contribution of this paper is in the empirical models developed for passing duration and distance which highlights the factors that affect drivers' passing behavior and can be used to enhance the passing models in simulation programs.
Distributed Memory Parallel Computing with SEAWAT

Science.gov (United States)

Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

2017-12-01

Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model ( 10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources
Enhancing a Teen Pregnancy Prevention Program with text messaging: engaging minority youth to develop TOP ® Plus Text.

Science.gov (United States)

Devine, Sharon; Bull, Sheana; Dreisbach, Susan; Shlay, Judith

2014-03-01

To develop and pilot a theory-based, mobile phone texting component attractive to minority youth as a supplement to the Teen Outreach Program(®), a youth development program for reducing teen pregnancy and school dropout. We conducted iterative formative research with minority youth in multiple focus groups to explore interest in texting and reaction to text messages. We piloted a month-long version of TOP(®) Plus Text with 96 teens at four sites and conducted a computer-based survey immediately after enrollment and at the end of the pilot that collected information about teens' values, social support, self-efficacy, and behaviors relating to school performance, trouble with the law, and sexual activity. After each of the first three weekly sessions we collected satisfaction measures. Upon completion of the pilot we conducted exit interviews with twelve purposively selected pilot participants. We successfully recruited and enrolled minority youth into the pilot. Teens were enthusiastic about text messages complementing TOP(®). Results also revealed barriers: access to text-capable mobile phones, retention as measured by completion of the post-pilot survey, and a need to be attentive to teen literacy. Piloting helped identify improvements for implementation including offering text messages through multiple platforms so youth without access to a mobile phone could receive messages; rewording texts to allow youth to express opinions without feeling judged; and collecting multiple types of contact information to improve follow-up. Thoughtful attention to social and behavioral theory and investment in iterative formative research with extensive consultation with teens can lead to an engaging texting curriculum that enhances and complements TOP(®). Copyright © 2014 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.
ng: What next-generation languages can teach us about HENP frameworks in the manycore era

International Nuclear Information System (INIS)

Binet, Sébastien

2011-01-01

Current High Energy and Nuclear Physics (HENP) frameworks were written before multicore systems became widely deployed. A 'single-thread' execution model naturally emerged from that environment, however, this no longer fits into the processing model on the dawn of the manycore era. Although previous work focused on minimizing the changes to be applied to the LHC frameworks (because of the data taking phase) while still trying to reap the benefits of the parallel-enhanced CPU architectures, this paper explores what new languages could bring to the design of the next-generation frameworks. Parallel programming is still in an intensive phase of R and D and no silver bullet exists despite the 30+ years of literature on the subject. Yet, several parallel programming styles have emerged: actors, message passing, communicating sequential processes, task-based programming, data flow programming, ... to name a few. We present the work of the prototyping of a next-generation framework in new and expressive languages (python and Go) to investigate how code clarity and robustness are affected and what are the downsides of using languages younger than FORTRAN/C/C++.
Dynamic programming in parallel boundary detection with application to ultrasound intima-media segmentation.

Science.gov (United States)

Zhou, Yuan; Cheng, Xinyao; Xu, Xiangyang; Song, Enmin

2013-12-01

Segmentation of carotid artery intima-media in longitudinal ultrasound images for measuring its thickness to predict cardiovascular diseases can be simplified as detecting two nearly parallel boundaries within a certain distance range, when plaque with irregular shapes is not considered. In this paper, we improve the implementation of two dynamic programming (DP) based approaches to parallel boundary detection, dual dynamic programming (DDP) and piecewise linear dual dynamic programming (PL-DDP). Then, a novel DP based approach, dual line detection (DLD), which translates the original 2-D curve position to a 4-D parameter space representing two line segments in a local image segment, is proposed to solve the problem while maintaining efficiency and rotation invariance. To apply the DLD to ultrasound intima-media segmentation, it is imbedded in a framework that employs an edge map obtained from multiplication of the responses of two edge detectors with different scales and a coupled snake model that simultaneously deforms the two contours for maintaining parallelism. The experimental results on synthetic images and carotid arteries of clinical ultrasound images indicate improved performance of the proposed DLD compared to DDP and PL-DDP, with respect to accuracy and efficiency. Copyright © 2013 Elsevier B.V. All rights reserved.
Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

Science.gov (United States)

Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

2004-09-01

We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 x 10(8) or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10(8) histories. For a smaller number of histories (1 x 10(8)) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10(8) histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications

International Nuclear Information System (INIS)

Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J.

2004-01-01

We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1x10 8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8x10 8 histories. For a smaller number of histories (1x10 8 ) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1x10 8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy
A Novel Single Pass Authenticated Encryption Stream Cipher for Software Defined Radios

DEFF Research Database (Denmark)

Khajuria, Samant

2012-01-01

to propose cryptographic services such as confidentiality, integrity and authentication. Therefore, integration of security services into SDR devices is essential. Authenticated Encryption schemes donate the class of cryptographic algorithms that are designed for protecting both message confidentiality....... This makes authenticated encryption very attractive for low-cost low-power hardware implementations, as it allows for the substantial decrease in the circuit area and power consumed compared to the traditional schemes. In this thesis, an authenticated encryption scheme is proposed with the focus of achieving...... high throughput and low overhead for SDRs. The thesis is divided into two research topics. One topic is the design of a 1-pass authenticated encryption scheme that can accomplish both message secrecy and authenticity in a single cryptographic primitive. The other topic is the implementation...
On a model of three-dimensional bursting and its parallel implementation

Science.gov (United States)

Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.

2008-04-01

A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a second-order accurate in both space and time, linearly-implicit finite difference method in equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared space address paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors that affect the performance of the paralled implementations, is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice more efficient than the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.
Standardized Testing Practices: Effect on Graduation and NCLEX® Pass Rates.

Science.gov (United States)

Randolph, Pamela K

The use standardized testing in pre-licensure nursing programs has been accompanied by conflicting reports of effective practices. The purpose of this project was to describe standardized testing practices in one states' nursing programs and discover if the use of a cut score or oversight of remediation had any effect on (a) first time NCLEX® pass rates, (b) on-time graduation (OTG) or (c) the combination of (a) and (b). Administrators of 38 nursing programs in one Southwest state were sent surveys; surveys were returned by 34 programs (89%). Survey responses were compared to each program's NCLEX pass rate and on-time graduation rate; t-tests were conducted for significant differences associated with a required minimum score (cut score) and oversight of remediation. There were no significant differences in NCLEX pass or on-time graduation rates related to establishment of a cut score. There was a significant difference when the NCLEX pass rate and on-time graduation rate were combined (Outcome Index "OI") with significantly higher program outcomes (P=.02.) for programs without cut-scores. There were no differences associated with faculty oversight of remediation. The results of this study do not support establishment of a cut-score when implementing a standardized testing. Copyright © 2016. Published by Elsevier Inc.
Link failure detection in a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Megerian, Mark G.; Smith, Brian E.

2010-11-09

Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.

On Power Allocation for Parallel Gaussian Broadcast Channels with Common Information

Directory of Open Access Journals (Sweden)

Gohary RamyH

2009-01-01

Full Text Available This paper considers a broadcast system in which a single transmitter sends a common message and (independent particular messages to receivers over unmatched parallel scalar Gaussian subchannels. For this system the set of all rate tuples that can be achieved via superposition coding and Gaussian signalling (SPCGS can be parameterized by a set of power loads and partitions, and the boundary of this set can be expressed as the solution of an optimization problem. Although that problem is not convex in the general case, it will be shown that it can be used to obtain tight and efficiently computable inner and outer bounds on the SPCGS rate region. The development of these bounds relies on approximating the original optimization problem by a (convex Geometric Program (GP, and in addition to generating the bounds, the GP also generates the corresponding power loads and partitions. There are special cases of the general problem that can be precisely formulated in a convex form. In this paper, explicit convex formulations are given for three such cases, namely, the case of 2 users, the case in which only particular messages are transmitted (in both of which the SPCGS rate region is the capacity region, and the case in which only the SPCGS sum rate is to be maximized.
Parallelization of the Physical-Space Statistical Analysis System (PSAS)

Science.gov (United States)

Larson, J. W.; Guo, J.; Lyster, P. M.

1999-01-01

Atmospheric data assimilation is a method of combining observations with model forecasts to produce a more accurate description of the atmosphere than the observations or forecast alone can provide. Data assimilation plays an increasingly important role in the study of climate and atmospheric chemistry. The NASA Data Assimilation Office (DAO) has developed the Goddard Earth Observing System Data Assimilation System (GEOS DAS) to create assimilated datasets. The core computational components of the GEOS DAS include the GEOS General Circulation Model (GCM) and the Physical-space Statistical Analysis System (PSAS). The need for timely validation of scientific enhancements to the data assimilation system poses computational demands that are best met by distributed parallel software. PSAS is implemented in Fortran 90 using object-based design principles. The analysis portions of the code solve two equations. The first of these is the "innovation" equation, which is solved on the unstructured observation grid using a preconditioned conjugate gradient (CG) method. The "analysis" equation is a transformation from the observation grid back to a structured grid, and is solved by a direct matrix-vector multiplication. Use of a factored-operator formulation reduces the computational complexity of both the CG solver and the matrix-vector multiplication, rendering the matrix-vector multiplications as a successive product of operators on a vector. Sparsity is introduced to these operators by partitioning the observations using an icosahedral decomposition scheme. PSAS builds a large (approx. 128MB) run-time database of parameters used in the calculation of these operators. Implementing a message passing parallel computing paradigm into an existing yet developing computational system as complex as PSAS is nontrivial. One of the technical challenges is balancing the requirements for computational reproducibility with the need for high performance. The problem of computational
Text Messaging, Teen Outreach Program, and Sexual Health Behavior: A Cluster Randomized Trial.

Science.gov (United States)

Bull, Sheana; Devine, Sharon; Schmiege, Sarah J; Pickard, Leslie; Campbell, Jon; Shlay, Judith C

2016-09-01

To consider whether Youth All Engaged! (a text message intervention) intensified the effects of the adolescent pregnancy prevention Teen Outreach Program (control) for youths. In this trial performed in Denver, Colorado, from 2011 to 2014, we randomized 8 Boys & Girls Clubs each of 4 years into 32 clubs per year combinations to ensure each club would serve as a treatment site for 2 years and a control site for 2 years. Control intervention consisted of the Teen Outreach Program only. We enrolled 852 youths (aged 14-18 years), and 632 were retained at follow-up, with analytic samples ranging from 50 to 624 across outcomes. We examined program costs, and whether the intervention increased condom and contraceptive use, access to care, and pregnancy prevention. Control program costs were $1184 per participant, and intervention costs were an additional $126 per participant (+10.6%). There were no statistically significant differences in primary outcomes for the full sample. Hispanic participants in the intervention condition had fewer pregnancies at follow-up (1.79%) than did those in the control group (6.72%; P = .02). Youth All Engaged is feasible, low cost, and could have potential benefits for Hispanic youths.
GPU-Accelerated Parallel FDTD on Distributed Heterogeneous Platform

Directory of Open Access Journals (Sweden)

Ronglin Jiang

2014-01-01

Full Text Available This paper introduces a (finite difference time domain FDTD code written in Fortran and CUDA for realistic electromagnetic calculations with parallelization methods of Message Passing Interface (MPI and Open Multiprocessing (OpenMP. Since both Central Processing Unit (CPU and Graphics Processing Unit (GPU resources are utilized, a faster execution speed can be reached compared to a traditional pure GPU code. In our experiments, 64 NVIDIA TESLA K20m GPUs and 64 INTEL XEON E5-2670 CPUs are used to carry out the pure CPU, pure GPU, and CPU + GPU tests. Relative to the pure CPU calculations for the same problems, the speedup ratio achieved by CPU + GPU calculations is around 14. Compared to the pure GPU calculations for the same problems, the CPU + GPU calculations have 7.6%–13.2% performance improvement. Because of the small memory size of GPUs, the FDTD problem size is usually very small. However, this code can enlarge the maximum problem size by 25% without reducing the performance of traditional pure GPU code. Finally, using this code, a microstrip antenna array with 16×18 elements is calculated and the radiation patterns are compared with the ones of MoM. Results show that there is a well agreement between them.
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing

Science.gov (United States)

Amooie, M. A.; Moortgat, J.

2017-12-01

We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.
The Physical Therapy and Society Summit (PASS) Meeting: observations and opportunities.

Science.gov (United States)

Kigin, Colleen M; Rodgers, Mary M; Wolf, Steven L

2010-11-01

The construct of delivering high-quality and cost-effective health care is in flux, and the profession must strategically plan how to meet the needs of society. In 2006, the House of Delegates of the American Physical Therapy Association passed a motion to convene a summit on "how physical therapists can meet current, evolving, and future societal health care needs." The Physical Therapy and Society Summit (PASS) meeting on February 27-28, 2009, in Leesburg, Virginia, sent a clear message that for physical therapists to be effective and thrive in the health care environment of the future, a paradigm shift is required. During the PASS meeting, participants reframed our traditional focus on the physical therapist and the patient/client (consumer) to one in which physical therapists are an integral part of a collaborative, multidisciplinary health care team with the health care consumer as its focus. The PASS Steering Committee recognized that some of the opportunities that surfaced during the PASS meeting may be disruptive or may not be within the profession's present strategic or tactical plans. Thus, adopting a framework that helps to establish the need for change that is provocative and potentially disruptive to our present care delivery, yet prioritizes opportunities, is a critical and essential step. Each of us in the physical therapy profession must take on post-PASS roles and responsibilities to accomplish the systemic change that is so intimately intertwined with our destiny. This article offers a perspective of the dynamic dialogue and suggestions that emerged from the PASS event, providing further opportunities for discussion and action within our profession.
Parallel algorithms for geometric connected component labeling on a hypercube multiprocessor

Science.gov (United States)

Belkhale, K. P.; Banerjee, P.

1992-01-01

Different algorithms for the geometric connected component labeling (GCCL) problem are defined each of which involves d stages of message passing, for a d-dimensional hypercube. The major idea is that in each stage a hypercube multiprocessor increases its knowledge of domain. The algorithms under consideration include the QUAD algorithm for small number of processors and the Overlap Quad algorithm for large number of processors, subject to the locality of the connected sets. These algorithms differ in their run time, memory requirements, and message complexity. They were implemented on an Intel iPSC2/D4/MX hypercube.
Immediate increase in food intake following exercise messages.

Science.gov (United States)

Albarracin, Dolores; Wang, Wei; Leeper, Joshua

2009-07-01

Communications to stimulate weight loss include exercise-promotion messages that often produce unsatisfactory results due to compensatory behavioral and metabolic mechanisms triggered by physical activity. This research investigated potential automatic facilitation of eating immediately after exercise messages in the absence of actual exercise. Two controlled experiments demonstrated greater than control food intake following exposure to print messages typical of exercise campaigns as well as subliminal presentation of action words associated with exercise (e.g., "active"). These inadvertent effects may explain the limited efficacy of exercise-promotion programs for weight loss, particularly when systematic dietary guidelines are absent.
Practical parallel computing

CERN Document Server

Morse, H Stephen

1994-01-01

Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi
Asynchronous Message Service Reference Implementation

Science.gov (United States)

Burleigh, Scott C.

2011-01-01

This software provides a library of middleware functions with a simple application programming interface, enabling implementation of distributed applications in conformance with the CCSDS AMS (Consultative Committee for Space Data Systems Asynchronous Message Service) specification. The AMS service, and its protocols, implement an architectural concept under which the modules of mission systems may be designed as if they were to operate in isolation, each one producing and consuming mission information without explicit awareness of which other modules are currently operating. Communication relationships among such modules are self-configuring; this tends to minimize complexity in the development and operations of modular data systems. A system built on this model is a society of generally autonomous, inter-operating modules that may fluctuate freely over time in response to changing mission objectives, modules functional upgrades, and recovery from individual module failure. The purpose of AMS, then, is to reduce mission cost and risk by providing standard, reusable infrastructure for the exchange of information among data system modules in a manner that is simple to use, highly automated, flexible, robust, scalable, and efficient. The implementation is designed to spawn multiple threads of AMS functionality under the control of an AMS application program. These threads enable all members of an AMS-based, distributed application to discover one another in real time, subscribe to messages on specific topics, and to publish messages on specific topics. The query/reply (client/server) communication model is also supported. Message exchange is optionally subject to encryption (to support confidentiality) and authorization. Fault tolerance measures in the discovery protocol minimize the likelihood of overall application failure due to any single operational error anywhere in the system. The multi-threaded design simplifies processing while enabling application nodes to
Overview of the Force Scientific Parallel Language

Directory of Open Access Journals (Sweden)

Gita Alaghband

1994-01-01

Full Text Available The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force has resulted in structured parallel programs that are ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.
Xyce Parallel Electronic Simulator : users' guide, version 2.0.

Energy Technology Data Exchange (ETDEWEB)

Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

2004-06-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability the current state-of-the-art in the following areas: {sm_bullet} Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. {sm_bullet} Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. {sm_bullet} Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. {sm_bullet} A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). {sm_bullet} Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing of computing platforms. These include serial, shared-memory and distributed-memory parallel implementation - which allows it to run efficiently on the widest possible number parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce
Using dual-phase message signs to display airline information : technical summary.

Science.gov (United States)

2010-01-01

Changeable Message Signs are electronic traffic signs that can be programmed to display : important messages that vary according to current roadway conditions, including the : presence of traffic congestion and accidents. CMSs can also be used to dis...
Parallel multiscale simulations of a brain aneurysm

Energy Technology Data Exchange (ETDEWEB)

Grinberg, Leopold [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States); Fedosov, Dmitry A. [Institute of Complex Systems and Institute for Advanced Simulation, Forschungszentrum Jülich, Jülich 52425 (Germany); Karniadakis, George Em, E-mail: george_karniadakis@brown.edu [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States)

2013-07-01

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in
Parallel multiscale simulations of a brain aneurysm

International Nuclear Information System (INIS)

Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

2013-01-01

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in
SELF-CONSISTENT LANGEVIN SIMULATION OF COULOMB COLLISIONS IN CHARGED-PARTICLE BEAMS

International Nuclear Information System (INIS)

QIANG, J.; RYNE, R.; HABIB, S.

2000-01-01

In many plasma physics and charged-particle beam dynamics problems, Coulomb collisions are modeled by a Fokker-Planck equation. In order to incorporate these collisions, we present a three-dimensional parallel Langevin simulation method using a Particle-In-Cell (PIC) approach implemented on high-performance parallel computers. We perform, for the first time, a fully self-consistent simulation, in which the FR-iction and diffusion coefficients are computed FR-om first principles. We employ a two-dimensional domain decomposition approach within a message passing programming paradigm along with dynamic load balancing. Object oriented programming is used to encapsulate details of the communication syntax as well as to enhance reusability and extensibility. Performance tests on the SGI Origin 2000 and the Cray T3E-900 have demonstrated good scalability. Work is in progress to apply our technique to intrabeam scattering in accelerators
Technical Evaluation Report 6: Chat and Instant Messaging Systems

Directory of Open Access Journals (Sweden)

Jennifer Stein

2002-04-01

Full Text Available Text-based conferencing can be both asynchronous (i.e., participants log into the conference at separate times, and synchronous (i.e., interaction takes place in real time. It is thus subject to the same wide variation as the online audio- and video-conferencing methods (see the earlier Reports in this series. Synchronous text-based approaches (e.g., online chat groups and instant messaging systems are highly popular among online users generally owing to their ability to bring together special-interest groups from around the world without cost. In distance education (DE, however, synchronous chat methods are less widely used, owing in part to the problems of arranging for working adults in different time zones to join a discussion group simultaneously. Instant text messaging is more popular among DE users in view of the choice it provides between responding to a message immediately (synchronous communication or after a delay (asynchronous. The different synchronous and asynchronous approaches are likely to become more widely used in parallel with one another, as they are integrated in individual product packages.
Nature and Impact of Alcohol Messages in a Youth-Oriented Television Series.

Science.gov (United States)

Russell, Cristel Antonia; Russell, Dale W; Grube, Joel W

2009-01-01

This research contributes to the extant literature on television influence by pairing a stimulus-side approach documenting how information is presented within a TV series with a response-side assessment of whether connectedness and exposure to a series influence the processing of that information differently depending on its format. The inquiry focuses on the nature and impact of messages about alcohol contained within a youth oriented TV program. The findings indicate that the recall and perception of the more overt negative messages increase with exposure and that receptiveness to the subtle and less remembered positive messages increases with levels of program connectedness. Highly connected viewers are both more receptive to and in greater agreement with the underlying positive alcohol message communicated in the series.
A text message intervention for alcohol risk reduction among community college students: TMAP.

Science.gov (United States)

Bock, Beth C; Barnett, Nancy P; Thind, Herpreet; Rosen, Rochelle; Walaska, Kristen; Traficante, Regina; Foster, Robert; Deutsch, Chris; Fava, Joseph L; Scott-Sheldon, Lori A J

2016-12-01

Students at community colleges comprise nearly half of all U.S. college students and show higher risk of heavy drinking and related consequences compared to students at 4-year colleges, but no alcohol safety programs currently target this population. To examine the feasibility, acceptability, and preliminary efficacy of an alcohol risk-reduction program delivered through text messaging designed for community college (CC) students. Heavy drinking adult CC students (N=60) were enrolled and randomly assigned to the six-week active intervention (Text Message Alcohol Program: TMAP) or a control condition of general motivational (not alcohol related) text messages. TMAP text messages consisted of alcohol facts, strategies to limit alcohol use and related risks, and motivational messages. Assessments were conducted at baseline, week 6 (end of treatment) and week 12 (follow up). Most participants (87%) completed all follow up assessments. Intervention messages received an average rating of 6.8 (SD=1.5) on a 10-point scale. At week six, TMAP participants were less likely than controls to report heavy drinking and negative alcohol consequences. The TMAP group also showed significant increases in self-efficacy to resist drinking in high risk situations between baseline and week six, with no such increase among controls. Results were maintained through the week 12 follow up. The TMAP alcohol risk reduction program was feasible and highly acceptable indicated by high retention rates through the final follow up assessment and good ratings for the text message content. Reductions in multiple outcomes provide positive indications of intervention efficacy. Copyright © 2016. Published by Elsevier Ltd.
PASS Student Leader and Mentor Roles: A Tertiary Leadership Pathway

Science.gov (United States)

Skalicky, Jane; Caney, Annaliese

2010-01-01

In relation to developing leadership skills during tertiary studies, this paper considers the leadership pathway afforded by a Peer Assisted Study Sessions (PASS) program which includes the traditional PASS Leader role and a more senior PASS Mentor role. Data was collected using a structured survey with open-ended questions designed to capture the…

Some links on this page may take you to non-federal websites. Their policies may differ from this site.