WorldWideScience

Sample records for message passing parallel

  1. Message passing with parallel queue traversal

    Underwood, Keith D [Albuquerque, NM]; Brightwell, Ronald B [Albuquerque, NM]; Hemmert, K Scott [Albuquerque, NM]

    2012-05-01

    In message passing implementations, associative matching structures are used to permit list entries to be searched in parallel fashion, thereby avoiding the delay of linear list traversal. List management capabilities are provided to support list entry turnover semantics and priority ordering semantics.
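    As a point of reference for the abstract above, the following is a minimal sketch of the sequential list traversal that an associative matching structure is meant to avoid. It assumes MPI-style matching on a (source, tag) pair with wildcards, which the abstract does not spell out; the class name, the `ANY` wildcard, and the buffer labels are hypothetical. Oldest-first search stands in for priority ordering, and deleting the matched entry stands in for entry turnover.

```python
ANY = None  # hypothetical wildcard, playing the role of an "any source/any tag" marker


class PostedReceiveList:
    """Software model of a posted-receive list searched one entry at a time."""

    def __init__(self):
        # Entries are kept oldest first, so list position encodes priority.
        self._entries = []

    def post(self, source, tag, buffer_name):
        """Append a receive request; either field may be the ANY wildcard."""
        self._entries.append((source, tag, buffer_name))

    def match(self, source, tag):
        """Return the buffer of the oldest matching entry and remove that entry."""
        for index, (want_source, want_tag, buffer_name) in enumerate(self._entries):
            source_ok = want_source is ANY or want_source == source
            tag_ok = want_tag is ANY or want_tag == tag
            if source_ok and tag_ok:
                del self._entries[index]  # entry turnover: a receive matches once
                return buffer_name
        return None  # no posted receive matches this message


if __name__ == "__main__":
    receives = PostedReceiveList()
    receives.post(ANY, 7, "buffer_a")
    receives.post(3, 7, "buffer_b")
    print(receives.match(3, 7))
    print(receives.match(3, 7))
    print(receives.match(3, 7))
```

    The loop costs time proportional to the list length for every incoming message; a hardware associative structure would compare all entries at once and then pick the highest-priority hit.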

  2. Parallelization of a hydrological model using the message passing interface

    Wu, Yiping; Li, Tiejian; Sun, Liqun; Chen, Ji

    2013-01-01

    With increasing knowledge about natural processes, hydrological models such as the Soil and Water Assessment Tool (SWAT) are becoming larger and more complex, with increasing computation time. Additionally, other procedures such as model calibration, which may require thousands of model iterations, can increase running time and thus further hinder rapid modeling and analysis. Using the widely applied SWAT as an example, this study demonstrates how to parallelize a serial hydrological model in a Windows® environment using a parallel programming technology, the Message Passing Interface (MPI). With a case study, we derived the optimal values for the two parameters (the number of processes and the corresponding percentage of work to be distributed to the master process) of the parallel SWAT (P-SWAT) on an ordinary personal computer and a workstation. Our study indicates that model execution time can be reduced by 42%–70% (or a speedup of 1.74–3.36) using multiple processes (two to five) with a proper task-distribution scheme (between the master and slave processes). Although the computation time decreases with an increasing number of processes (from two to five), the improvement diminishes because of the accompanying increase in demand for message passing between the master and all slave processes. Our case study demonstrates that the P-SWAT with a five-process run may reach the maximum speedup, and the performance can be quite stable (fairly independent of project size). Overall, the P-SWAT can help reduce the computation time substantially for an individual model run, manual and automatic calibration procedures, and optimization of best management practices. In particular, the parallelization method we used and the scheme for deriving the optimal parameters in this study can be valuable and easily applied to other hydrological or environmental models.
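    The relation between the two figures quoted in the abstract above (percentage reduction in run time and speedup) can be checked with a short sketch. It assumes the usual definition of speedup as serial time divided by parallel time; the function names are illustrative and are not taken from the P-SWAT study.

```python
def speedup_from_reduction(reduction):
    """Speedup T_serial / T_parallel for a fractional run-time reduction in [0, 1)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def reduction_from_speedup(speedup):
    """Fractional run-time reduction implied by a given speedup (> 0)."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def parallel_efficiency(speedup, processes):
    """Speedup per process; 1.0 would be ideal linear scaling."""
    return speedup / processes


if __name__ == "__main__":
    # Speedups reported in the abstract for two and five processes.
    for speedup, processes in ((1.74, 2), (3.36, 5)):
        reduction = reduction_from_speedup(speedup)
        efficiency = parallel_efficiency(speedup, processes)
        print(f"speedup {speedup}: reduction {reduction:.1%}, efficiency {efficiency:.0%}")
```

    Under this definition a speedup of 1.74 corresponds to a reduction of about 42.5% and a speedup of 3.36 to about 70.2%, consistent with the quoted 42%–70% range.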

  3. Protocol-Based Verification of Message-Passing Parallel Programs

    López-Acosta, Hugo-Andrés; Marques, Eduardo R. B.; Martins, Francisco

    2015-01-01

    We present ParTypes, a type-based methodology for the verification of Message Passing Interface (MPI) programs written in the C programming language. The aim is to statically verify programs against protocol specifications, enforcing properties such as fidelity and absence of deadlocks. We develo...

  4. Algorithms for parallel flow solvers on message passing architectures

    Vanderwijngaart, Rob F.

    1995-01-01

    The purpose of this project has been to identify and test suitable technologies for implementation of fluid flow solvers -- possibly coupled with structures and heat equation solvers -- on MIMD parallel computers. In the course of this investigation much attention has been paid to efficient domain decomposition strategies for ADI-type algorithms. Multi-partitioning derives its efficiency from the assignment of several blocks of grid points to each processor in the parallel computer. A coarse-grain parallelism is obtained, and a near-perfect load balance results. In uni-partitioning every processor receives responsibility for exactly one block of grid points instead of several. This necessitates fine-grain pipelined program execution in order to obtain a reasonable load balance. Although fine-grain parallelism is less desirable on many systems, especially high-latency networks of workstations, uni-partition methods are still in wide use in production codes for flow problems. Consequently, it remains important to achieve good efficiency with this technique that has essentially been superseded by multi-partitioning for parallel ADI-type algorithms. Another reason for the concentration on improving the performance of pipeline methods is their applicability in other types of flow solver kernels with stronger implied data dependence. Analytical expressions can be derived for the size of the dynamic load imbalance incurred in traditional pipelines. From these it can be determined what is the optimal first-processor retardation that leads to the shortest total completion time for the pipeline process. Theoretical predictions of pipeline performance with and without optimization match experimental observations on the iPSC/860 very well. Analysis of pipeline performance also highlights the effect of uncareful grid partitioning in flow solvers that employ pipeline algorithms. If grid blocks at boundaries are not at least as large in the wall-normal direction as those

  5. Stampi: a message passing library for distributed parallel computing. User's guide

    Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi

    1998-11-01

    A new message passing library, Stampi, has been developed to run a computation across an arbitrary mix of different kinds of parallel computers, using MPI (Message Passing Interface) as the single interface for communication. Stampi is based on the MPI2 specification. It provides dynamic process creation on different machines and communication with the spawned processes within the scope of MPI semantics. Vendor-implemented MPI libraries are closed systems confined to one parallel machine and support neither function: process creation on, nor communication with, external machines. Stampi supports both functions and thus enables distributed parallel computing. Stampi is currently implemented on COMPACS (COMplex PArallel Computer System), installed at CCSE and comprising five parallel computers and one graphics workstation, and communication among all of them is supported. (author)

  6. Parallelization of MCNP Monte Carlo neutron and photon transport code in parallel virtual machine and message passing interface

    Deng Li; Xie Zhongsheng

    1999-01-01

    The coupled neutron and photon transport Monte Carlo code MCNP (version 3B) has been parallelized in parallel virtual machine (PVM) and message passing interface (MPI) by modifying a previous serial code. The new code has been verified by solving sample problems. The speedup increases linearly with the number of processors and the average efficiency is up to 99% for 12-processor. (author)
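    As a worked reading of the efficiency figure above, assuming the standard definition of parallel efficiency as speedup divided by processor count (the abstract does not state its definition, and the symbols S, E and p are introduced here only for illustration):

```latex
E(p) = \frac{S(p)}{p}
\quad\Longrightarrow\quad
S(12) = E(12)\cdot 12 \approx 0.99 \times 12 = 11.88
```

    Under that assumption, the reported efficiency of up to 99% on 12 processors corresponds to a speedup of just under 12.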

  7. Stampi: a message passing library for distributed parallel computing. User's guide, second edition

    Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi

    2000-02-01

    A new message passing library, Stampi, has been developed to run a computation across an arbitrary mix of different kinds of parallel computers, using MPI (Message Passing Interface) as the single interface for communication. Stampi is based on the MPI2 specification, and it provides dynamic process creation on different machines and communication with the spawned processes within the scope of MPI semantics. The main features of Stampi are: (i) automatic switching between external and internal communication, (ii) message routing/relaying with a routing module, (iii) dynamic process creation, (iv) support for two types of connection, Master/Slave and Client/Server, and (v) support for communication with Java applets. Vendors have implemented MPI libraries as closed systems within one parallel machine or within their own systems, and have supported neither function: process creation on, nor communication with, external machines. Stampi supports both functions and thus enables distributed parallel computing. Stampi is currently implemented on COMPACS (COMplex PArallel Computer System), installed at CCSE and comprising five parallel computers and one graphics workstation, and on eight further kinds of parallel machines, fourteen systems in total. Stampi provides MPI communication functionality on all of them. This report mainly describes the usage of Stampi. (author)

  8. The specification of Stampi, a message passing library for distributed parallel computing

    Imamura, Toshiyuki; Takemiya, Hiroshi; Koide, Hiroshi

    2000-03-01

    At CCSE (Center for Promotion of Computational Science and Engineering), a new message passing library for heterogeneous and distributed parallel computing, called Stampi, has been developed. Stampi enables communication between any combination of parallel computers and workstations. Currently, a Stampi system consists of the Stampi library and Stampi/Java. It provides functions to connect a Stampi application not only with applications on COMPACS (COMplex Parallel Computer System) but also with applets running in WWW browsers. This report summarizes the specifications of Stampi and details the development of its system. (author)

  9. The design of multi-core DSP parallel model based on message passing and multi-level pipeline

    Niu, Jingyu; Hu, Jian; He, Wenjing; Meng, Fanrong; Li, Chuanrong

    2017-10-01

    Currently, the design of embedded signal processing systems is often based on a specific application, but this approach is not conducive to the rapid development of signal processing technology. In this paper, a parallel processing model architecture based on a multi-core DSP platform is designed; it is mainly suited to complex algorithms composed of different modules. This model combines the ideas of multi-level pipeline parallelism and message passing, and draws on the advantages of the mainstream multi-core DSP models (the Master-Slave model and the Data Flow model) to achieve better performance. This paper uses a three-dimensional image generation algorithm to validate the efficiency of the proposed model by comparing it with the Master-Slave and Data Flow models.

  10. On the adequacy of message-passing parallel supercomputers for solving neutron transport problems

    Azmy, Y.Y.

    1990-01-01

    A coarse-grained, static-scheduling parallelization of the standard iterative scheme used for solving the discrete-ordinates approximation of the neutron transport equation is described. The parallel algorithm is based on a decomposition of the angular domain along the discrete ordinates, thus naturally producing a set of completely uncoupled systems of equations in each iteration. Implementation of the parallel code on Intel's iPSC/2 hypercube, and solutions to test problems are presented as evidence of the high speedup and efficiency of the parallel code. The performance of the parallel code on the iPSC/2 is analyzed, and a model for the CPU time as a function of the problem size (order of angular quadrature) and the number of participating processors is developed and validated against measured CPU times. The performance model is used to speculate on the potential of massively parallel computers for significantly speeding up real-life transport calculations at acceptable efficiencies. We conclude that parallel computers with a few hundred processors are capable of producing large speedups at very high efficiencies in very large three-dimensional problems. 10 refs., 8 figs

  11. Parallel multigrid methods: implementation on message-passing computers and applications to fluid dynamics. A draft

    Solchenbach, K.; Thole, C.A.; Trottenberg, U.

    1987-01-01

    For a wide class of problems in scientific computing, in particular for partial differential equations, the multigrid principle has proved to yield highly efficient numerical methods. However, the principle has to be applied carefully: if the multigrid components are not chosen adequately with respect to the given problem, the efficiency may be much smaller than possible. This has been demonstrated for many practical problems. Unfortunately, the general theories on multigrid convergence do not give much help in constructing really efficient multigrid algorithms. Although some progress has been made in bridging the gap between theory and practice during the last few years, there are still several theoretical approaches which are misleading rather than helpful with respect to the objective of real efficiency. The research in finding highly efficient algorithms for non-model applications therefore is still a sophisticated mixture of theoretical considerations, a transfer of experiences from model to real life problems and systematical experimental work. The emphasis of the practical research activity today lies - among others - in the following fields: - finding efficient multigrid components for really complex problems, - combining the multigrid approach with advanced discretizative techniques: - constructing highly parallel multigrid algorithms. In this paper, we want to deal mainly with the last topic

  12. Message Passing Framework for Globally Interconnected Clusters

    Hafeez, M; Riaz, N; Asghar, S; Malik, U A; Rehman, A

    2011-01-01

    In prevailing technology trends it is apparent that the network requirements and technologies will advance in future. Therefore the need of High Performance Computing (HPC) based implementation for interconnecting clusters is comprehensible for scalability of clusters. Grid computing provides global infrastructure of interconnecting clusters consisting of dispersed computing resources over Internet. On the other hand the leading model for HPC programming is Message Passing Interface (MPI). As compared to Grid computing, MPI is better suited for solving most of the complex computational problems. MPI itself is restricted to a single cluster. It does not support message passing over the internet to use the computing resources of different clusters in an optimal way. We propose a model that provides message passing capabilities between parallel applications over the internet. The proposed model is based on Architecture for Java Universal Message Passing (A-JUMP) framework and Enterprise Service Bus (ESB) named as High Performance Computing Bus. The HPC Bus is built using ActiveMQ. HPC Bus is responsible for communication and message passing in an asynchronous manner. Asynchronous mode of communication offers an assurance for message delivery as well as a fault tolerance mechanism for message passing. The idea presented in this paper effectively utilizes wide-area intercluster networks. It also provides scheduling, dynamic resource discovery and allocation, and sub-clustering of resources for different jobs. Performance analysis and comparison study of the proposed framework with P2P-MPI are also presented in this paper.

  13. Message-passing-interface-based parallel FDTD investigation on the EM scattering from a 1-D rough sea surface using uniaxial perfectly matched layer absorbing boundary.

    Li, J; Guo, L-X; Zeng, H; Han, X-B

    2009-06-01

    A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of sea surface for different polarization and the bistatic scattering from a sea surface with large incident angle and large wind speed.

  14. Message passing for quantified Boolean formulas

    Zhang, Pan; Ramezanpour, Abolfazl; Zecchina, Riccardo; Zdeborová, Lenka

    2012-01-01

    We introduce two types of message passing algorithms for quantified Boolean formulas (QBF). The first type is a message passing based heuristics that can prove unsatisfiability of the QBF by assigning the universal variables in such a way that the remaining formula is unsatisfiable. In the second type, we use message passing to guide branching heuristics of a Davis–Putnam–Logemann–Loveland (DPLL) complete solver. Numerical experiments show that on random QBFs our branching heuristics give robust exponential efficiency gain with respect to state-of-the-art solvers. We also manage to solve some previously unsolved benchmarks from the QBFLIB library. Apart from this, our study sheds light on using message passing in small systems and as subroutines in complete solvers

  15. Blind sensor calibration using approximate message passing

    Schülke, Christophe; Caltagirone, Francesco; Zdeborová, Lenka

    2015-01-01

    The ubiquity of approximately sparse data has led a variety of communities to take great interest in compressed sensing algorithms. Although these are very successful and well understood for linear measurements with additive noise, applying them to real data can be problematic if imperfect sensing devices introduce deviations from this ideal signal acquisition process, caused by sensor decalibration or failure. We propose a message passing algorithm called calibration approximate message passing (Cal-AMP) that can treat a variety of such sensor-induced imperfections. In addition to deriving the general form of the algorithm, we numerically investigate two particular settings. In the first, a fraction of the sensors is faulty, giving readings unrelated to the signal. In the second, sensors are decalibrated and each one introduces a different multiplicative gain to the measurements. Cal-AMP shares the scalability of approximate message passing, allowing us to treat large sized instances of these problems, and experimentally exhibits a phase transition between domains of success and failure. (paper)

  16. Distributed parallel messaging for multiprocessor systems

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.

  17. Broadcasting a message in a parallel computer

    Berg, Jeremy E [Rochester, MN]; Faraj, Ahmad A [Rochester, MN]

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point-to-point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
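    The idea in the abstract above can be illustrated with a small sketch. It assumes a two-dimensional mesh plane, a boustrophedon ("snake") ordering as one possible Hamiltonian path, and a logical root sitting at the start of that path; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
def snake_path(rows, cols):
    """One Hamiltonian path over a rows x cols mesh: left-to-right, then right-to-left."""
    path = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


def broadcast_along_path(path, message):
    """Relay the message node to node along the path; return {node: (message, hop)}."""
    delivered = {}
    for hop, node in enumerate(path):
        # Hop 0 is the logical root; every later node receives from its predecessor.
        delivered[node] = (message, hop)
    return delivered


if __name__ == "__main__":
    path = snake_path(3, 4)
    delivered = broadcast_along_path(path, "payload")
    for node in path:
        print(node, delivered[node][1])
```

    Every step of the path connects mesh neighbours, so each relay is a single point-to-point transfer, and each node is visited exactly once. The cost is that the last node receives the message only after as many hops as there are other nodes.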

  18. The serial message-passing schedule for LDPC decoding algorithms

    Liu, Mingshan; Liu, Shanshan; Zhou, Yuan; Jiang, Xue

    2015-12-01

    The conventional message-passing schedule for LDPC decoding algorithms is the so-called flooding schedule. It has the disadvantage that updated messages cannot be used until the next iteration, which reduces the convergence speed. To address this, the layered decoding algorithm (LBP), based on a serial message-passing schedule, has been proposed. This paper briefly introduces the decoding principle of the LBP algorithm and then proposes two improved algorithms, the grouped serial decoding algorithm (Grouped LBP) and the semi-serial decoding algorithm. They can improve the LBP algorithm's decoding speed while maintaining good decoding performance.
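    The difference between the two schedules described above can be seen in a toy decoder. This is a minimal sketch, not the paper's algorithm: it assumes min-sum check-node updates, processes one check row per layer in the serial case, and uses the (7,4) Hamming parity-check matrix as a small stand-in for a real LDPC code. The grouped and semi-serial variants are not modelled, and all names are illustrative.

```python
# Toy parity-check matrix: (7,4) Hamming code, small enough to trace by hand.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def check_update(v2c):
    """Min-sum rule: each outgoing message is built from the OTHER incoming edges."""
    out = []
    for i in range(len(v2c)):
        others = v2c[:i] + v2c[i + 1:]
        sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
        out.append(sign * min(abs(v) for v in others))
    return out


def hard_bits(llr):
    """Positive log-likelihood ratio means bit 0, negative means bit 1."""
    return [1 if value < 0 else 0 for value in llr]


def parity_ok(rows, bits):
    return all(sum(bits[j] for j in row) % 2 == 0 for row in rows)


def decode(H, channel_llr, schedule, max_iters=20):
    """Return (decoded bits, iterations used) for the 'flooding' or 'serial' schedule."""
    rows = [[j for j, h in enumerate(row) if h] for row in H]
    c2v = [[0.0] * len(row) for row in rows]  # check-to-variable messages
    post = list(channel_llr)                  # posterior LLR per variable
    for iteration in range(1, max_iters + 1):
        if schedule == "flooding":
            # Every check reads the posteriors of the PREVIOUS iteration.
            previous = list(post)
            post = list(channel_llr)
            for m, row in enumerate(rows):
                v2c = [previous[j] - c2v[m][k] for k, j in enumerate(row)]
                c2v[m] = check_update(v2c)
                for k, j in enumerate(row):
                    post[j] += c2v[m][k]
        elif schedule == "serial":
            # Each check updates the posteriors at once, so later checks
            # in the same iteration already see the fresh values.
            for m, row in enumerate(rows):
                v2c = [post[j] - c2v[m][k] for k, j in enumerate(row)]
                c2v[m] = check_update(v2c)
                for k, j in enumerate(row):
                    post[j] = v2c[k] + c2v[m][k]
        else:
            raise ValueError("schedule must be 'flooding' or 'serial'")
        bits = hard_bits(post)
        if parity_ok(rows, bits):
            return bits, iteration
    return hard_bits(post), max_iters


if __name__ == "__main__":
    # All-zero codeword with one weakly wrong sample in the first position.
    llr = [-0.5, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
    for schedule in ("flooding", "serial"):
        print(schedule, decode(H, llr, schedule))
```

    The only difference between the two branches is when updated posteriors become visible to other check nodes, which is the property the abstract credits for the faster convergence of the serial schedule. On an input this small both schedules correct the error equally easily, so the example shows the mechanism rather than the speed gain.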

  19. Future-based Static Analysis of Message Passing Programs

    Wytse Oortwijn

    2016-06-01

    Message passing is widely used in industry to develop programs consisting of several distributed communicating components. Developing functionally correct message passing software is very challenging due to the concurrent nature of message exchanges. Nonetheless, many safety-critical applications rely on the message passing paradigm, including air traffic control systems and emergency services, which makes proving their correctness crucial. We focus on the modular verification of MPI programs by statically verifying concrete Java code. We use separation logic to reason about local correctness and define abstractions of the communication protocol in the process algebra used by mCRL2. We call these abstractions futures as they predict how components will interact during program execution. We establish a provable link between futures and program code and analyse the abstract futures via model checking to prove global correctness. Finally, we verify a leader election protocol to demonstrate our approach.

  20. A real-time MPEG software decoder using a portable message-passing library

    Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

    1995-12-31

    We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easily ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment in which it runs. Several technical issues are discussed, including balancing of decoding speed, memory limitation, I/O capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible on a general-purpose parallel machine.

  1. EXIT Chart Analysis of Binary Message-Passing Decoders

    Lechner, Gottfried; Pedersen, Troels; Kramer, Gerhard

    2007-01-01

    Binary message-passing decoders for LDPC codes are analyzed using EXIT charts. For the analysis, the variable node decoder performs all computations in the L-value domain. For the special case of a hard decision channel, this leads to the well-known Gallager B algorithm, while the analysis can be extended to channels with larger output alphabets. By increasing the output alphabet from hard decisions to four symbols, a gain of more than 1.0 dB is achieved using optimized codes. For this code optimization, the mixing property of EXIT functions has to be modified to the case of binary message-passing decoders....

  2. Message-Passing Receivers for Single Carrier Systems with Frequency-Domain Equalization

    Zhang, Chuanzong; Manchón, Carles Navarro; Wang, Zhongyong

    2015-01-01

    In this letter, we design iterative receiver algorithms for joint frequency-domain equalization and decoding in a single carrier system assuming perfect channel state information. Based on an approximate inference framework that combines belief propagation (BP) and the mean field (MF) approximation, we propose two receiver algorithms with, respectively, parallel and sequential message-passing schedules in the MF part. A recently proposed receiver based on generalized approximate message passing (GAMP) is used as a benchmarking reference. The simulation results show that the BP-MF receiver...

  3. Track-stitching using graphical models and message passing

    Van der Merwe, LJ

    2013-07-01

    In order to stitch tracks together, two tasks are required, namely tracking and track stitching. In this study track stitching is performed using a graphical model and message passing (belief propagation) approach. Tracks are modelled as nodes in a...

  4. A message passing algorithm for the evaluation of social influence

    Vassio, Luca; Fagnani, Fabio; Frasca, Paolo; Ozdaglar, Asuman

    2014-01-01

    In this paper, we define a new measure of node centrality in social networks, the Harmonic Influence Centrality, which emerges naturally in the study of social influence over networks. Next, we introduce a distributed message passing algorithm to compute the Harmonic Influence Centrality of each

  5. Statistics of Epidemics in Networks by Passing Messages

    Shrestha, Munik Kumar

    Epidemic processes are common out-of-equilibrium phenomena of broad interdisciplinary interest. In this thesis, we show how the message-passing approach can be a helpful tool for simulating epidemic models in disordered media such as networks, and in particular for estimating the probability that a given node will become infectious at a particular time. The dynamics we consider are stochastic, where randomness can arise from the stochastic events or from the randomness of network structures. As in belief propagation, variables or messages in the message-passing approach are defined on the directed edges of a network. However, unlike belief propagation, where the posterior distributions are updated according to Bayes' rule, in the message-passing approach we write differential equations for the messages over time. This approach takes correlations between neighboring nodes into account while preventing causal signals from backtracking to their immediate source, and thus avoids "echo chamber effects" where a pair of adjacent nodes each amplify the probability that the other is infectious. In our first results, we develop a message-passing approach to threshold models of behavior popular in sociology. These are models, first proposed by Granovetter, where individuals have to hear about a trend or behavior from some number of neighbors before adopting it themselves. In the thermodynamic limit of large random networks, we provide an exact analytic scheme for calculating the time dependence of the probabilities and thus learning about the whole dynamics of bootstrap percolation, which is a simple model known in statistical physics for exhibiting a discontinuous phase transition. As an application, we apply a similar model to financial networks, studying when bankruptcies spread due to the sudden devaluation of shared assets in overlapping portfolios. We predict that although diversification may be good for individual institutions, it can create dangerous systemic effects, and as a result

  6. Belief propagation decoding of quantum channels by passing quantum messages

    Renes, Joseph M

    2017-01-01

    The belief propagation (BP) algorithm is a powerful tool in a wide range of disciplines from statistical physics to machine learning to computational biology, and is ubiquitous in decoding classical error-correcting codes. The algorithm works by passing messages between nodes of the factor graph associated with the code and enables efficient decoding of the channel, in some cases even up to the Shannon capacity. Here we construct the first BP algorithm which passes quantum messages on the factor graph and is capable of decoding the classical–quantum channel with pure state outputs. This gives explicit decoding circuits whose number of gates is quadratic in the code length. We also show that this decoder can be modified to work with polar codes for the pure state channel and as part of a decoder for transmitting quantum information over the amplitude damping channel. These represent the first explicit capacity-achieving decoders for non-Pauli channels. (fast track communication)

  7. Belief propagation decoding of quantum channels by passing quantum messages

    Renes, Joseph M.

    2017-07-01

    The belief propagation (BP) algorithm is a powerful tool in a wide range of disciplines from statistical physics to machine learning to computational biology, and is ubiquitous in decoding classical error-correcting codes. The algorithm works by passing messages between nodes of the factor graph associated with the code and enables efficient decoding of the channel, in some cases even up to the Shannon capacity. Here we construct the first BP algorithm which passes quantum messages on the factor graph and is capable of decoding the classical-quantum channel with pure state outputs. This gives explicit decoding circuits whose number of gates is quadratic in the code length. We also show that this decoder can be modified to work with polar codes for the pure state channel and as part of a decoder for transmitting quantum information over the amplitude damping channel. These represent the first explicit capacity-achieving decoders for non-Pauli channels.

  8. Analysis and Design of Binary Message-Passing Decoders

    Lechner, Gottfried; Pedersen, Troels; Kramer, Gerhard

    2012-01-01

    Binary message-passing decoders for low-density parity-check (LDPC) codes are studied by using extrinsic information transfer (EXIT) charts. The channel delivers hard or soft decisions and the variable node decoder performs all computations in the L-value domain. A hard decision channel results in the well-known Gallager B algorithm, and increasing the output alphabet from hard decisions to two bits yields a gain of more than 1.0 dB in the required signal to noise ratio when using optimized codes. The code optimization requires adapting the mixing property of EXIT functions to the case of binary message-passing decoders. Finally, it is shown that errors on cycles consisting only of degree two and three variable nodes cannot be corrected and a necessary and sufficient condition for the existence of a cycle-free subgraph is derived.

  9. S-AMP: Approximate Message Passing for General Matrix Ensembles

    Cakmak, Burak; Winther, Ole; Fleury, Bernard H.

    2014-01-01

    We propose a novel iterative estimation algorithm for linear observation models called S-AMP. The fixed points of S-AMP are the stationary points of the exact Gibbs free energy under a set of (first- and second-) moment consistency constraints in the large system limit. S-AMP extends the approximate message-passing (AMP) algorithm to general matrix ensembles with a well-defined large system size limit. The generalization is based on the S-transform (in free probability) of the spectrum of the measurement matrix. Furthermore, we show that the optimality of S-AMP follows directly from its...

  10. Fault-tolerant Agreement in Synchronous Message-passing Systems

    Raynal, Michel

    2010-01-01

    The present book focuses on the way to cope with the uncertainty created by process failures (crash, omission failures and Byzantine behavior) in synchronous message-passing systems (i.e., systems whose progress is governed by the passage of time). To that end, the book considers fundamental problems that distributed synchronous processes have to solve. These fundamental problems concern agreement among processes (if processes are unable to agree in one way or another in presence of failures, no non-trivial problem can be solved). They are consensus, interactive consistency, k-set agreement an

  11. Weighted community detection and data clustering using message passing

    Shi, Cheng; Liu, Yanchen; Zhang, Pan

    2018-03-01

    Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for an unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of spin-glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computation complexity in dense networks but give almost the same performance as belief propagation.

  12. Parallel imaging for first-pass myocardial perfusion

    Irwan, Roy; Lubbers, Daniel D.; van der Vleuten, Pieter A.; Kappert, Peter; Gotte, Marco J. W.; Sijens, Paul E.

    Two parallel imaging methods used for first-pass myocardial perfusion imaging were compared in terms of signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR) and image artifacts. One used adaptive Time-adaptive SENSitivity Encoding (TSENSE) and the other used GeneRalized Autocalibrating

  13. Communication strategies for angular domain decomposition of transport calculations on message passing multiprocessors

    Azmy, Y.Y.

    1997-01-01

    The effect of three communication schemes for solving Arbitrarily High Order Transport (AHOT) methods of the Nodal type on parallel performance is examined via direct measurements and performance models. The target architecture in this study is Oak Ridge National Laboratory's 128 node Paragon XP/S 5 computer and the parallelization is based on the Parallel Virtual Machine (PVM) library. However, the conclusions reached can be easily generalized to a large class of message passing platforms and communication software. The three schemes considered here are: (1) PVM's global operations (broadcast and reduce), which utilize the Paragon's native corresponding operations based on a spanning tree routing; (2) the Bucket algorithm, wherein the angular domain decomposition of the mesh sweep is complemented with a spatial domain decomposition of the accumulation process of the scalar flux from the angular flux and the convergence test; (3) a distributed memory version of the Bucket algorithm that pushes the spatial domain decomposition one step farther by actually distributing the fixed source and flux iterates over the memories of the participating processes. The conclusion is that the Bucket algorithm is the most efficient of the three if all participating processes have sufficient memories to hold the entire problem arrays. Otherwise, the third scheme becomes necessary at an additional cost to speedup and parallel efficiency that is quantifiable via the parallel performance model.

  14. Data communications in a parallel active messaging interface of a parallel computer

    Davis, Kristan D; Faraj, Daniel A

    2013-07-09

    Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
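
    The selection step described above (each algorithm tied to its own range of message sizes, then chosen by the size of the outgoing message) can be sketched roughly as follows. This is only an illustration: the algorithm names, the byte thresholds, and the function `select_algorithm` are assumptions made for the example and are not taken from the patent or from any real PAMI interface.

```python
from bisect import bisect_right

# Hypothetical table: (inclusive lower bound in bytes, algorithm name).
# Names and thresholds are illustrative assumptions, not from the source.
SIZE_RANGES = [
    (0, "eager"),
    (4096, "rendezvous"),
    (1 << 20, "pipelined_rdma"),
]
_LOWER_BOUNDS = [lower for lower, _ in SIZE_RANGES]


def select_algorithm(message_size: int) -> str:
    """Return the algorithm whose size range contains message_size."""
    if message_size < 0:
        raise ValueError("message size must be non-negative")
    # bisect_right gives the position of the first lower bound that is
    # greater than message_size; the range just before it contains the size.
    index = bisect_right(_LOWER_BOUNDS, message_size) - 1
    return SIZE_RANGES[index][1]


if __name__ == "__main__":
    for size in (64, 4096, 8 << 20):
        print(size, select_algorithm(size))
```

    Keeping the ranges sorted by lower bound means every size falls into exactly one range, which mirrors the "separate range of message sizes" wording of the abstract.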

  15. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  16. Why pass on viral messages? Because they connect emotionally

    Dobele, A.; Lindgreen, A.; Beverland, M.; Vanhamme, J.; Wijk, van R.

    2007-01-01

    In this article, we identify that successful viral marketing campaigns trigger an emotional response in recipients. Working under this premise, we examine the effects of viral messages containing the six primary emotions (surprise, joy, sadness, anger, fear, and disgust) on recipients' emotional

  17. High performance message passing for the ATLAS DAQ/EF-1 project

    Mornacchi, Giuseppe

    1999-01-01

    Summary form only. A message passing library has been developed in the context of the ATLAS DAQ/EF-1 project. It is used for time critical applications within the front-end part of the DAQ system, mainly to exchange data control messages between I/O processors. Key objectives of the design were low message overheads, efficient use of the data transfer buses, provision of broadcast functionality and a hardware and operating system independent implementation of the application interface. The design and implementation of the message passing library are presented. As required by the project, the implementation is based on commercial components, namely VMEbus, PCI, the Lynx-OS real-time operating system and an additional inter- processor link, PVIC. The latter offers broadcast functionality identified as being important to the overall performance of the message passing. In addition, performance benchmarks for all implementing buses are presented for both simple test programs and the full DAQ applications. (0 refs)...

  18. Theoretic derivation of directed acyclic subgraph algorithm and comparisons with message passing algorithm

    Ha, Jeongmok; Jeong, Hong

    2016-07-01

    This study investigates the directed acyclic subgraph (DAS) algorithm, which is used to solve discrete labeling problems much more rapidly than other Markov-random-field-based inference methods but at a competitive accuracy. However, the mechanism by which the DAS algorithm simultaneously achieves competitive accuracy and fast execution speed, has not been elucidated by a theoretical derivation. We analyze the DAS algorithm by comparing it with a message passing algorithm. Graphical models, inference methods, and energy-minimization frameworks are compared between DAS and message passing algorithms. Moreover, the performances of DAS and other message passing methods [sum-product belief propagation (BP), max-product BP, and tree-reweighted message passing] are experimentally compared.

  19. Message passing vs. shared address space on a cluster of SMPs

    Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak

    2001-01-01

    The emergence of scalable computer architectures using clusters of PCs or PC-SMPs with commodity networking has made them attractive platforms for high-end scientific computing. Currently, message passing (MP) and shared address space (SAS) are the two leading programming paradigms for these systems. MP has been standardized with MPI, and is the most common and mature parallel programming approach. However, MP code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and programming effort required for six applications under both programming models on a 32-CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of the applications; however, on certain classes of problems, SAS performance is competitive with MPI.

  20. An efficient communication scheme for solving Sn equations on message-passing multiprocessors

    Azmy, Y.Y.

    1993-01-01

    Early models of Intel's hypercube multiprocessors, e.g., the iPSC/1 and iPSC/2, were characterized by the high latency of message passing. This relatively weak dependence of the communication penalty on the size of messages, in contrast to its strong dependence on the number of messages, justified using the Fan-in Fan-out algorithm (which implements a minimum spanning tree path) to perform global operations, such as global sums, etc. Recent models of message-passing computers, such as the iPSC/860 and the Paragon, have been found to possess much smaller latency, thus forcing a reexamination of the issue of performance optimization with respect to communication schemes. Essentially, the Fan-in Fan-out scheme minimizes the number of nonsimultaneous messages sent but not the volume of data traffic across the network. Furthermore, if a global operation is performed in conjunction with the message passing, a large fraction of the attached nodes remains idle as the number of utilized processors is halved in each step of the process. On the other hand, the Recursive Halving scheme offers the smallest communication cost for global operations but has some drawbacks
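
    The halving behaviour mentioned above can be illustrated with a small single-process simulation of the fan-in half of a global sum. This is a sketch under assumptions: the pairing rule (each active rank receives from its right-hand neighbour in the list of active ranks) and the function name `fan_in_sum` are chosen for the example, and the fan-out broadcast of the result back to all ranks is omitted.

```python
def fan_in_sum(values):
    """Simulate a fan-in global sum, one value per rank.

    Returns (total, steps, active_per_step), where active_per_step records
    how many ranks still hold a partial sum at the start of each step.
    """
    partial = list(values)
    if not partial:
        raise ValueError("need at least one rank")
    active = list(range(len(partial)))  # ranks still holding a partial sum
    steps = 0
    active_per_step = []
    while len(active) > 1:
        active_per_step.append(len(active))
        survivors = []
        # Pair neighbouring active ranks; the right one sends to the left one
        # and then drops out, so roughly half the ranks go idle each step.
        for i in range(0, len(active), 2):
            receiver = active[i]
            if i + 1 < len(active):
                sender = active[i + 1]
                partial[receiver] += partial[sender]
            survivors.append(receiver)
        active = survivors
        steps += 1
    return partial[active[0]], steps, active_per_step


if __name__ == "__main__":
    total, steps, active = fan_in_sum(range(8))
    print(total, steps, active)
```

    For eight ranks the simulation takes three steps, with 8, 4, and 2 ranks taking part in successive steps, which is the idle-node effect the abstract points out.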

  1. McMPI – a managed-code message passing interface library for high performance communication in C#

    Holmes, Daniel John

    2012-01-01

    This work endeavours to achieve technology transfer between established best-practice in academic high-performance computing and current techniques in commercial high-productivity computing. It shows that a credible high-performance message-passing communication library, with semantics and syntax following the Message-Passing Interface (MPI) Standard, can be built in pure C# (one of the .Net suite of computer languages). Message-passing has been the dominant paradigm in high-pe...

  2. Neighbourhood-consensus message passing and its potentials in image processing applications

    Ružic, Tijana; Pižurica, Aleksandra; Philips, Wilfried

    2011-03-01

    In this paper, a novel algorithm for inference in Markov Random Fields (MRFs) is presented. Its goal is to find approximate maximum a posteriori estimates in a simple manner by combining neighbourhood influence of iterated conditional modes (ICM) and message passing of loopy belief propagation (LBP). We call the proposed method neighbourhood-consensus message passing because a single joint message is sent from the specified neighbourhood to the central node. The message, as a function of beliefs, represents the agreement of all nodes within the neighbourhood regarding the labels of the central node. This way we are able to overcome the disadvantages of reference algorithms, ICM and LBP. On one hand, more information is propagated in comparison with ICM, while on the other hand, the huge amount of pairwise interactions is avoided in comparison with LBP by working with neighbourhoods. The idea is related to the previously developed iterated conditional expectations algorithm. Here we revisit it and redefine it in a message passing framework in a more general form. The results on three different benchmarks demonstrate that the proposed technique can perform well both for binary and multi-label MRFs without any limitations on the model definition. Furthermore, it manifests improved performance over related techniques either in terms of quality and/or speed.

  3. Data communications for a collective operation in a parallel active messaging interface of a parallel computer

    Faraj, Daniel A

    2013-07-16

    Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and bit masks; receiving in an origin endpoint of the PAMI a collective instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint; constructing a bit mask for the received collective instruction; selecting, from among the associated algorithms and bit masks, a data communications algorithm in dependence upon the constructed bit mask; and executing the collective instruction, transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.

  4. Administering truncated receive functions in a parallel messaging interface

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-12-09

    Administering truncated receive functions in a parallel messaging interface (`PMI`) of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded.
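
    A rough way to picture the truncated receive described above: the messaging layer takes in the full payload, hands the application only the slice it asked for, and drops the rest. The sketch below assumes the requested portion is a single contiguous slice given by an offset and a length; the abstract does not say how the portion is specified, and `truncated_receive` is a hypothetical name rather than an actual PMI call.

```python
from typing import Tuple


def truncated_receive(payload: bytes, keep_offset: int, keep_length: int) -> Tuple[bytes, int]:
    """Accept the whole payload but deliver only one slice of it.

    Returns (delivered bytes, number of bytes discarded).
    """
    if keep_offset < 0 or keep_length < 0:
        raise ValueError("offset and length must be non-negative")
    if keep_offset + keep_length > len(payload):
        raise ValueError("requested portion exceeds the data that was sent")
    # All of the data has arrived at this point; only the requested slice
    # is passed on to the application.
    delivered = payload[keep_offset:keep_offset + keep_length]
    discarded = len(payload) - len(delivered)
    return delivered, discarded


if __name__ == "__main__":
    print(truncated_receive(b"headerBODYtrailer", 6, 4))
```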

  5. Data communications in a parallel active messaging interface of a parallel computer

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  6. Data communications in a parallel active messaging interface of a parallel computer

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-29

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.

  7. Space Reclamation for Uncoordinated Checkpointing in Message-Passing Systems. Ph.D. Thesis

    Wang, Yi-Min

    1993-01-01

    Checkpointing and rollback recovery are techniques that can provide efficient recovery from transient process failures. In a message-passing system, the rollback of a message sender may cause the rollback of the corresponding receiver, and the system needs to roll back to a consistent set of checkpoints called recovery line. If the processes are allowed to take uncoordinated checkpoints, the above rollback propagation may result in the domino effect which prevents recovery line progression. Traditionally, only obsolete checkpoints before the global recovery line can be discarded, and the necessary and sufficient condition for identifying all garbage checkpoints has remained an open problem. A necessary and sufficient condition for achieving optimal garbage collection is derived and it is proved that the number of useful checkpoints is bounded by N(N+1)/2, where N is the number of processes. The approach is based on the maximum-sized antichain model of consistent global checkpoints and the technique of recovery line transformation and decomposition. It is also shown that, for systems requiring message logging to record in-transit messages, the same approach can be used to achieve optimal message log reclamation. As a final topic, a unifying framework is described by considering checkpoint coordination and exploiting piecewise determinism as mechanisms for bounding rollback propagation, and the applicability of the optimal garbage collection algorithm to domino-free recovery protocols is demonstrated.
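
    To give a feel for the bound quoted above, here is the arithmetic for two system sizes. The values of N are picked only for illustration and do not come from the thesis.

```latex
\[
  \text{useful checkpoints} \;\le\; \frac{N(N+1)}{2},
  \qquad
  N = 4 \;\Rightarrow\; \frac{4 \cdot 5}{2} = 10,
  \qquad
  N = 100 \;\Rightarrow\; \frac{100 \cdot 101}{2} = 5050 .
\]
```

    The bound grows quadratically in the number of processes and does not depend on how long the system has been running.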

  8. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
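
    The wait-and-wake pattern described above can be sketched with a condition variable: an advance call that finds no pending event puts its thread to sleep instead of polling, and is woken when an event is posted for its context. The class `Context` and its methods `post` and `advance` are hypothetical names used for this sketch; they are not the PAMI API.

```python
import threading
from collections import deque


class Context:
    """Hypothetical per-context event queue (illustrative, not the PAMI API)."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()

    def post(self, event):
        """Record a new event and wake one thread waiting in advance()."""
        with self._cond:
            self._events.append(event)
            self._cond.notify()

    def advance(self, handler, timeout=None):
        """Process one event, sleeping while none is pending.

        Returns True if an event was processed, False if the wait timed out.
        """
        with self._cond:
            # wait_for releases the lock while blocked and rechecks the
            # predicate each time the thread is awakened.
            if not self._cond.wait_for(lambda: len(self._events) > 0, timeout):
                return False
            event = self._events.popleft()
        handler(event)  # run the handler outside the lock
        return True


if __name__ == "__main__":
    ctx = Context()
    seen = []
    worker = threading.Thread(target=ctx.advance, args=(seen.append,))
    worker.start()              # blocks: nothing is pending yet
    ctx.post("message-arrived")
    worker.join()
    print(seen)
```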

  9. A Message-Passing Hardware/Software Cosimulation Environment for Reconfigurable Computing Systems

    Manuel Saldaña

    2009-01-01

    High-performance reconfigurable computers (HPRCs) provide a mix of standard processors and FPGAs to collectively accelerate applications. This introduces new design challenges, such as the need for portable programming models across HPRCs and system-level verification tools. To address the need for cosimulating a complete heterogeneous application using both software and hardware in an HPRC, we have created a tool called the Message-passing Simulation Framework (MSF). We have used it to simulate and develop an interface enabling an MPI-based approach to exchange data between X86 processors and hardware engines inside FPGAs. The MSF can also be used as an application development tool that enables multiple FPGAs in simulation to exchange messages amongst themselves and with X86 processors. As an example, we simulate a LINPACK benchmark hardware core using an Intel-FSB-Xilinx-FPGA platform to quickly prototype the hardware, to test the communications, and to verify the benchmark results.

  10. Fencing data transfers in a parallel active messaging interface of a parallel computer

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-06-02

    Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.

  11. Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

    Blocksome, Michael A.; Mamidala, Amith R.

    2013-09-03

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  12. A message-passing approach to random constraint satisfaction problems with growing domains

    Zhao, Chunyan; Zheng, Zhiming; Zhou, Haijun; Xu, Ke

    2011-01-01

    Message-passing algorithms based on belief propagation (BP) are implemented on a random constraint satisfaction problem (CSP) referred to as model RB, which is a prototype of hard random CSPs with growing domain size. In model RB, the number of candidate discrete values (the domain size) of each variable increases polynomially with the variable number N of the problem formula. Although the satisfiability threshold of model RB is exactly known, finding solutions for a single problem formula is quite challenging and attempts have been limited to cases of $N \sim 10^{2}$. In this paper, we propose two different kinds of message-passing algorithms guided by BP for this problem. Numerical simulations demonstrate that these algorithms allow us to find a solution for random formulas of model RB with constraint tightness slightly less than $p_{\mathrm{cr}}$, the threshold value for the satisfiability phase transition. To evaluate the performance of these algorithms, we also provide a local search algorithm (random walk) as a comparison. Besides this, the simulated time dependence of the problem size N and the entropy of the variables for growing domain size are discussed.

  13. Partition function expansion on region graphs and message-passing equations

    Zhou, Haijun; Wang, Chuang; Xiao, Jing-Qing; Bi, Zedong

    2011-01-01

    Disordered and frustrated graphical systems are ubiquitous in physics, biology, and information science. For models on complete graphs or random graphs, deep understanding has been achieved through the mean-field replica and cavity methods. But finite-dimensional 'real' systems remain very challenging because of the abundance of short loops and strong local correlations. A statistical mechanics theory is constructed in this paper for finite-dimensional models based on the mathematical framework of the partition function expansion and the concept of region graphs. Rigorous expressions for the free energy and grand free energy are derived. Message-passing equations on the region graph, such as belief propagation and survey propagation, are also derived rigorously. (letter)

  14. Distributed primal–dual interior-point methods for solving tree-structured coupled convex problems using message-passing

    Khoshfetrat Pakazad, Sina; Hansson, Anders; Andersen, Martin S.

    2017-01-01

    In this paper, we propose a distributed algorithm for solving coupled problems with chordal sparsity or an inherent tree structure which relies on primal–dual interior-point methods. We achieve this by distributing the computations at each iteration, using message-passing. In comparison to existi...

  15. Design of a Message Passing Model for Use in a Heterogeneous CPU-NFP Framework for Network Analytics

    Pennefather, S

    2017-09-01

    ... of applications written in the Go programming language to be executed on a Network Flow Processor (NFP) for enhanced performance. This paper explores the need and feasibility of implementing a message passing model for data transmission between the NFP and CPU...

  16. Approximate message passing for nonconvex sparse regularization with stability and asymptotic analysis

    Sakata, Ayaka; Xu, Yingying

    2018-03-01

    We analyse a linear regression problem with nonconvex regularization called smoothly clipped absolute deviation (SCAD) under an overcomplete Gaussian basis for Gaussian random data. We propose an approximate message passing (AMP) algorithm considering nonconvex regularization, namely SCAD-AMP, and analytically show that the stability condition corresponds to the de Almeida-Thouless condition in spin glass literature. Through asymptotic analysis, we show the correspondence between the density evolution of SCAD-AMP and the replica symmetric (RS) solution. Numerical experiments confirm that for a sufficiently large system size, SCAD-AMP achieves the optimal performance predicted by the replica method. Through replica analysis, a phase transition between replica symmetric and replica symmetry breaking (RSB) region is found in the parameter space of SCAD. The appearance of the RS region for a nonconvex penalty is a significant advantage that indicates the region of smooth landscape of the optimization problem. Furthermore, we analytically show that the statistical representation performance of the SCAD penalty is better than that of \

  17. Using Partial Reconfiguration and Message Passing to Enable FPGA-Based Generic Computing Platforms

    Manuel Saldaña

    2012-01-01

    Partial reconfiguration (PR) is an FPGA feature that allows the modification of certain parts of an FPGA while the rest of the system continues to operate without disruption. This distinctive characteristic of FPGAs has many potential benefits but also challenges. The lack of good CAD tools and the deep hardware knowledge requirement result in a hard-to-use feature. In this paper, the new partition-based Xilinx PR flow is used to incorporate PR within our MPI-based message-passing framework to allow hardware designers to create template bitstreams, which are predesigned, prerouted, generic bitstreams that can be reused for multiple applications. As an example of the generality of this approach, four different applications that use the same template bitstream are run consecutively, with a PR operation performed at the beginning of each application to instantiate the desired application engine. We demonstrate a simplified, reusable, high-level, and portable PR interface for X86-FPGA hybrid machines. PR issues such as local resets of reconfigurable modules and context saving and restoring are addressed in this paper, followed by some examples and preliminary PR overhead measurements.

  18. Detecting and Preventing Sybil Attacks in Wireless Sensor Networks Using Message Authentication and Passing Method.

    Dhamodharan, Udaya Suriya Raj Kumar; Vayanaperumal, Rajamani

    2015-01-01

    Wireless sensor networks are highly indispensable for securing network protection. Highly critical attacks of various kinds have been documented in wireless sensor network till now by many researchers. The Sybil attack is a massive destructive attack against the sensor network where numerous genuine identities with forged identities are used for getting an illegal entry into a network. Discerning the Sybil attack, sinkhole, and wormhole attack while multicasting is a tremendous job in wireless sensor network. Basically a Sybil attack means a node which pretends its identity to other nodes. Communication to an illegal node results in data loss and becomes dangerous in the network. The existing method Random Password Comparison has only a scheme which just verifies the node identities by analyzing the neighbors. A survey was done on a Sybil attack with the objective of resolving this problem. The survey has proposed a combined CAM-PVM (compare and match-position verification method) with MAP (message authentication and passing) for detecting, eliminating, and eventually preventing the entry of Sybil nodes in the network. We propose a scheme of assuring security for wireless sensor network, to deal with attacks of these kinds in unicasting and multicasting.

  19. Detecting and Preventing Sybil Attacks in Wireless Sensor Networks Using Message Authentication and Passing Method

    Udaya Suriya Raj Kumar Dhamodharan

    2015-01-01

    Wireless sensor networks are highly indispensable for securing network protection. Highly critical attacks of various kinds have been documented in wireless sensor network till now by many researchers. The Sybil attack is a massive destructive attack against the sensor network where numerous genuine identities with forged identities are used for getting an illegal entry into a network. Discerning the Sybil attack, sinkhole, and wormhole attack while multicasting is a tremendous job in wireless sensor network. Basically a Sybil attack means a node which pretends its identity to other nodes. Communication to an illegal node results in data loss and becomes dangerous in the network. The existing method Random Password Comparison has only a scheme which just verifies the node identities by analyzing the neighbors. A survey was done on a Sybil attack with the objective of resolving this problem. The survey has proposed a combined CAM-PVM (compare and match-position verification method) with MAP (message authentication and passing) for detecting, eliminating, and eventually preventing the entry of Sybil nodes in the network. We propose a scheme of assuring security for wireless sensor network, to deal with attacks of these kinds in unicasting and multicasting.

  20. A Two-Pass Exact Algorithm for Selection on Parallel Disk Systems.

    Mi, Tian; Rajasekaran, Sanguthevar

    2013-07-01

    Numerous OLAP queries process selection operations of "top N", median, "top 5%", in data warehousing applications. Selection is a well-studied problem that has numerous applications in the management of data and databases since, typically, any complex data query can be reduced to a series of basic operations such as sorting and selection. The parallel selection has also become an important fundamental operation, especially after parallel databases were introduced. In this paper, we present a deterministic algorithm Recursive Sampling Selection (RSS) to solve the exact out-of-core selection problem, which we show needs no more than (2 + ε) passes (ε being a very small fraction). We have compared our RSS algorithm with two other algorithms in the literature, namely, the Deterministic Sampling Selection (DSS) and QuickSelect on the Parallel Disks Systems. Our analysis shows that DSS is a (2 + ε)-pass algorithm when the total number of input elements N is a polynomial in the memory size M (i.e., N = M^c for some constant c). By contrast, our proposed algorithm RSS runs in (2 + ε) passes without any assumptions. Experimental results indicate that both RSS and DSS outperform QuickSelect on the Parallel Disks Systems. In particular, the proposed algorithm RSS is more scalable and robust to handle big data when the input size is far greater than the core memory size, including the case of N ≫ M^c.

  1. Coding for Parallel Links to Maximize the Expected Value of Decodable Messages

    Klimesh, Matthew A.; Chang, Christopher S.

    2011-01-01

    When multiple parallel communication links are available, it is useful to consider link-utilization strategies that provide tradeoffs between reliability and throughput. Interesting cases arise when there are three or more available links. Under the model considered, the links have known probabilities of being in working order, and each link has a known capacity. The sender has a number of messages to send to the receiver. Each message has a size and a value (i.e., a worth or priority). Messages may be divided into pieces arbitrarily, and the value of each piece is proportional to its size. The goal is to choose combinations of messages to send on the links so that the expected value of the messages decodable by the receiver is maximized. There are three parts to the innovation: (1) Applying coding to parallel links under the model; (2) Linear programming formulation for finding the optimal combinations of messages to send on the links; and (3) Algorithms for assisting in finding feasible combinations of messages, as support for the linear programming formulation. There are similarities between this innovation and methods developed in the field of network coding. However, network coding has generally been concerned with either maximizing throughput in a fixed network, or robust communication of a fixed volume of data. In contrast, under this model, the throughput is expected to vary depending on the state of the network. Examples of error-correcting codes that are useful under this model but which are not needed under previous models have been found. This model can represent either a one-shot communication attempt, or a stream of communications. Under the one-shot model, message sizes and link capacities are quantities of information (e.g., measured in bits), while under the communications stream model, message sizes and link capacities are information rates (e.g., measured in bits/second). This work has the potential to increase the value of data returned from
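
    The trade-off described in this record can be illustrated with a small brute-force calculation. The sketch below is not the linear programming formulation from the record; it only enumerates every up/down state of three links and sums the value of the messages the receiver could decode. The link probability (0.9), the unit message values, and the XOR parity code are assumptions chosen for illustration, and the function names are hypothetical.

```python
from itertools import product


def span(vectors):
    """Return all XOR combinations of the given GF(2) vectors (bitmasks)."""
    reachable = {0}
    for v in vectors:
        reachable |= {r ^ v for r in reachable}
    return reachable


def expected_value(link_payloads, link_probs, values):
    """Expected total value of decodable messages.

    link_payloads[k] is a bitmask: bit i set means message i is XORed
    into what link k carries. Message i is decodable when the unit
    vector (1 << i) lies in the span of the payloads that arrived.
    """
    total = 0.0
    for state in product([0, 1], repeat=len(link_payloads)):
        prob = 1.0
        for up, p in zip(state, link_probs):
            prob *= p if up else 1.0 - p
        received = [v for v, up in zip(link_payloads, state) if up]
        decodable = span(received)
        total += prob * sum(
            val for i, val in enumerate(values) if (1 << i) in decodable
        )
    return total


# Assumed scenario: three unit-capacity links, two unit-size messages A and B.
probs = [0.9, 0.9, 0.9]
values = [1.0, 1.0]

# Strategy 1 (no coding): A, B, and a repeated copy of A.
ev_repeat = expected_value([0b01, 0b10, 0b01], probs, values)

# Strategy 2 (coding): A, B, and the parity A XOR B.
ev_coded = expected_value([0b01, 0b10, 0b11], probs, values)
```

    Under these assumed numbers the parity strategy gives the higher expected value, because either message can be rebuilt from the other two links when its own link fails. This is consistent with the record's remark that interesting cases arise with three or more links.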

  2. What it Takes to Get Passed On: Message Content, Style, and Structure as Predictors of Retransmission in the Boston Marathon Bombing Response.

    Jeannette Sutton

    Message retransmission is a central aspect of information diffusion. In a disaster context, the passing on of official warning messages by members of the public also serves as a behavioral indicator of message salience, suggesting that particular messages are (or are not) perceived by the public to be both noteworthy and valuable enough to share with others. This study provides the first examination of terse message retransmission of official warning messages in response to a domestic terrorist attack, the Boston Marathon Bombing in 2013. Using messages posted from public officials' Twitter accounts that were active during the period of the Boston Marathon bombing and manhunt, we examine the features of messages that are associated with their retransmission. We focus on message content, style, and structure, as well as the networked relationships of message senders to answer the question: what characteristics of a terse message sent under conditions of imminent threat predict its retransmission among members of the public? We employ a negative binomial model to examine how message characteristics affect message retransmission. We find that, rather than any single effect dominating the process, retransmission of official Tweets during the Boston bombing response was jointly influenced by various message content, style, and sender characteristics. These findings suggest the need for more work that investigates the impact of multiple factors on the allocation of attention and on message retransmission during hazard events.

  3. Passing in Command Line Arguments and Parallel Cluster/Multicore Batching in R with batch.

    Hoffmann, Thomas J

    2011-03-01

    It is often useful to rerun a command line R script with some slight change in the parameters used to run it - a new set of parameters for a simulation, a different dataset to process, etc. The R package batch provides a means to pass in multiple command line options, including vectors of values in the usual R format, easily into R. The same script can be set up to run things in parallel via different command line arguments. The R package batch also provides a means to simplify this parallel batching by allowing one to use R and an R-like syntax for arguments to spread a script across a cluster or local multicore/multiprocessor computer, with automated syntax for several popular cluster types. Finally, it provides a means to aggregate the results of multiple processes run on a cluster.

  4. Microstructure and mechanical properties of AZ91 tubes fabricated by Multi-pass Parallel Tubular Channel Angular Pressing

    Hooman Abdolvand; Ghader Faraji; Javad Shahbazi Karami

    2017-01-01

    The Parallel Tubular Channel Angular Pressing (PTCAP) process is a novel, recently developed severe plastic deformation (SPD) method for producing ultrafine grained (UFG) and nanograined (NG) tubular specimens with excellent mechanical and physical properties. This process has several advantages compared to its TCAP counterparts. In this paper, a fine grained AZ91 tube was fabricated via the multi-pass parallel tubular channel angular pressing (PTCAP) process. Tubes were processed up to three passes...

  5. Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-07-07

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  6. Processing communications events in parallel active messaging interface by awakening thread from wait state

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-22

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.

  7. Scalable High Performance Message Passing over InfiniBand for Open MPI

    Friedley, A; Hoefler, T; Leininger, M L; Lumsdaine, A

    2007-10-24

    InfiniBand (IB) is a popular network technology for modern high-performance computing systems. MPI implementations traditionally support IB using a reliable, connection-oriented (RC) transport. However, per-process resource usage that grows linearly with the number of processes makes this approach prohibitive for large-scale systems. IB provides an alternative in the form of a connectionless unreliable datagram transport (UD), which allows for near-constant resource usage and initialization overhead as the process count increases. This paper describes a UD-based implementation for IB in Open MPI as a scalable alternative to existing RC-based schemes. We use the software reliability capabilities of Open MPI to provide the guaranteed delivery semantics required by MPI. Results show that UD not only requires fewer resources at scale, but also allows for shorter MPI startup times. A connectionless model also improves performance for applications that tend to send small messages to many different processes.

  8. A new parallel algorithm and its simulation on hypercube simulator for low pass digital image filtering using systolic array

    Al-Hallaq, A.; Amin, S.

    1998-01-01

    This paper introduces a new parallel algorithm and its simulation on a hypercube simulator for the low pass digital image filtering using a systolic array. This new algorithm is faster than the old one (Amin, 1988). This is due to the fact that the old algorithm carries out the addition operations in a sequential mode. In our new design, however, these addition operations are divided into two groups, which can be performed in parallel. One group will be performed on one half of the systolic array and the other on the second half, that is, by folding. This parallelism reduces the time required for the whole process by almost a quarter of the time of the old algorithm.(authors). 18 refs., 3 figs

  9. Statistical Physics, Optimization, Inference, and Message-Passing Algorithms : Lecture Notes of the Les Houches School of Physics : Special Issue, October 2013

    Ricci-Tersenghi, Federico; Zdeborova, Lenka; Zecchina, Riccardo; Tramel, Eric W; Cugliandolo, Leticia F

    2015-01-01

    This book contains a collection of the presentations that were given in October 2013 at the Les Houches Autumn School on statistical physics, optimization, inference, and message-passing algorithms. In the last decade, there has been increasing convergence of interest and methods between theoretical physics and fields as diverse as probability, machine learning, optimization, and inference problems. In particular, much theoretical and applied work in statistical physics and computer science has relied on the use of message-passing algorithms and their connection to the statistical physics of glasses and spin glasses. For example, both the replica and cavity methods have led to recent advances in compressed sensing, sparse estimation, and random constraint satisfaction, to name a few. This book’s detailed pedagogical lectures on statistical inference, computational complexity, the replica and cavity methods, and belief propagation are aimed particularly at PhD students, post-docs, and young researchers desir...

  10. Can rare SAT formulae be easily recognized? On the efficiency of message-passing algorithms for K-SAT at large clause-to-variable ratios

    Altarelli, Fabrizio; Monasson, Remi; Zamponi, Francesco

    2007-01-01

    For large clause-to-variable ratios, typical K-SAT instances drawn from the uniform distribution have no solution. We argue, based on statistical mechanics calculations using the replica and cavity methods, that rare satisfiable instances from the uniform distribution are very similar to typical instances drawn from the so-called planted distribution, where instances are chosen uniformly between the ones that admit a given solution. It then follows, from a recent article by Feige, Mossel and Vilenchik (2006 Complete convergence of message passing algorithms for some satisfiability problems Proc. Random 2006 pp 339-50), that these rare instances can be easily recognized (in O(log N) time and with probability close to 1) by a simple message-passing algorithm

  11. Portable and Transparent Message Compression in MPI Libraries to Improve the Performance and Scalability of Parallel Applications

    Albonesi, David; Burtscher, Martin

    2009-04-17

    The goal of this project has been to develop a lossless compression algorithm for message-passing libraries that can accelerate HPC systems by reducing the communication time. Because both compression and decompression have to be performed in software in real time, the algorithm has to be extremely fast while still delivering a good compression ratio. During the first half of this project, they designed a new compression algorithm called FPC for scientific double-precision data, made the source code available on the web, and published two papers describing its operation, the first in the proceedings of the Data Compression Conference and the second in the IEEE Transactions on Computers. At comparable average compression ratios, this algorithm compresses and decompresses 10 to 100 times faster than BZIP2, DFCM, FSD, GZIP, and PLMI on the three architectures tested. With prediction tables that fit into the CPU's L1 data cache, FPC delivers a guaranteed throughput of six gigabits per second on a 1.6 GHz Itanium 2 system. The C source code and documentation of FPC are posted on-line and have already been downloaded hundreds of times. To evaluate FPC, they gathered 13 real-world scientific datasets from around the globe, including satellite data, crash-simulation data, and messages from HPC systems. Based on the large number of requests they received, they also made these datasets available to the community (with permission of the original sources). While FPC represents a great step forward, it soon became clear that its throughput was too slow for the emerging 10 gigabits per second networks. Hence, no speedup can be gained by including this algorithm in an MPI library. They therefore changed the aim of the second half of the project. Instead of implementing FPC in an MPI library, they refocused their efforts to develop a parallel compression algorithm to further boost the throughput. After all, all modern high-end microprocessors contain multiple CPUs on a

  12. The effects of fear appeal message repetition on perceived threat, perceived efficacy, and behavioral intention in the extended parallel process model.

    Shi, Jingyuan Jolie; Smith, Sandi W

    2016-01-01

    This study examined the effect of moderately repeated exposure (three times) to a fear appeal message on the Extended Parallel Processing Model (EPPM) variables of threat, efficacy, and behavioral intentions for the recommended behaviors in the message, as well as the proportions of systematic and message-related thoughts generated after each message exposure. The results showed that after repeated exposure to a fear appeal message about preventing melanoma, perceived threat in terms of susceptibility and perceived efficacy in terms of response efficacy significantly increased. The behavioral intentions of all recommended behaviors did not change after repeated exposure to the message. However, after the second exposure the proportions of both systematic and all message-related thoughts (relative to total thoughts) significantly decreased while the proportion of heuristic thoughts significantly increased, and this pattern held after the third exposure. The findings demonstrated that the predictions in the EPPM are likely to be operative after three exposures to a persuasive message.

  13. Parallel Monte Carlo simulation of aerosol dynamics

    Zhou, K.; He, Z.; Xiao, M.; Zhang, Z.

    2014-01-01

    is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI

  14. Wideband and flat-gain amplifier based on high concentration erbium-doped fibres in parallel double-pass configuration

    Hamida, B A; Cheng, X S; Harun, S W; Naji, A W; Arof, H; Al-Khateeb, W; Khan, S; Ahmad, H

    2012-01-01

    A wideband and flat gain erbium-doped fibre amplifier (EDFA) is demonstrated using a hybrid gain medium of a zirconia-based erbium-doped fibre (Zr-EDF) and a high concentration erbium-doped fibre (EDF). The amplifier has two stages comprising a 2-m-long Zr-EDF and 9-m-long EDF optimised for C- and L-band operations, respectively, in a double-pass parallel configuration. A chirp fibre Bragg grating (CFBG) is used in both stages to ensure double propagation of the signal and thus to increase the attainable gain in both C- and L-band regions. At an input signal power of 0 dBm, a flat gain of 15 dB is achieved with a gain variation of less than 0.5 dB within a wide wavelength range from 1530 to 1605 nm. The corresponding noise figure varies from 6.2 to 10.8 dB within this wavelength region.

  15. Getting the Message? Native Reactive Electrophiles Pass Two Out of Three Thresholds to be Bona Fide Signaling Mediators.

    Poganik, Jesse R; Long, Marcus J C; Aye, Yimon

    2018-05-01

    Precision cell signaling activities of reactive electrophilic species (RES) are arguably among the most poorly-understood means to transmit biological messages. Latest research implicates native RES to be a chemically-distinct subset of endogenous redox signals that influence cell decision making through non-enzyme-assisted modifications of specific proteins. Yet, fundamental questions remain regarding the role of RES as bona fide second messengers. Here, we lay out three sets of criteria we feel need to be met for RES to be considered as true cellular signals that directly mediate information transfer by modifying "first-responding" sensor proteins. We critically assess the available evidence and define the extent to which each criterion has been fulfilled. Finally, we offer some ideas on the future trajectories of the electrophile signaling field taking inspiration from work that has been done to understand canonical signaling mediators. Also see the video abstract here: https://youtu.be/rG7o0clVP0c. © 2018 WILEY Periodicals, Inc.

  16. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  17. Application Portable Parallel Library

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also includes heterogeneous collection of networked computers.) Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  18. Behaviour of Parallel Coupled Microstrip Band Pass Filter and Simple Microstripline due to Thin-Film Al2O3 Overlay

    S. B. Rane

    1996-01-01

    The X-band behaviour of a seven-section parallel-coupled microstrip band pass filter and microstripline due to thin-film Al2O3 overlay of different thickness is reported in this paper. This Al2O3 film can give a homogeneous overlay structure. There is a substantial increase in the bandwidth due to the overlay, the pass band extending towards higher frequency side. In most cases, the pass band transmittance of a microstripline also increases due to a thin-film Al2O3 overlay, especially for frequencies less than 9.0 GHz. At higher frequencies, random variations are observed. It is felt that thin-film overlays can be used to modify the microstripline circuit properties, thereby avoiding costly and time consuming elaborate design procedures.

  19. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    NONE

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  20. Ultrascalable petaflop parallel supercomputer

    Blumrich, Matthias A [Ridgefield, CT]; Chen, Dong [Croton On Hudson, NY]; Chiu, George [Cross River, NY]; Cipolla, Thomas M [Katonah, NY]; Coteus, Paul W [Yorktown Heights, NY]; Gara, Alan G [Mount Kisco, NY]; Giampapa, Mark E [Irvington, NY]; Hall, Shawn [Pleasantville, NY]; Haring, Rudolf A [Cortlandt Manor, NY]; Heidelberger, Philip [Cortlandt Manor, NY]; Kopcsay, Gerard V [Yorktown Heights, NY]; Ohmacht, Martin [Yorktown Heights, NY]; Salapura, Valentina [Chappaqua, NY]; Sugavanam, Krishnan [Mahopac, NY]; Takken, Todd [Brewster, NY]

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  1. Performance of a Sequential and Parallel Computational Fluid Dynamic (CFD) Solver on a Missile Body Configuration

    Hisley, Dixie

    1999-01-01

    .... The goals of this report are: (1) to investigate the performance of message passing and loop level parallelization techniques, as they were implemented in the computational fluid dynamics (CFD...

  2. Building Blocks for the Rapid Development of Parallel Simulations, Phase I

    National Aeronautics and Space Administration — Scientists need to be able to quickly develop and run parallel simulations without paying the high price of writing low-level message passing codes using compiled...

  3. A portable implementation of ARPACK for distributed memory parallel architectures

    Maschhoff, K.J.; Sorensen, D.C.

    1996-12-31

    ARPACK is a package of Fortran 77 subroutines which implement the Implicitly Restarted Arnoldi Method used for solving large sparse eigenvalue problems. A parallel implementation of ARPACK is presented which is portable across a wide range of distributed memory platforms and requires minimal changes to the serial code. The communication layers used for message passing are the Basic Linear Algebra Communication Subprograms (BLACS) developed for the ScaLAPACK project and Message Passing Interface (MPI).

  4. Myocardial first pass perfusion imaging with gadobutrol: impact of parallel imaging algorithms on image quality and signal behavior.

    Theisen, Daniel; Wintersperger, Bernd J; Huber, Armin; Dietrich, Olaf; Reiser, Maximilian F; Schönberg, Stefan O

    2007-07-01

    To implement parallel imaging algorithms in fast gradient recalled echo sequences for myocardial perfusion imaging and evaluate image quality, signal-to-noise ratio (SNR), contrast-enhancement ratio (CER), and semiquantitative perfusion parameters. In 20 volunteers, myocardial perfusion imaging with gadobutrol was performed at rest using an accelerated TurboFLASH sequence (TR 2.3 milliseconds, TE 0.93 milliseconds, flip angle [FA] 15 degrees) with GRAPPA, R=2. A nonaccelerated TurboFLASH sequence with similar scan parameters served as standard of reference. Artifacts were assessed qualitatively. SNR, CER, and CNR were calculated and semiquantitative perfusion parameters were determined from fitted SI-time curves. Phantom measurements yielded significantly higher SNR for nonaccelerated images (P [...] images (P [...] images for artifacts by 2 board-certified radiologists yielded a significant reduction in dark rim artifacts with GRAPPA, R=2 (P<0.001). The application of GRAPPA with an acceleration factor of R=2 leads to a significant reduction of dark rim artifacts in fast gradient recalled echo sequences.

  5. Optimisation of a parallel ocean general circulation model

    M. I. Beare; D. P. Stevens

    1997-01-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by...

  6. An extension of the extended parallel process model (EPPM) in television health news: the influence of health consciousness on individual message processing and acceptance.

    Hong, Hyehyun

    2011-06-01

    The purpose of this study is to examine the role of health consciousness in processing TV news that contains potential health threats and preventive recommendations. Based on the extended parallel process model (Witte, 1992), relationships among health consciousness, perceived severity, perceived susceptibility, perceived response efficacy, perceived self-efficacy, and message acceptance/rejection were hypothesized. Responses collected from 175 participants after viewing four TV health news stories were analyzed using the bootstrapping analysis (Preacher & Hayes, 2008). Results confirmed three mediators (i.e., perceived severity, response efficacy, self-efficacy) in the influence of health consciousness on message acceptance. A negative association found between health consciousness and perceived susceptibility is discussed in relation to characteristics of health conscious individuals and optimistic bias of health risks.

  7. Effectiveness of mobile phone messaging in prevention of type 2 diabetes by lifestyle modification in men in India: a prospective, parallel-group, randomised controlled trial.

    Ramachandran, Ambady; Snehalatha, Chamukuttan; Ram, Jagannathan; Selvam, Sundaram; Simon, Mary; Nanditha, Arun; Shetty, Ananth Samith; Godsland, Ian F; Chaturvedi, Nish; Majeed, Azeem; Oliver, Nick; Toumazou, Christofer; Alberti, K George; Johnston, Desmond G

    2013-11-01

    Type 2 diabetes can often be prevented by lifestyle modification; however, successful lifestyle intervention programmes are labour intensive. Mobile phone messaging is an inexpensive alternative way to deliver educational and motivational advice about lifestyle modification. We aimed to assess whether mobile phone messaging that encouraged lifestyle change could reduce incident type 2 diabetes in Indian Asian men with impaired glucose tolerance. We did a prospective, parallel-group, randomised controlled trial between Aug 10, 2009, and Nov 30, 2012, at ten sites in southeast India. Working Indian men (aged 35-55 years) with impaired glucose tolerance were randomly assigned (1:1) with a computer-generated randomisation sequence to a mobile phone messaging intervention or standard care (control group). Participants in the intervention group received frequent mobile phone messages compared with controls who received standard lifestyle modification advice at baseline only. Field staff and participants were, by necessity, not masked to study group assignment, but allocation was concealed from laboratory personnel as well as principal and co-investigators. The primary outcome was incidence of type 2 diabetes, analysed by intention to treat. This trial is registered with ClinicalTrials.gov, number NCT00819455. We assessed 8741 participants for eligibility. 537 patients were randomly assigned to either the mobile phone messaging intervention (n=271) or standard care (n=266). The cumulative incidence of type 2 diabetes was lower in those who received mobile phone messages than in controls: 50 (18%) participants in the intervention group developed type 2 diabetes compared with 73 (27%) in the control group (hazard ratio 0·64, 95% CI 0·45-0·92; p=0·015). The number needed to treat to prevent one case of type 2 diabetes was 11 (95% CI 6-55). One patient in the control group died suddenly at the end of the first year. We recorded no other serious adverse events. Mobile
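
    The headline figures in this record can be cross-checked with simple arithmetic. The sketch below recomputes the cumulative incidences and a crude number needed to treat from the counts quoted in the abstract. It assumes the plain risk-difference formula (NNT = 1 / absolute risk reduction); the trial's own analysis was time-to-event based, so the hazard ratio and confidence intervals are not reproduced here. Variable names are illustrative.

```python
# Counts quoted in the abstract above.
cases_intervention, n_intervention = 50, 271
cases_control, n_control = 73, 266

# Cumulative incidence (crude risk) in each arm.
risk_intervention = cases_intervention / n_intervention
risk_control = cases_control / n_control

# Absolute risk reduction and the crude number needed to treat.
absolute_risk_reduction = risk_control - risk_intervention
nnt = 1.0 / absolute_risk_reduction

print(f"intervention risk: {risk_intervention:.1%}")
print(f"control risk:      {risk_control:.1%}")
print(f"crude NNT:         {nnt:.1f}")
```

    The crude risks round to the 18% and 27% quoted in the abstract, and the crude NNT rounds to the reported value of 11.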

  8. Parallel object-oriented specification language

    Florescu, O.; Voeten, J.P.M.; Theelen, B.D.; Geilen, M.C.W.; Corporaal, H.; Burns, Alan

    2008-01-01

    The Parallel Object-Oriented Specification Language (POOSL) is an expressive modelling language for hardware/software systems [10]. It was originally defined in [7] as an object-oriented extension of process algebra CCS [6], supporting (conditional) synchronous message passing between

  9. A Model for Speedup of Parallel Programs

    1997-01-01

    Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A comparative analysis of static

  10. Parallel integer sorting with medium and fine-scale parallelism

    Dagum, Leonardo

    1993-01-01

    Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.

  11. Parallel Framework for Cooperative Processes

    Mitică Craus

    2005-01-01

    This paper describes an object-oriented framework designed for parallelizing a set of related algorithms. The idea behind the system is to have a reusable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles, and it must be possible to split the work between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework perform well: approximately linear speedup and low communication cost.
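
The cycle-based master-slave organization described in this record can be illustrated with a minimal sketch. It is not the authors' framework: threads and in-memory queues stand in for processes and a real message-passing layer, and the names `worker`, `run_master`, the sum-of-squares workload, and the `None` shutdown message are all assumptions made for illustration.

```python
import queue
import threading

def worker(inbox, outbox):
    """Receive (task_id, chunk) messages until a None sentinel arrives."""
    while True:
        msg = inbox.get()
        if msg is None:              # shutdown message from the master
            break
        task_id, chunk = msg
        # Hypothetical workload: each worker reduces its share of the data.
        outbox.put((task_id, sum(x * x for x in chunk)))

def run_master(data, n_workers=3, n_cycles=2):
    """Each cycle: split the data, send one chunk per worker, gather replies."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q, outbox)) for q in inboxes]
    for t in threads:
        t.start()
    totals = []
    for _ in range(n_cycles):
        chunks = [data[i::n_workers] for i in range(n_workers)]
        for task_id, (q, chunk) in enumerate(zip(inboxes, chunks)):
            q.put((task_id, chunk))                 # "send" to one worker
        partial = dict(outbox.get() for _ in range(n_workers))  # "receive" all
        totals.append(sum(partial.values()))
    for q in inboxes:
        q.put(None)                                 # tell workers to stop
    for t in threads:
        t.join()
    return totals
```

Calling `run_master(list(range(10)))` runs two cycles over the same data and returns one combined total per cycle. Replies are keyed by `task_id`, so the result does not depend on the order in which workers answer.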

  12. Parallel-Processing Test Bed For Simulation Software

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-the-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  13. Abstract Level Parallelization of Finite Difference Methods

    Edwin Vollebregt

    1997-01-01

    A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”). The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automatic code generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing). This is important because message passing, unfortunately, still is the only approach that leads to acceptable performance for many more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.
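
To make the last point of this record concrete, the following sketch derives the data a process must receive from its neighbours, given only index sets and a stencil. It is a one-dimensional simplification and not the paper's notation: `halo_indices`, `receive_plan`, the integer index sets, and the stencil-as-offsets representation are assumptions made for illustration.

```python
def halo_indices(owned, stencil):
    """Indices read through the stencil that lie outside the owned index set."""
    owned = set(owned)
    needed = {i + off for i in owned for off in stencil}
    return sorted(needed - owned)

def receive_plan(index_sets, rank, stencil):
    """Map each neighbouring rank to the indices `rank` must receive from it."""
    plan = {}
    for idx in halo_indices(index_sets[rank], stencil):
        for other, owned in index_sets.items():
            if other != rank and idx in owned:
                plan.setdefault(other, []).append(idx)
    # Indices owned by nobody (outside the global domain) are simply skipped.
    return plan

# Three processes, each owning four consecutive grid points.
index_sets = {0: range(0, 4), 1: range(4, 8), 2: range(8, 12)}
three_point = (-1, 0, 1)   # stencil offsets of a 1-D three-point scheme
```

With these definitions, `receive_plan(index_sets, 1, three_point)` reports that process 1 needs index 3 from process 0 and index 8 from process 2. The plan follows mechanically from the index sets and the stencil, which is the simplification the abstract refers to.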

  14. Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

    Hasanov, Khalid; Quintin, Jean-Noël; Lastovetsky, Alexey

    2014-01-01

    -scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel

  15. PUMA: An Operating System for Massively Parallel Systems

    Stephen R. Wheat

    1994-01-01

    This article presents an overview of PUMA (Performance-oriented, User-managed Messaging Architecture), a message-passing kernel for massively parallel systems. Message passing in PUMA is based on portals: a portal is an opening in the address space of an application process. Once an application process has established a portal, other processes can write values into the portal using a simple send operation. Because messages are written directly into the address space of the receiving process, there is no need to buffer messages in the PUMA kernel and later copy them into the application's address space. PUMA consists of two components: the quintessential kernel (Q-Kernel) and the process control thread (PCT). Although the PCT provides management decisions, the Q-Kernel controls access and implements the policies specified by the PCT.

  16. Parallel sparse direct solvers for Poisson's equation in streamer discharges

    M. Nool (Margreet); M. Genseberger (Menno); U. M. Ebert (Ute)

    2017-01-01

    The aim of this paper is to examine whether a hybrid approach of parallel computing, a combination of the message passing model (MPI) with the threads model (OpenMP), can deliver good performance in streamer discharge simulations. Since one of the bottlenecks of almost all streamer

  17. Performance modeling of parallel algorithms for solving neutron diffusion problems

    Azmy, Y.Y.; Kirk, B.L.

    1995-01-01

    Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message passing and shared memory multiprocessors represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated through several test problems, and these models are used to estimate the performance of each of the two considered architectures in situations typical of practical applications, such as fine meshes and a large number of participating processors. While message passing computers are capable of producing speedup, the parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably for massively parallel computers so that only small- to medium-sized message passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared memory architecture predicts very high efficiency over a wide range of number of processors reasonable for this architecture. Furthermore, the model efficiency of the Sequent remains superior to that of the hypercube if its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared memory computers are better suited for this parallel algorithm than message passing computers

  18. Auctioning Bulk Mobile Messages

    S. Meij (Simon); L-F. Pau (Louis-François); H.W.G.M. van Heck (Eric)

    2003-01-01

    The search for enablers of continued growth of SMS traffic, as well as the take-off of the more diversified MMS message contents, open up for enterprises the potential of bulk use of mobile messaging, instead of essentially one-by-one use. In parallel, such enterprises or value added

  19. Design considerations for parallel graphics libraries

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.

  20. Parallelization of simulation code for liquid-gas model of lattice-gas fluid

    Kawai, Wataru; Ebihara, Kenichi; Kume, Etsuo; Watanabe, Tadashi

    2000-03-01

    A simulation code for hydrodynamical phenomena which is based on the liquid-gas model of lattice-gas fluid is parallelized by using the MPI (Message Passing Interface) library. The parallelized code can be applied to larger simulations than the non-parallelized code. The calculation times of the parallelized code on VPP500 (vector-parallel supercomputer with distributed memory units), AP3000 (scalar-parallel server with distributed memory units), and a workstation cluster decreased in inverse proportion to the number of processors. (author)

  1. Optimisation of a parallel ocean general circulation model

    Beare, M. I.; Stevens, D. P.

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  2. Parallel community climate model: Description and user's guide

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others

    1996-07-15

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.
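
The patch decomposition described in this record can be sketched as follows. This is an illustrative simplification rather than the PCCM2 code: it assumes a regular latitude-longitude grid, exactly one rectangular patch per processor, and a row-major rank numbering; `block_bounds` and `decompose` are hypothetical names.

```python
def block_bounds(n, parts, index):
    """Half-open [start, stop) of block `index` when n points are split
    into `parts` near-equal contiguous blocks."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop

def decompose(nlat, nlon, p_lat, p_lon):
    """Map each rank to its (latitude range, longitude range) patch."""
    patches = {}
    for i in range(p_lat):
        for j in range(p_lon):
            rank = i * p_lon + j          # assumed row-major rank layout
            patches[rank] = (block_bounds(nlat, p_lat, i),
                             block_bounds(nlon, p_lon, j))
    return patches

def patch_size(patch):
    """Number of grid points in one patch."""
    (lat0, lat1), (lon0, lon1) = patch
    return (lat1 - lat0) * (lon1 - lon0)
```

For a 64 x 128 grid on a 4 x 8 processor mesh, `decompose(64, 128, 4, 8)` gives each of the 32 ranks a 16 x 16 patch. Column physics on a patch then touches only points the rank already holds, which is why that part of the model needs no communication.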

  3. Differences Between Distributed and Parallel Systems

    Brightwell, R.; Maccabe, A.B.; Rissen, R.

    1998-10-01

    Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.

  4. Parallel algorithms for numerical linear algebra

    van der Vorst, H

    1990-01-01

    This is the first in a new series of books presenting research results and developments concerning the theory and applications of parallel computers, including vector, pipeline, array, fifth/future generation computers, and neural computers. All aspects of high-speed computing fall within the scope of the series, e.g. algorithm design, applications, software engineering, networking, taxonomy, models and architectural trends, performance, peripheral devices. Papers in Volume One cover the main streams of parallel linear algebra: systolic array algorithms, message-passing systems, algorithms for p

  5. A PARALLEL EXTENSION OF THE UAL ENVIRONMENT

    MALITSKY, N.; SHISHLO, A.

    2001-01-01

    The deployment of the Unified Accelerator Library (UAL) environment on the parallel cluster is presented. The approach is based on the Message-Passing Interface (MPI) library and the Perl adapter that allows one to control and mix together the existing conventional UAL components with the new MPI-based parallel extensions. In the paper, we provide timing results and describe the application of the new environment to the SNS Ring complex beam dynamics studies, particularly, simulations of several physical effects, such as space charge, field errors, fringe fields, and others

  6. A parallel implementation of 3-d CT image reconstruction on a hypercube multiprocessor

    Chen, C.M.; Lee, S.Y.; Cho, Z.H.

    1990-01-01

    In this paper, the authors describe how image reconstruction in computerized tomography (CT) can be parallelized on a message-passing multiprocessor. In particular, the results obtained from parallel implementation of 3-D CT image reconstruction for parallel beam geometries on the Intel hypercube, iPSC/2, are presented. A two-stage pipelining approach is employed for filtering (convolution) and backprojection. The conventional sequential convolution algorithm is modified such that the symmetry of the filter kernel is fully utilized for parallelization. In the backprojection stage, the 3-D incremental algorithm, the authors' recently developed backprojection scheme which is shown to be faster than the conventional algorithm, is parallelized.

  7. Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

    Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

    1990-01-01

    Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.

  8. A Parallel Encryption Algorithm Based on Piecewise Linear Chaotic Map

    Xizhong Wang

    2013-01-01

    We introduce a parallel chaos-based encryption algorithm for taking advantage of multicore processors. The chaotic cryptosystem is generated by the piecewise linear chaotic map (PWLCM). The parallel algorithm is designed with a master/slave communication model with the Message Passing Interface (MPI). The algorithm is suitable not only for multicore processors but also for the single-processor architecture. The experimental results show that the chaos-based cryptosystem possesses good statistical properties. The parallel algorithm provides much better performance than the serial ones and would be useful for encrypting and decrypting large files or multimedia.

  9. CALTRANS: A parallel, deterministic, 3D neutronics code

    Carson, L.; Ferguson, J.; Rogers, J.

    1994-04-01

    Our efforts to parallelize the deterministic solution of the neutron transport equation have culminated in a new neutronics code, CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementations of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.

  10. High-energy physics software parallelization using database techniques

    Argante, E.; Van der Stok, P.D.V.; Willers, I.

    1997-01-01

    A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradigm, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI. (orig.)

  11. High-speed parallel solution of the neutron diffusion equation with the hierarchical domain decomposition boundary element method incorporating parallel communications

    Tsuji, Masashi; Chiba, Gou

    2000-01-01

    A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to accomplish a high parallel efficiency on distributed memory message passing parallel computers. Data exchanges between node processors that are repeated during iteration processes of HDD-BEM are implemented, without any intervention of the host processor that was used to supervise parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, the parallel processing can be executed with only cooperative operations of node processors. The communication overhead was even the dominant time consuming part in the conventional P-HDD-BEM, and the parallelization efficiency decreased steeply with the increase of the number of processors. With the parallel data communication, the efficiency is affected only by the number of boundary elements assigned to decomposed subregions, and the communication overhead can be drastically reduced. This feature can be particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the deterioration problem of parallel efficiency and opens a new path to parallel computations of NDEs on distributed memory message passing parallel computers. (author)

  12. Design strategies for irregularly adapting parallel applications

    Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Sing, Jaswinder Pal

    2000-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability

  13. A Parallel Workload Model and its Implications for Processor Allocation

    1996-11-01

    with SEV or AVG, both of which can tolerate c = 0.4-0.6 before their performance deteriorates significantly. On the other hand, Setia [10] has...Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [11] Sanjeev K. Setia and Satish K. Tripathi. An analysis of several processor

  14. Parallel computing and networking; Heiretsu keisanki to network

    Asakawa, E; Tsuru, T [Japan National Oil Corp., Tokyo (Japan); Matsuoka, T [Japan Petroleum Exploration Co. Ltd., Tokyo (Japan)

    1996-05-01

    This paper describes trends in the parallel computers used in geophysical exploration. Parallel computers first came into use for geophysical exploration around 1993. At that time they were classified mainly as MIMD (multiple instruction stream, multiple data stream), SIMD (single instruction stream, multiple data stream), and the like. Parallel computers were publicized at the 1994 meeting of the Geophysical Exploration Society as a 'high precision imaging technology'. Concerning libraries for parallel computers, there was a shift to PVM (parallel virtual machine) in 1993 and to MPI (message passing interface) in 1995. In addition, a FORTRAN90 compiler was released with support for data parallel and vector computers. In 1993, the networks used were Ethernet, FDDI, CDDI and HIPPI. In 1995, OC-3 products under ATM began to spread. However, ATM remains an interoffice high speed network because ATM service has not yet spread to the public network. 1 ref.

  15. Parallel MCNP Monte Carlo transport calculations with MPI

    Wagner, J.C.; Haghighat, A.

    1996-01-01

    The steady increase in computational performance has made Monte Carlo calculations for large/complex systems possible. However, in order to make these calculations practical, order of magnitude increases in performance are necessary. The Monte Carlo method is inherently parallel (particles are simulated independently) and thus has the potential for near-linear speedup with respect to the number of processors. Further, the ever-increasing accessibility of parallel computers, such as workstation clusters, facilitates the practical use of parallel Monte Carlo. Recognizing the nature of the Monte Carlo method and the trends in available computing, the code developers at Los Alamos National Laboratory implemented the message-passing general-purpose Monte Carlo radiation transport code MCNP (version 4A). The PVM package was chosen by the MCNP code developers because it supports a variety of communication networks, several UNIX platforms, and heterogeneous computer systems. This PVM version of MCNP has been shown to produce speedups that approach the number of processors and thus, is a very useful tool for transport analysis. Due to software incompatibilities on the local IBM SP2, PVM has not been available, and thus it is not possible to take advantage of this useful tool. Hence, it became necessary to implement an alternative message-passing library package into MCNP. Because the message-passing interface (MPI) is supported on the local system, takes advantage of the high-speed communication switches in the SP2, and is considered to be the emerging standard, it was selected
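
The observation that particle histories are independent is what makes the method easy to distribute, and it can be shown with a toy tally. The sketch below is not MCNP and uses neither PVM nor MPI: it estimates the uncollided transmission through a slab, with separately seeded batches standing in for the work given to each processor. `run_batch`, `transmission`, the cross section `sigma`, and the slab `thickness` are assumptions made for illustration, and Python threads are used only to show the structure, not to obtain real speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def run_batch(seed, n_particles, sigma=1.0, thickness=1.0):
    """Count particles whose first free path exceeds the slab thickness."""
    rng = random.Random(seed)        # each batch has its own seeded generator
    return sum(1 for _ in range(n_particles)
               if rng.expovariate(sigma) > thickness)

def transmission(n_batches=4, n_particles=20_000, parallel=True):
    """Combine per-batch tallies into one transmission estimate."""
    seeds = range(n_batches)
    if parallel:
        with ThreadPoolExecutor(max_workers=n_batches) as pool:
            tallies = list(pool.map(lambda s: run_batch(s, n_particles), seeds))
    else:
        tallies = [run_batch(s, n_particles) for s in seeds]
    # The only "communication" is this final reduction of the tallies.
    return sum(tallies) / (n_batches * n_particles)
```

Because every batch owns its generator, the parallel and serial runs return the same estimate, which should lie close to the analytic value `math.exp(-sigma * thickness)`. In a message-passing code the final sum would be a single reduction across processes.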

  16. How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing

    Decyk, V. K.; Dauger, D. E.

    We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.

  17. Passing excellence

    Tsoupikova, Daria

    2007-02-01

    This paper describes the research and development of a virtual reality visualization project "Passing excellence" about the world famous architectural ensemble "Kizhi". The Kizhi Pogost is located on an island in Lake Onega in northern Karelia in Russia. It is an authentic museum of an ancient wood building tradition which presents a unique artistic achievement. This ensemble preserves a concentration of masterpieces of the Russian heritage and is included in the List of Most Endangered Sites of the World Monuments Watch protected by World Heritage List of UNESCO. The project strives to create a unique virtual observation of the dynamics of the architectural changes of the museum area beginning from the 15th Century up to the 21st Century. The visualization is being created to restore the original architecture of Kizhi island based on the detailed photographs, architectural and geometric measurements, textural data, video surveys and resources from the Kizhi State Open-Air Museum archives. The project is being developed using Electro, an application development environment for the tiled display high-resolution graphics visualization system and can be shown on the virtual reality systems such as the GeoWall TM and the C-Wall.

  18. Plasma Physics Calculations on a Parallel Macintosh Cluster

    Decyk, Viktor; Dauger, Dean; Kokelaar, Pieter

    2000-03-01

    We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.

  19. Pattern-Driven Automatic Parallelization

    Christoph W. Kessler

    1996-01-01

    This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.

  20. Study on MPI/OpenMP hybrid parallelism for Monte Carlo neutron transport code

    Liang Jingang; Xu Qi; Wang Kan; Liu Shiwen

    2013-01-01

    Parallel programming with a mixed mode of message-passing and shared-memory has several advantages when used in a Monte Carlo neutron transport code, such as fitting the hardware of distributed-shared clusters, economizing the memory demand of Monte Carlo transport, improving parallel performance, and so on. MPI/OpenMP hybrid parallelism was implemented based on a one-dimensional Monte Carlo neutron transport code. Some critical factors affecting the parallel performance were analyzed and solutions were proposed for several problems such as access contention, lock contention and false sharing. After optimization, the code was tested. It is shown that the hybrid parallel code can reach performance as good as the pure MPI parallel program, while saving a lot of memory at the same time. Therefore hybrid parallelism is efficient for achieving large-scale parallelism of Monte Carlo neutron transport. (authors)

  1. Parallelization of ITOUGH2 using PVM

    Finsterle, Stefan

    1998-01-01

    ITOUGH2 inversions are computationally intensive because the forward problem must be solved many times to evaluate the objective function for different parameter combinations or to numerically calculate sensitivity coefficients. Most of these forward runs are independent from each other and can therefore be performed in parallel. Message passing based on the Parallel Virtual Machine (PVM) system has been implemented into ITOUGH2 to enable parallel processing of ITOUGH2 jobs on a heterogeneous network of Unix workstations. This report describes the PVM system and its implementation into ITOUGH2. Instructions are given for installing PVM, compiling ITOUGH2-PVM for use on a workstation cluster, the preparation of an ITOUGH2 input file under PVM, and the execution of an ITOUGH2-PVM application. Examples are discussed, demonstrating the use of ITOUGH2-PVM.

  2. Pthreads vs MPI Parallel Performance of Angular-Domain Decomposed S

    Azmy, Y.Y.; Barnett, D.A.

    2000-01-01

    Two programming models for parallelizing the Angular Domain Decomposition (ADD) of the discrete ordinates (S_n) approximation of the neutron transport equation are examined. These are the shared memory model based on the POSIX threads (Pthreads) standard, and the message passing model based on the Message Passing Interface (MPI) standard. These standard libraries are available on most multiprocessor platforms, thus making the resulting parallel codes widely portable. The question is: on a fixed platform, and for a particular code solving a given test problem, which of the two programming models delivers better parallel performance? Such a comparison is possible on Symmetric Multi-Processor (SMP) architectures in which several CPUs physically share a common memory, and in addition are capable of emulating message passing functionality. Implementation of the two-dimensional S_n Arbitrarily High Order Transport (AHOT) code for solving neutron transport problems using these two parallelization models is described. Measured parallel performance of each model on the COMPAQ AlphaServer 8400 and the SGI Origin 2000 platforms is described, and comparison of the observed speedup for the two programming models is reported. For the case presented in this paper it appears that the MPI implementation scales better than the Pthreads implementation on both platforms.

  3. Performance Tuning and Evaluation of a Parallel Community Climate Model

    Drake, J.B.; Worley, P.H.; Hammond, S.

    1999-11-13

    The Parallel Community Climate Model (PCCM) is a message-passing parallelization of version 2.1 of the Community Climate Model (CCM) developed by researchers at Argonne and Oak Ridge National Laboratories and at the National Center for Atmospheric Research in the early to mid 1990s. In preparation for use in the Department of Energy's Parallel Climate Model (PCM), PCCM has recently been updated with new physics routines from version 3.2 of the CCM, improvements to the parallel implementation, and ports to the SGI/Cray Research T3E and Origin 2000. We describe our experience in porting and tuning PCCM on these new platforms, evaluating the performance of different parallel algorithm options and comparing performance between the T3E and Origin 2000.

  4. Parallel ray tracing for one-dimensional discrete ordinate computations

    Jarvis, R.D.; Nelson, P.

    1996-01-01

    The ray-tracing sweep in discrete-ordinates, spatially discrete numerical approximation methods applied to the linear, steady-state, plane-parallel, mono-energetic, azimuthally symmetric, neutral-particle transport equation can be reduced to a parallel prefix computation. In so doing, the often severe penalty in convergence rate of the source iteration, suffered by most current parallel algorithms using spatial domain decomposition, can be avoided while attaining parallelism in the spatial domain to whatever extent desired. In addition, the reduction implies parallel algorithm complexity limits for the ray-tracing sweep. The reduction applies to all closed, linear, one-cell functional (CLOF) spatial approximation methods, which encompasses most in current popular use. Scalability test results of an implementation of the algorithm on a 64-node nCube-2S hypercube-connected, message-passing, multi-computer are described. (author)
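
    A minimal sketch of the reduction described above, assuming (the abstract does not state this) that each cell updates the flux along a ray by an affine map, psi_out = a * psi_in + b. Composition of affine maps is associative, which is what allows a prefix scan to replace the cell-by-cell sweep. The function names and coefficients are illustrative only, and the scan rounds are executed serially here just to show the data flow.

```python
from typing import List, Tuple

Affine = Tuple[float, float]  # (a, b) represents the map x -> a * x + b


def compose(first: Affine, second: Affine) -> Affine:
    """Return the single map equivalent to applying `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def sequential_sweep(cells: List[Affine], inflow: float) -> List[float]:
    """Reference cell-by-cell sweep: one outgoing value per cell."""
    out, psi = [], inflow
    for a, b in cells:
        psi = a * psi + b
        out.append(psi)
    return out


def prefix_sweep(cells: List[Affine], inflow: float) -> List[float]:
    """Inclusive scan over affine maps (Hillis-Steele pattern).

    Each of the roughly log2(n) rounds updates every element
    independently, so a round could run in parallel; here it is a loop.
    """
    maps = list(cells)
    step = 1
    while step < len(maps):
        nxt = list(maps)
        for i in range(step, len(maps)):
            # maps[i - step] covers the earlier cells, so it is applied first.
            nxt[i] = compose(maps[i - step], maps[i])
        maps = nxt
        step *= 2
    return [a * inflow + b for a, b in maps]
```

    Both functions return the same outgoing values; the scan trades extra arithmetic for a dependency chain of logarithmic rather than linear length.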

  5. Implementation of a parallel version of a regional climate model

    Gerstengarbe, F.W. [ed.; Kuecken, M. [Potsdam-Institut fuer Klimafolgenforschung (PIK), Potsdam (Germany); Schaettler, U. [Deutscher Wetterdienst, Offenbach am Main (Germany). Geschaeftsbereich Forschung und Entwicklung

    1997-10-01

    A regional climate model developed by the Max Planck Institute for Meteorology and the German Climate Computing Centre in Hamburg, based on the 'Europa' and 'Deutschland' models of the German Weather Service, has been parallelized and implemented on the IBM RS/6000 SP computer system of the Potsdam Institute for Climate Impact Research, including parallel input/output processing, the explicit Eulerian time-step, the semi-implicit corrections, the normal-mode initialization and the physical parameterizations of the German Weather Service. The implementation utilizes Fortran 90 and the Message Passing Interface. The parallelization strategy used is a 2D domain decomposition. This report describes the parallelization strategy, the parallel I/O organization, the influence of different domain decomposition approaches on static and dynamic load imbalances, and first numerical results. (orig.)
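
    To illustrate what a 2D domain decomposition involves, the sketch below splits an nx-by-ny grid over a px-by-py process grid and returns the index ranges owned by one rank. The block distribution, the row-major rank ordering, and all names are assumptions made for illustration; the abstract does not give the report's actual scheme or its load-balancing variants.

```python
from typing import Tuple

Range = Tuple[int, int]


def block_range(n: int, parts: int, idx: int) -> Range:
    """Half-open range [lo, hi) of block `idx` when n points are split
    into `parts` blocks; the first n % parts blocks get one extra point."""
    base, extra = divmod(n, parts)
    lo = idx * base + min(idx, extra)
    hi = lo + base + (1 if idx < extra else 0)
    return lo, hi


def subdomain(rank: int, nx: int, ny: int, px: int, py: int) -> Tuple[Range, Range]:
    """Index ranges owned by `rank` on a px-by-py process grid
    (row-major rank ordering, an assumption of this sketch)."""
    if not 0 <= rank < px * py:
        raise ValueError("rank outside process grid")
    ix, iy = divmod(rank, py)
    return block_range(nx, px, ix), block_range(ny, py, iy)
```

    Spreading the remainder points over the leading blocks keeps subdomain sizes within one row or column of each other, which addresses static imbalance only; dynamic imbalance from the physics would need a different strategy.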

  6. Numerical discrepancy between serial and MPI parallel computations

    Sang Bong Lee

    2016-09-01

    Numerical simulations of the 1D Burgers equation and a 2D sloshing problem were carried out to study the numerical discrepancy between serial and parallel computations. The numerical domain was decomposed into 2 and 4 subdomains for parallel computations with the message passing interface. The numerical solution of the Burgers equation revealed that the fully explicit boundary conditions used on the subdomains of the parallel computation were responsible for the numerical discrepancy in the transient solution between serial and parallel computations. Two-dimensional sloshing problems in a rectangular domain were solved using OpenFOAM. After the initial transient time, the sloshing patterns of water were significantly different in serial and parallel computations although the same numerical conditions were given. Based on histograms of pressure measured at two points near the wall, the statistical characteristics of the numerical solution were not affected by the number of subdomains as much as the transient solution was.

  7. Effect of an interactive text-messaging service on patient retention during the first year of HIV care in Kenya (WelTel Retain): an open-label, randomised parallel-group study.

    van der Kop, Mia Liisa; Muhula, Samuel; Nagide, Patrick I; Thabane, Lehana; Gelmon, Lawrence; Awiti, Patricia Opondo; Abunah, Bonface; Kyomuhangi, Lennie Bazira; Budd, Matthew A; Marra, Carlo; Patel, Anik; Karanja, Sarah; Ojakaa, David I; Mills, Edward J; Ekström, Anna Mia; Lester, Richard Todd

    2018-03-01

    Retention of patients in HIV care is crucial to ensure timely treatment initiation, viral suppression, and to avert AIDS-related deaths. We did a randomised trial to determine whether a text-messaging intervention improved retention during the first year of HIV care. This unmasked, randomised parallel-group study was done at two clinics in informal settlements in Nairobi, Kenya. Eligible participants were aged 18 years or older, HIV-positive, had their own mobile phone or access to one, and were able to use simple text messaging (or have somebody who could text message on their behalf). Participants were randomly assigned (1:1), with random block sizes of 2, 4, and 6, to the intervention or control group. Participants in the intervention group received a weekly text message from the automated WelTel service for 1 year and were asked to respond within 48 h. Participants in the control group did not receive text messages. Participants in both groups received usual care, which comprised psychosocial support and counselling; patient education; CD4 cell count; treatment; screening for tuberculosis, opportunistic infections, and sexually transmitted infections; prevention of mother-to-child transmission and family planning services; and up to two telephone calls for missed appointments. The primary outcome was retention in care at 12 months (ie, clinic attendance 10-14 months after the first visit). Participants who did not attend this 12-month appointment were traced, and we considered as retained those who were confirmed to be active in care elsewhere. The data analyst and clinic staff were masked to the group assignment, whereas participants and research nurses were not. We analysed the intention-to-treat population. This trial is registered with ClinicalTrials.gov, number NCT01630304. Between April 4, 2013, and June 4, 2015, we screened 1068 individuals, of whom 700 were recruited. 349 people were allocated to the intervention group and 351 to the control group

  8. Internode data communications in a parallel computer

    Archer, Charles J.; Blocksome, Michael A.; Miller, Douglas R.; Parker, Jeffrey J.; Ratterman, Joseph D.; Smith, Brian E.

    2013-09-03

    Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.

  9. Portable parallel programming in a Fortran environment

    May, E.N.

    1989-01-01

    Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network-based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a "nearly realistic" lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs

  10. CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation

    Dunigan, T.H.

    1988-01-01

    1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated

  11. Analysis of multigrid methods on massively parallel computers: Architectural implications

    Matheson, Lesley R.; Tarjan, Robert E.

    1993-01-01

    We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10^6 and 10^9, respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation-derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests that an efficient implementation requires the machine to support the efficient transmission of long messages (up to 1000 words), or else the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
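
    The trade-off between a high fixed communication cost and message length can be made concrete with the common linear cost model, time = start-up + words * per-word cost. That model is an assumption here, not necessarily the one used by the authors, and the start-up and per-word times below are hypothetical values chosen only to show how long messages amortize the fixed cost.

```python
def message_time(words: int, startup: float, per_word: float) -> float:
    """Time to send one message of `words` words: a fixed start-up cost
    plus a per-word transfer cost (linear model, assumed)."""
    return startup + words * per_word


def transfer_fraction(words: int, startup: float, per_word: float) -> float:
    """Fraction of the message time spent moving data rather than
    paying the fixed start-up cost."""
    return words * per_word / message_time(words, startup, per_word)


# Hypothetical machine: 100 microseconds start-up, 0.1 microseconds per word.
STARTUP_US = 100.0
PER_WORD_US = 0.1

for n in (1, 10, 100, 1000, 10000):
    frac = transfer_fraction(n, STARTUP_US, PER_WORD_US)
    print(f"{n:>6} words: {frac:6.1%} of the time is data transfer")
```

    With these made-up parameters a 1000-word message still spends about half its time on start-up, which is the kind of regime in which either longer messages or a lower initiation cost is needed.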

  12. MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

    Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

    2018-02-01

    We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors used in the parallel computation increases.

  13. Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning

    Goldstein, David

    1991-01-01

    Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.

  14. Research in Parallel Algorithms and Software for Computational Aerosciences

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  15. Optimisation of a parallel ocean general circulation model

    M. I. Beare

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  17. Multi-petascale highly efficient parallel supercomputer

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen-Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2018-05-15

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. The node design integrates a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that at the same time improves the soft error rate, and it supports DMA functionality allowing for parallel-processing message passing.

  18. Introduction to parallel programming

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race

  19. Subtle Messages.

    Tamplin de Poinsot, Nan

    1999-01-01

    Describes a self-portrait assignment inspired by the work of Frida Kahlo. Discusses Frida Kahlo's artwork and use of surrealist and symbolist views. States that each student had to incorporate personal symbolism in the portrait to convey a message about him or herself in a subtle manner. (CMK)

  20. Advanced Messaging Concept Development Basic Safety Message

    Department of Transportation — Contains all Basic Safety Messages (BSMs) collected during the Advanced Messaging Concept Development (AMCD) field testing program. For this project, all of the Part...

  1. DMS message design workshops.

    2009-03-01

    This report summarizes the training conducted statewide regarding the design and display of messages on dynamic message signs. The training is based on the Dynamic Message Sign Message Design and Display Manual (0-4023-P3). Researchers developed ...

  2. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  3. Parallelization of applications for networks with homogeneous and heterogeneous processors

    Colombet, L.

    1994-01-01

    The aim of this thesis is to study and develop efficient methods for parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication tools. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate the performance of networks with heterogeneous processors. To evaluate such performance, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validate the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs

  4. MPI_XSTAR: MPI-based parallelization of XSTAR program

    Danehkar, A.

    2017-12-01

    MPI_XSTAR parallelizes execution of multiple XSTAR runs using Message Passing Interface (MPI). XSTAR (ascl:9910.008), part of the HEASARC's HEAsoft (ascl:1408.004) package, calculates the physical conditions and emission spectra of ionized gases. MPI_XSTAR invokes XSTINITABLE from HEASoft to generate a job list of XSTAR commands for given physical parameters. The job list is used to make directories in ascending order, where each individual XSTAR is spawned on each processor and outputs are saved. HEASoft's XSTAR2TABLE program is invoked upon the contents of each directory in order to produce table model FITS files for spectroscopy analysis tools.

  5. Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

    Hasanov, Khalid

    2014-03-04

    Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy and hence more parallelism in the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly improves the communication cost and the overall performance on large-scale platforms.

  6. Static and dynamic load-balancing strategies for parallel reservoir simulation

    Anguille, L.; Killough, J.E.; Li, T.M.C.; Toepfer, J.L.

    1995-01-01

    Accurate simulation of the complex phenomena that occur in flow in porous media can tax even the most powerful serial computers. Emergence of new parallel computer architectures as a future efficient tool in reservoir simulation may overcome this difficulty. Unfortunately, major problems remain to be solved before using parallel computers commercially: production serial programs must be rewritten to be efficient in parallel environments and load balancing methods must be explored to evenly distribute the workload on each processor during the simulation. This study implements both a static load-balancing algorithm and a receiver-initiated dynamic load-sharing algorithm to achieve high parallel efficiencies on both the IBM SP2 and Intel IPSC/860 parallel computers. Significant speedup improvement was recorded for both methods. Further optimization of these algorithms yielded a technique with efficiencies as high as 90% and 70% on 8 and 32 nodes, respectively. The increased performance was the result of the minimization of message-passing overhead
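
    Assuming the usual definition of parallel efficiency as speedup divided by the number of nodes (the abstract does not define it), the quoted efficiencies translate into speedups as in this short sketch. The function names are illustrative.

```python
def efficiency(t_serial: float, t_parallel: float, nprocs: int) -> float:
    """Parallel efficiency E = (T_1 / T_p) / p."""
    return (t_serial / t_parallel) / nprocs


def speedup_from_efficiency(eff: float, nprocs: int) -> float:
    """Invert E = S / p to recover the speedup S."""
    return eff * nprocs


# Efficiencies quoted in the abstract: 90% on 8 nodes, 70% on 32 nodes.
for eff, nodes in ((0.90, 8), (0.70, 32)):
    s = speedup_from_efficiency(eff, nodes)
    print(nodes, "nodes -> speedup of about", round(s, 1))
```

    Under that definition the figures correspond to speedups of about 7.2 on 8 nodes and 22.4 on 32 nodes.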

  7. Numerical studies on the interaction between atmosphere and ocean using different kinds of parallel computers

    Lee, Soon-Hwan; Chino, Masamichi

    2000-01-01

    Coupling atmosphere and ocean models poses physical and computational difficulties for short-term forecasting of weather and ocean currents. In this research, a system combining a high-resolution meso-scale atmospheric model with an ocean model has been constructed using a new message-passing library called Stampi (Seamless Thinking Aid Message Passing Interface), for predicting particle dispersion in a nuclear emergency. Stampi, which is based on the MPI (Message Passing Interface) 2 specification, lets us run the combined system in parallel without applying parallelization techniques to the model code. It also supports dynamic process creation on different machines and communication between the spawned processes within the scope of MPI semantics. The models included in this combined system are PHYSIC as the atmosphere model and POM (Princeton Ocean Model) as the ocean model. We applied the system to predict the sea surface current in the Sea of Japan in the winter season. Simulation results indicate that the wind stress near the sea surface tends to be the predominant factor determining surface ocean currents and the dispersion of radioactive contamination in the ocean. The surface ocean current corresponds well with the wind direction, which is induced by high mountains in North Korea. The satellite data of NSCAT (NASA-SCATterometer), which give an image of the sea surface current, also agree well with the results of this system. (author)

  8. Application of a Tsunami Warning Message Metric to refine NOAA NWS Tsunami Warning Messages

    Gregg, C. E.; Johnston, D.; Sorensen, J.; Whitmore, P.

    2013-12-01

    In 2010, the U.S. National Weather Service (NWS) funded a three-year project to integrate social science into their Tsunami Program. One of three primary requirements of the grant was to make improvements to tsunami warning messages of the NWS' two Tsunami Warning Centers: the West Coast/Alaska Tsunami Warning Center (WCATWC) in Palmer, Alaska and the Pacific Tsunami Warning Center (PTWC) in Ewa Beach, Hawaii. We conducted focus group meetings with a purposive sample of local, state and Federal stakeholders and emergency managers in six states (AK, WA, OR, CA, HI and NC) and two US Territories (US Virgin Islands and American Samoa) to qualitatively assess information needs in tsunami warning messages using WCATWC tsunami messages for the March 2011 Tohoku earthquake and tsunami event. We also reviewed research literature on behavioral response to warnings to develop a tsunami warning message metric that could be used to guide revisions to tsunami warning messages of both warning centers. The message metric is divided into categories of Message Content, Style, Order and Formatting and Receiver Characteristics. A message is evaluated by cross-referencing the message with the operational definitions of metric factors. Findings are then used to guide revisions of the message until the characteristics of each factor are met. Using findings from this project and findings from a parallel NWS Warning Tiger Team study led by T. Nicolini, the WCATWC implemented the first of two phases of revisions to their warning messages in November 2012. A second phase of additional changes, which will fully implement the redesign of messages based on the metric, is in progress. The resulting messages will reflect current state-of-the-art knowledge on warning message effectiveness. Here we present the message metric; evidence-based rationale for message factors; and examples of previous, existing and proposed messages.

  9. Parallel Monte Carlo simulation of aerosol dynamics

    Zhou, K.

    2014-01-01

    A highly efficient Monte Carlo (MC) algorithm is developed for the numerical simulation of aerosol dynamics, that is, nucleation, surface growth, and coagulation. Nucleation and surface growth are handled with deterministic means, while coagulation is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI). The parallel computing efficiency is investigated through numerical examples. Nearly 60% parallel efficiency is achieved for the largest test case, with 3.7 million MC particles running on 93 parallel computing nodes. The algorithm is verified through simulating various testing cases and comparing the simulation results with available analytical and/or other numerical solutions. Generally, it is found that only a small number (hundreds or thousands) of MC particles is necessary to accurately predict the aerosol particle number density, volume fraction, and so forth, that is, low order moments of the Particle Size Distribution (PSD) function. Accurately predicting the high order moments of the PSD requires dramatically increasing the number of MC particles. 2014 Kun Zhou et al.

  10. Performance studies of the parallel VIM code

    Shi, B.; Blomquist, R.N.

    1996-01-01

    In this paper, the authors evaluate the performance of the parallel version of the VIM Monte Carlo code on the IBM SPx at the High Performance Computing Research Facility at ANL. Three test problems with contrasting computational characteristics were used to assess effects in performance. A statistical method for estimating the inefficiencies due to load imbalance and communication is also introduced. VIM is a large scale continuous energy Monte Carlo radiation transport program and was parallelized using history partitioning, the master/worker approach, and the p4 message passing library. Dynamic load balancing is accomplished when the master processor assigns chunks of histories to workers that have completed a previously assigned task, accommodating variations in the lengths of histories, processor speeds, and worker loads. At the end of each batch (generation), the fission sites and tallies are sent from each worker to the master process, contributing to the parallel inefficiency. All communications are between master and workers, and are serial. The SPx is a scalable 128-node parallel supercomputer with high-performance Omega switches of 63 microsec latency and 35 MBytes/sec bandwidth. For uniform and reproducible performance, they used only the 120 identical regular processors (IBM RS/6000) and excluded the remaining eight planet nodes, which may be loaded by others' jobs.
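
    The dynamic load balancing described in this abstract can be illustrated with a small timing model. The sketch below is not the VIM or p4 code; the function names, chunk costs and worker speeds are hypothetical, and communication cost is ignored. It only shows the scheduling rule of handing the next chunk of histories to whichever worker finishes first, compared with a fixed round-robin split.

```python
import heapq


def simulate_dynamic_balancing(chunk_costs, worker_speeds):
    """Assign each chunk to whichever worker becomes idle first.

    chunk_costs: work units per chunk of histories (hypothetical values).
    worker_speeds: work units per second for each worker (hypothetical values).
    Returns (makespan, chunks_done_per_worker).
    """
    # Heap of (time_when_idle, worker_id); every worker is idle at t = 0.
    idle = [(0.0, w) for w in range(len(worker_speeds))]
    heapq.heapify(idle)
    done = [0] * len(worker_speeds)
    makespan = 0.0
    for cost in chunk_costs:
        t, w = heapq.heappop(idle)           # next worker to report completion
        t_end = t + cost / worker_speeds[w]  # time at which it finishes this chunk
        done[w] += 1
        makespan = max(makespan, t_end)
        heapq.heappush(idle, (t_end, w))
    return makespan, done


def simulate_static_split(chunk_costs, worker_speeds):
    """Baseline: deal chunks round-robin, ignoring worker speed."""
    busy = [0.0] * len(worker_speeds)
    for i, cost in enumerate(chunk_costs):
        w = i % len(worker_speeds)
        busy[w] += cost / worker_speeds[w]
    return max(busy)


if __name__ == "__main__":
    costs = [6.0] * 12          # twelve equal chunks
    speeds = [1.0, 2.0, 3.0]    # three workers of unequal speed
    print(simulate_dynamic_balancing(costs, speeds))
    print(simulate_static_split(costs, speeds))
```

    With twelve equal chunks and assumed worker speeds of 1, 2 and 3, the dynamic rule finishes in half the time of the round-robin split in this toy model, because the faster workers pick up more chunks.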

  11. Optimizing spread dynamics on graphs by message passing

    Altarelli, F; Braunstein, A; Dall’Asta, L; Zecchina, R

    2013-01-01

    Cascade processes are responsible for many important phenomena in natural and social sciences. Simple models of irreversible dynamics on graphs, in which nodes activate depending on the state of their neighbors, have been successfully applied to describe cascades in a large variety of contexts. Over the past decades, much effort has been devoted to understanding the typical behavior of the cascades arising from initial conditions extracted at random from some given ensemble. However, the problem of optimizing the trajectory of the system, i.e. of identifying appropriate initial conditions to maximize (or minimize) the final number of active nodes, is still considered to be practically intractable, with the only exception being models that satisfy a sort of diminishing returns property called submodularity. Submodular models can be approximately solved by means of greedy strategies, but by definition they lack cooperative characteristics which are fundamental in many real systems. Here we introduce an efficient algorithm based on statistical physics for the optimization of trajectories in cascade processes on graphs. We show that for a wide class of irreversible dynamics, even in the absence of submodularity, the spread optimization problem can be solved efficiently on large networks. Analytic and algorithmic results on random graphs are complemented by the solution of the spread maximization problem on a real-world network (the Epinions consumer reviews network). (paper)

  12. Discrete geometric analysis of message passing algorithm on graphs

    Watanabe, Yusuke

    2010-04-01

    We often encounter probability distributions given as unnormalized products of non-negative functions. The factorization structures are represented by hypergraphs called factor graphs. Such distributions appear in various fields, including statistics, artificial intelligence, statistical physics, error correcting codes, etc. Given such a distribution, computations of marginal distributions and the normalization constant are often required. However, these computations are generally intractable because of their cost. One successful approximation method is the Loopy Belief Propagation (LBP) algorithm. The focus of this thesis is an analysis of the LBP algorithm. If the factor graph is a tree, i.e. has no cycles, the algorithm gives the exact quantities. If the factor graph has cycles, however, the LBP algorithm does not give exact results and possibly exhibits oscillatory and non-convergent behavior. The thematic question of this thesis is "How are the behaviors of the LBP algorithm affected by the discrete geometry of the factor graph?" The primary contribution of this thesis is the discovery of a formula that establishes the relation between the LBP, the Bethe free energy and the graph zeta function. This formula provides new techniques for analysis of the LBP algorithm, connecting properties of the graph and of the LBP and the Bethe free energy. We demonstrate applications of the techniques to several problems including (non) convexity of the Bethe free energy, the uniqueness and stability of the LBP fixed point. We also discuss the loop series initiated by Chertkov and Chernyak. The loop series is a subgraph expansion of the normalization constant, or partition function, and reflects the graph geometry. We investigate theoretical properties of the series. Moreover, we show a partial connection between the loop series and the graph zeta function.
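
    The statement that belief propagation is exact on a tree can be checked on a very small case. The sketch below is an illustration only, not code from the thesis: the three-variable chain and the factor tables F01 and F12 are made-up values. It computes the marginal of the middle variable once by sum-product messages and once by brute-force enumeration.

```python
from itertools import product

# Hypothetical pairwise factors on a chain x0 - x1 - x2 with binary states.
F01 = [[1.0, 2.0], [3.0, 1.0]]   # F01[x0][x1]
F12 = [[2.0, 1.0], [1.0, 4.0]]   # F12[x1][x2]


def normalize(v):
    """Scale a list of non-negative weights so that it sums to one."""
    z = sum(v)
    return [x / z for x in v]


def bp_marginal_x1():
    """Marginal of x1 from the two sum-product messages arriving at x1."""
    # Message from factor F01 to x1: sum out x0 (a leaf sends the all-ones message).
    m_left = [sum(F01[x0][x1] for x0 in range(2)) for x1 in range(2)]
    # Message from factor F12 to x1: sum out x2.
    m_right = [sum(F12[x1][x2] for x2 in range(2)) for x1 in range(2)]
    return normalize([m_left[x1] * m_right[x1] for x1 in range(2)])


def brute_force_marginal_x1():
    """Marginal of x1 by enumerating every joint state of (x0, x1, x2)."""
    p = [0.0, 0.0]
    for x0, x1, x2 in product(range(2), repeat=3):
        p[x1] += F01[x0][x1] * F12[x1][x2]
    return normalize(p)


if __name__ == "__main__":
    print(bp_marginal_x1())
    print(brute_force_marginal_x1())
```

    Because the chain has no cycles, the two results agree. On a graph with cycles the same message updates would have to be iterated and would in general only approximate the true marginals.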

  13. Optimizing spread dynamics on graphs by message passing

    Altarelli, F.; Braunstein, A.; Dall'Asta, L.; Zecchina, R.

    2013-09-01

    Cascade processes are responsible for many important phenomena in natural and social sciences. Simple models of irreversible dynamics on graphs, in which nodes activate depending on the state of their neighbors, have been successfully applied to describe cascades in a large variety of contexts. Over the past decades, much effort has been devoted to understanding the typical behavior of the cascades arising from initial conditions extracted at random from some given ensemble. However, the problem of optimizing the trajectory of the system, i.e. of identifying appropriate initial conditions to maximize (or minimize) the final number of active nodes, is still considered to be practically intractable, with the only exception being models that satisfy a sort of diminishing returns property called submodularity. Submodular models can be approximately solved by means of greedy strategies, but by definition they lack cooperative characteristics which are fundamental in many real systems. Here we introduce an efficient algorithm based on statistical physics for the optimization of trajectories in cascade processes on graphs. We show that for a wide class of irreversible dynamics, even in the absence of submodularity, the spread optimization problem can be solved efficiently on large networks. Analytic and algorithmic results on random graphs are complemented by the solution of the spread maximization problem on a real-world network (the Epinions consumer reviews network).

  14. Parallel computation

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code, development or improvement work was done on parts simulating low energy physical phenomena like radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program of neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits in a single pass the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refer to simulations for nuclear medicine applications like, for instance, development of biological probes, evaluation and characterization of the gamma cameras (collimators, crystal thickness) as well as the method for dosimetric calculations. Particularly, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to simulation of the electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of the environment.

  15. Parallel processing of two-dimensional Sn transport calculations

    Uematsu, M.

    1997-01-01

    A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation.
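
    The speedup figures quoted in this abstract can be turned into parallel efficiencies. The sketch below assumes the usual definition, efficiency = speedup / number of processing elements, which the abstract itself does not state; the function name is illustrative.

```python
def parallel_efficiency(speedup, n_elements):
    """Parallel efficiency under the assumed definition speedup / n_elements."""
    if n_elements <= 0:
        raise ValueError("n_elements must be positive")
    return speedup / n_elements


if __name__ == "__main__":
    # Angular domain decomposition: 9x on 16 processing elements.
    print(parallel_efficiency(9.0, 16))
    # Space domain decomposition with rescaling: 5.98x on 16 processing elements,
    # measured against the original DOT3.5 calculation.
    print(parallel_efficiency(5.98, 16))
```

    Under this definition the two quoted cases correspond to efficiencies of about 56% and about 37%, respectively.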

  16. Mixed messages

    Chen, Christopher B.; Hall, Kevin; Tsuyuki, Ross T.

    2014-01-01

    Background: More than 5 years ago, the Blueprint for Pharmacy developed a plan for transitioning pharmacy practice toward more patient-centred care. Much of the strategy for change involves communicating the new vision. Objective: To evaluate the communication of the Vision for Pharmacy by the organizations and corporations that signed the Blueprint for Pharmacy’s Commitment to Act. Methods: The list of 88 signatories of the Commitment to Act was obtained from the Blueprint for Pharmacy document. The website of each of these signatories was searched for all references to the Blueprint for Pharmacy or Vision for Pharmacy. Each of the identified references was then analyzed using summative content analysis. Results: A total of 934 references were identified from the webpages of the 88 signatories. Of these references, 549 were merely links to the Blueprint for Pharmacy’s website, 350 of the references provided some detailed information about the Blueprint for Pharmacy and only 35 references provided any specific plans to transition pharmacy practice. Conclusion: Widespread proliferation of the Vision for Pharmacy has not been achieved. One possible explanation for this is that communication of the vision by the signatories has been incomplete. To ensure the success of future communications, change leaders must develop strategies that consider how individual pharmacists and pharmacies understand the message. PMID:24660012

  17. TPG bus passes

    Staff Association

    2013-01-01

    The CERN Staff Association will stop selling TPG bus passes. All active and retired members of the CERN personnel will be able to purchase Unireso bus passes from the CERN Hostel - Building 39 (Meyrin site) from 1st February 2013. For more information: https://cds.cern.ch/journal/CERNBulletin/2013/04/Announcements/1505279?ln=en

  18. First massively parallel algorithm to be implemented in Apollo-II code

    Stankovski, Z.

    1994-01-01

    The collision probability (CP) method in neutron transport, as applied to arbitrary 2D XY geometries, like the TDT module in APOLLO-II, is very time consuming. Consequently, RZ or 3D extensions became prohibitive. Fortunately, this method is very suitable for parallelization. Massively parallel computer architectures, especially MIMD machines, breathe new life into this method. In this paper we present a CM5 implementation of the CP method. Parallelization is applied to the energy groups, using the CMMD message passing library. In our case we use 32 processors for the standard 99-group APOLLIB-II library. The real advantage of this algorithm will appear in the calculation of the future fine multigroup library (about 8000 groups) of the SAPHYR project with a massively parallel computer (to the order of hundreds of processors). (author). 3 tabs., 4 figs., 4 refs

  19. Same-source parallel implementation of the PSU/NCAR MM5

    Michalakes, J.

    1997-12-31

    The Pennsylvania State/National Center for Atmospheric Research Mesoscale Model is a limited-area model of atmospheric systems, now in its fifth generation, MM5. Designed and maintained for vector and shared-memory parallel architectures, the official version of MM5 does not run on message-passing distributed memory (DM) parallel computers. The authors describe a same-source parallel implementation of the PSU/NCAR MM5 using FLIC, the Fortran Loop and Index Converter. The resulting source is nearly line-for-line identical with the original source code. The result is an efficient distributed memory parallel option to MM5 that can be seamlessly integrated into the official version.

  20. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.

  1. Parallelization of a Monte Carlo particle transport simulation code

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in low response time.

  2. A parallel solution for high resolution histological image analysis.

    Bueno, G; González, R; Déniz, O; García-Rojo, M; González-García, J; Fernández-Carrobles, M M; Vállez, N; Salido, J

    2012-10-01

    This paper describes a general methodology for developing parallel image processing algorithms based on message passing for high resolution images (on the order of several Gigabytes). These algorithms have been applied to histological images and must be executed on massively parallel processing architectures. Advances in new technologies for complete slide digitalization in pathology have been combined with developments in biomedical informatics. However, the efficient use of these digital slide systems is still a challenge. The image processing that these slides are subject to is still limited both in terms of data processed and processing methods. The work presented here focuses on the need to design and develop parallel image processing tools capable of obtaining and analyzing the entire gamut of information included in digital slides. Tools have been developed to assist pathologists in image analysis and diagnosis, and they cover low and high-level image processing methods applied to histological images. Code portability, reusability and scalability have been tested by using the following parallel computing architectures: distributed memory with massive parallel processors and two networks, INFINIBAND and Myrinet, composed of 17 and 1024 nodes respectively. The parallel framework proposed is a flexible, high performance solution and it shows that the efficient processing of digital microscopic images is possible and may offer important benefits to pathology laboratories. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  3. Development of a parallel genetic algorithm using MPI and its application in a nuclear reactor core design optimization

    Waintraub, Marcel; Pereira, Claudio M.N.A.; Baptista, Rafael P.

    2005-01-01

    This work presents the development of a distributed parallel genetic algorithm applied to a nuclear reactor core design optimization. In the implementation of the parallelism, a 'Message Passing Interface' (MPI) library, standard for parallel computation in distributed memory platforms, has been used. Another important characteristic of MPI is its portability for various architectures. The main objectives of this paper are: validation of the results obtained by the application of this algorithm in a nuclear reactor core optimization problem, through comparisons with previous results presented by Pereira et al.; and performance test of the Brazilian Nuclear Engineering Institute (IEN) cluster in reactors physics optimization problems. The experiments demonstrated that the developed parallel genetic algorithm using the MPI library presented significant gains in the obtained results and an accentuated reduction of the processing time. Such results ratify the use of the parallel genetic algorithms for the solution of nuclear reactor core optimization problems. (author)

  4. Algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations with the use of parallel computations

    Moryakov, A. V., E-mail: sailor@orc.ru [National Research Centre Kurchatov Institute (Russian Federation)]

    2016-12-15

    An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computations on supercomputers employing the MPI (Message Passing Interface) standard for the data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is characterized by simplicity and the possibility to solve nonlinear problems with a correction of the operator in accordance with the solution obtained in the previous iterative process.

  5. Parallel implementation of the PHOENIX generalized stellar atmosphere program. II. Wavelength parallelization

    Baron, E.; Hauschildt, Peter H.

    1998-01-01

    We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divide the work spatially or by spectral lines, that is, they distribute the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. © 1998 The American Astronomical Society

  6. A multitransputer parallel processing system (MTPPS)

    Jethra, A.K.; Pande, S.S.; Borkar, S.P.; Khare, A.N.; Ghodgaonkar, M.D.; Bairi, B.R.

    1993-01-01

    This report describes the design and implementation of a 16 node Multi Transputer Parallel Processing System (MTPPS) which is a platform for parallel program development. It is a MIMD machine based on the message passing paradigm. The basic compute engine is an Inmos Transputer IMS T800-20. A transputer with local memory constitutes the processing element (NODE) of this MIMD architecture. Multiple NODES can be connected to each other in an identifiable network topology through the high speed serial links of the transputer. A Network Configuration Unit (NCU) incorporates the necessary hardware to provide software controlled network configuration. The system is modularly expandable and more NODES can be added to the system to achieve the required processing power. The system is a back end to the IBM-PC, which has been integrated into the system to provide the user I/O interface. PC resources are available to the programmer. Interface hardware between the PC and the network of transputers is INMOS compatible. Therefore, all the commercially available development software compatible with INMOS products can run on this system. While giving the details of design and implementation, this report briefly summarises MIMD Architectures, Transputer Architecture and Parallel Processing Software Development issues. LINPACK performance evaluation of the system and solutions of neutron physics and plasma physics problems are discussed along with results. (author). 12 refs., 22 figs., 3 tabs., 3 appendixes

  7. Passing and Catching in Rugby.

    Namudu, Mike M.

    This booklet contains the fundamentals for rugby at the primary school level. It deals primarily with passing and catching the ball. It contains instructions on (1) holding the ball for passing, (2) passing the ball to the left--standing, (3) passing the ball to the left--running, (4) making a switch pass, (5) the scrum half's normal pass, (6) the…

  8. Preventing messaging queue deadlocks in a DMA environment

    Blocksome, Michael A; Chen, Dong; Gooding, Thomas; Heidelberger, Philip; Parker, Jeff

    2014-01-14

    Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access (DMA) controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.
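
    The first variant described in this abstract (stop the DMA, move descriptors into a larger queue, restart) can be sketched in software. The code below is a toy model, not the patented mechanism or any real DMA interface: the class DescriptorQueue, the functions on_queue_full and inject, and the growth factor of 2 are all hypothetical names and choices made for illustration.

```python
from collections import deque


class DescriptorQueue:
    """Toy fixed-capacity queue standing in for a DMA messaging queue (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.dma_running = True

    def is_full(self):
        return len(self.items) >= self.capacity


def on_queue_full(queue, growth_factor=2):
    """Interrupt-handler sketch: stop the DMA, swap descriptors into a larger queue, restart."""
    queue.dma_running = False                        # stop the DMA
    bigger = DescriptorQueue(queue.capacity * growth_factor)
    bigger.dma_running = False
    while queue.items:
        bigger.items.append(queue.items.popleft())   # keep descriptor order
    bigger.dma_running = True                        # restart the DMA on the new queue
    return bigger


def inject(queue, descriptor):
    """Inject a descriptor; grow the queue rather than block when it is full."""
    if queue.is_full():
        queue = on_queue_full(queue)
    queue.items.append(descriptor)
    return queue


if __name__ == "__main__":
    q = DescriptorQueue(capacity=2)
    for d in ["a", "b", "c", "d", "e"]:
        q = inject(q, d)
    print(q.capacity, list(q.items), q.dma_running)
```

    The point of the sketch is that a sender never waits on a full queue, which is the condition that could otherwise lead to deadlock. The alternative variant in the abstract, which parks descriptors in a separately allocated memory block, is not modeled here.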

  9. Intranode data communications in a parallel computer

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

    2014-01-07

    Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.
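
    The sequence of steps in this abstract can be modeled in a few lines. The sketch below is a single-process illustration, not real shared memory and not the patented implementation: SharedRegion, send and initialize_and_receive are hypothetical names, and Python lists stand in for the message buffers. It shows only the ordering property, namely that a message can be stored for a process before that process has been initialized.

```python
class SharedRegion:
    """Stand-in for the shared-memory region allocated by the first process (hypothetical)."""

    def __init__(self, n_procs):
        # One pre-established message buffer per process expected on the node.
        self.buffers = [[] for _ in range(n_procs)]


def send(region, dest_rank, message):
    """Store a message in the destination's buffer without checking whether it is initialized."""
    region.buffers[dest_rank].append(message)


def initialize_and_receive(region, rank):
    """On initialization, look up this process's buffer and drain any waiting messages."""
    buf = region.buffers[rank]      # plays the role of the retrieved pointer
    received = list(buf)
    buf.clear()
    return received


if __name__ == "__main__":
    region = SharedRegion(n_procs=4)   # done by the first process
    send(region, 1, "hello")           # process 1 does not exist yet
    send(region, 1, "world")
    print(initialize_and_receive(region, 1))
```

    A real implementation would need actual shared memory and synchronization between processes, both of which this sketch leaves out.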

  10. Intranode data communications in a parallel computer

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

    2013-07-23

    Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.

  11. Xyce parallel electronic simulator : users' guide.

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is

  12. A hybrid parallel framework for the cellular Potts model simulations

    Jiang, Yi [Los Alamos National Laboratory]; He, Kejing [SOUTH CHINA UNIV]; Dong, Shoubin [SOUTH CHINA UNIV]

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, and cannot be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on a shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation (~10^8 sites) of complex collective behavior of numerous cells (~10^6).

  13. Development of parallel Fokker-Planck code ALLAp

    Batishcheva, A.A.; Sigmar, D.J.; Koniges, A.E.

    1996-01-01

    We report on our ongoing development of the 3D Fokker-Planck code ALLA for a highly collisional scrape-off-layer (SOL) plasma. A SOL with strong gradients of density and temperature in the spatial dimension is modeled. Our method is based on a 3-D adaptive grid (in space, magnitude of the velocity, and cosine of the pitch angle) and a second order conservative scheme. Note that the grid size is typically 100 x 257 x 65 nodes. It was shown in our previous work that only these capabilities make it possible to benchmark a 3D code against a spatially-dependent self-similar solution of a kinetic equation with the Landau collision term. In the present work we show results of a more precise benchmarking against the exact solutions of the kinetic equation using a new parallel code ALLAp with an improved method of parallelization and a modified boundary condition at the plasma edge. We also report first results from the code parallelization using Message Passing Interface for a Massively Parallel CRI T3D platform. We evaluate the ALLAp code performance versus the number of T3D processors used and compare its efficiency against a Work/Data Sharing parallelization scheme and a workstation version

  14. EMI Messaging Guidelines

    Cons, L.

    2011-01-01

    Guidelines for potential users of messaging within EMI. The goal is to provide enough practical information so that EMI product teams can start investigating whether using messaging in their products can be beneficial or not.

  15. Mouse myocardial first-pass perfusion MR imaging

    Coolen, Bram F.; Moonen, Rik P. M.; Paulis, Leonie E. M.; Geelen, Tessa; Nicolay, Klaas; Strijkers, Gustav J.

    2010-01-01

    A first-pass myocardial perfusion sequence for mouse cardiac MRI is presented. A segmented ECG-triggered acquisition combined with parallel imaging acceleration was used to capture the first pass of a Gd-DTPA bolus through the mouse heart with a temporal resolution of 300-400 msec. The method was

  16. Mouse myocardial first-pass perfusion MR imaging

    Coolen, B.F.; Moonen, R.P.M.; Paulis, L.E.M.; Geelen, T.; Nicolay, K.; Strijkers, G.J.

    2010-01-01

    A first-pass myocardial perfusion sequence for mouse cardiac MRI is presented. A segmented ECG-triggered acquisition combined with parallel imaging acceleration was used to capture the first pass of a Gd-DTPA bolus through the mouse heart with a temporal resolution of 300–400 msec. The method was

  17. An efficient implementation of parallel molecular dynamics method on SMP cluster architecture

    Suzuki, Masaaki; Okuda, Hiroshi; Yagawa, Genki

    2003-01-01

    The authors have applied the MPI/OpenMP hybrid parallel programming model to parallelize a molecular dynamics (MD) method on a symmetric multiprocessor (SMP) cluster architecture. On that architecture, the hybrid parallel programming model, which uses a message passing library such as MPI for inter-SMP node communication and loop directives such as OpenMP for intra-SMP node parallelization, can be expected to be the most effective one. In this study, the parallel performance of the hybrid style has been compared with that of the conventional flat parallel programming style, which uses only MPI, both with and without the fast multipole method (FMM) for computing long-distance interactions. The computer environment used here is the Hitachi SR8000/MPP at the University of Tokyo. The results of the calculations are as follows. Without FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 90% with the hybrid style and 75% with the flat-MPI style for an MD simulation with 33,402 atoms. With FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 60% with the hybrid style and 48% with the flat-MPI style for an MD simulation with 117,649 atoms. (author)
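
    The efficiency figures quoted in this abstract can be turned into implied speedups using the usual definition, efficiency = speedup / number of processing elements. The following is a minimal sketch of that arithmetic only; the function name is illustrative and does not come from the paper.

```python
def speedup_from_efficiency(efficiency, num_pes):
    """Return the speedup implied by a parallel efficiency on num_pes PEs.

    Uses the standard definition: efficiency = speedup / num_pes.
    """
    return efficiency * num_pes


# Efficiencies reported in the abstract for 16 SMP nodes (128 PEs).
reported = {
    ("without FMM", "hybrid"): 0.90,
    ("without FMM", "flat MPI"): 0.75,
    ("with FMM", "hybrid"): 0.60,
    ("with FMM", "flat MPI"): 0.48,
}

for (case, style), eff in reported.items():
    s = speedup_from_efficiency(eff, 128)
    print(f"{case}, {style}: implied speedup of about {s:.1f}")
```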

  18. Parallel Sn Sweeps on Unstructured Grids: Algorithms for Prioritization, Grid Partitioning, and Cycle Detection

    Plimpton, Steven J.; Hendrickson, Bruce; Burns, Shawn P.; McLendon, William III; Rauchwerger, Lawrence

    2005-01-01

    The method of discrete ordinates is commonly used to solve the Boltzmann transport equation. The solution in each ordinate direction is most efficiently computed by sweeping the radiation flux across the computational grid. For unstructured grids this poses many challenges, particularly when implemented on distributed-memory parallel machines where the grid geometry is spread across processors. We present several algorithms relevant to this approach: (a) an asynchronous message-passing algorithm that performs sweeps simultaneously in multiple ordinate directions, (b) a simple geometric heuristic to prioritize the computational tasks that a processor works on, (c) a partitioning algorithm that creates columnar-style decompositions for unstructured grids, and (d) an algorithm for detecting and eliminating cycles that sometimes exist in unstructured grids and can prevent sweeps from successfully completing. Algorithms (a) and (d) are fully parallel; algorithms (b) and (c) can be used in conjunction with (a) to achieve higher parallel efficiencies. We describe our message-passing implementations of these algorithms within a radiation transport package. Performance and scalability results are given for unstructured grids with up to 3 million elements (500 million unknowns) running on thousands of processors of Sandia National Laboratories' Intel Tflops machine and DEC-Alpha CPlant cluster

  19. A parallel neural network training algorithm for control of discrete dynamical systems.

    Gordillo, J. L.; Hanebutte, U. R.; Vitela, J. E.

    1998-01-20

    In this work we present a parallel neural network controller training code that uses MPI, a portable message passing environment. A comprehensive performance analysis is reported which compares results of a performance model with actual measurements. The analysis is made for three different load assignment schemes: block distribution, strip mining and a sliding average bin packing (best-fit) algorithm. Such analysis is crucial since optimal load balance cannot be achieved because the work load information is not available a priori. The speedup results obtained with the above schemes are compared with those corresponding to the bin packing load balance scheme with perfect load prediction based on a priori knowledge of the computing effort. Two multiprocessor platforms, an SGI/Cray Origin 2000 and an IBM SP, have been utilized for this study. It is shown that for the best load balance scheme a parallel efficiency of over 50% for the entire computation is achieved by 17 processors of either parallel computer.
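
    To illustrate why the choice of load assignment scheme matters, the sketch below compares a contiguous block distribution with a simple greedy assignment that always gives the next-largest task to the least-loaded processor. This is not the authors' sliding average bin packing algorithm; it is a generic stand-in that assumes task costs are known in advance (as in the perfect-prediction baseline), and the task costs are hypothetical.

```python
import heapq


def block_distribution(costs, num_procs):
    """Assign contiguous blocks of tasks to processors; return per-processor loads."""
    n = len(costs)
    loads = []
    for p in range(num_procs):
        start = p * n // num_procs
        end = (p + 1) * n // num_procs
        loads.append(sum(costs[start:end]))
    return loads


def greedy_balance(costs, num_procs):
    """Assign tasks, largest first, to the currently least-loaded processor."""
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    loads = [0.0] * num_procs
    for cost in sorted(costs, reverse=True):
        load, p = heapq.heappop(heap)
        loads[p] = load + cost
        heapq.heappush(heap, (loads[p], p))
    return loads


def balance_efficiency(loads):
    """Mean load divided by maximum load (1.0 means perfectly balanced)."""
    return sum(loads) / (len(loads) * max(loads))


# Hypothetical task costs: a few expensive tasks followed by many cheap ones.
costs = [9, 8, 7, 1, 1, 1, 1, 1, 1]
print("block :", block_distribution(costs, 3))
print("greedy:", greedy_balance(costs, 3))
```

    With uneven task costs, the block scheme leaves most processors idle while one finishes the expensive tasks, whereas the greedy scheme spreads the work evenly.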

  20. Porting Gravitational Wave Signal Extraction to Parallel Virtual Machine (PVM)

    Thirumalainambi, Rajkumar; Thompson, David E.; Redmon, Jeffery

    2009-01-01

    The Laser Interferometer Space Antenna (LISA) is a planned NASA-ESA mission to be launched around 2012. Gravitational wave detection is fundamentally the determination of frequency, source parameters, and waveform amplitude, derived in a specific order from the interferometric time series of the rotating LISA spacecraft. The LISA Science Team has developed a Mock LISA Data Challenge intended to promote the testing of complicated nested search algorithms to detect the 100-1 millihertz frequency signals at amplitudes of 10E-21. However, it has become clear that sequential search of the parameters is very time consuming and ultra-sensitive; hence, a new strategy has been developed. Parallelization of existing sequential search algorithms for gravitational wave signal identification consists of decomposing sequential search loops, beginning with the outermost loops and working inward. In this process, the main challenge is to detect interdependencies among loops and to partition the loops so as to preserve concurrency. Existing parallel programs are based upon either shared memory or distributed memory paradigms. In PVM, master and node programs are used to execute parallelization and process spawning. PVM can handle process management and process addressing schemes using a virtual machine configuration. The task scheduling and the messaging and signaling can be implemented efficiently for the LISA gravitational wave search process using a master and 6 nodes. This approach is accomplished using a server that is available at NASA Ames Research Center and has been dedicated to the LISA Data Challenge Competition. Historically, extraction of gravitational wave and source identification parameters has taken around 7 days on this dedicated single-thread Linux-based server. Using the PVM approach, the parameter extraction problem can be reduced to within a day. The low frequency computation and a proxy signal-to-noise ratio are calculated in separate nodes that are controlled by the master

  1. Testing New Programming Paradigms with NAS Parallel Benchmarks

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developed with the aim of scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." In order to test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage

  2. Parallel computing techniques for rotorcraft aerodynamics

    Ekici, Kivanc

    The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, major modifications are made to the implicit operator originally used in TURNS, Lower-Upper Symmetric Gauss-Seidel (LU-SGS). Second, an inexact Newton method coupled with a Krylov subspace iterative method (Newton-Krylov method) is applied. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods in the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).

  3. Parallel Numerical Simulations of Water Reservoirs

    Torres, Pedro; Mangiavacchi, Norberto

    2010-11-01

    The study of water flow and scalar transport in water reservoirs is important for determining the water quality during the initial stages of reservoir filling and during the life of the reservoir. For this purpose, a parallel 2D finite element code for solving the incompressible Navier-Stokes equations coupled with scalar transport was implemented using the message-passing programming model, in order to perform simulations of hydropower water reservoirs in a computer cluster environment. The spatial discretization is based on the MINI element that satisfies the Babuska-Brezzi (BB) condition, which provides sufficient conditions for a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as preprocessing, solving and post processing, were implemented using the PETSc library. The resulting linear systems for the velocity and the pressure fields were solved using the projection method, implemented by an approximate block LU factorization. In order to increase the parallel performance in the solution of the linear systems, we employ the static condensation method for solving the intermediate velocity at vertex and centroid nodes separately. We compare performance results of the static condensation method with the approach of solving the complete system. In our tests the static condensation method shows better performance for large problems, at the cost of increased memory usage. Performance results for other intensive parts of the code in a computer cluster are also presented.

  4. New parallel SOR method by domain partitioning

    Xie, Dexuan [Courant Inst. of Mathematical Sciences New York Univ., NY (United States)]

    1996-12-31

    In this paper, we propose and analyze a new parallel SOR method, the PSOR method, formulated by using domain partitioning together with an interprocessor data-communication technique. For the 5-point approximation to the Poisson equation on a square, we show that the ordering of the PSOR based on the strip partition leads to a consistently ordered matrix, and hence the PSOR and the SOR using the row-wise ordering have the same convergence rate. However, in general, the ordering used in PSOR may not be "consistently ordered". So, there is a need to analyze the convergence of PSOR directly. In this paper, we present a PSOR theory, and show that the PSOR method can have the same asymptotic rate of convergence as the corresponding sequential SOR method for a wide class of linear systems in which the matrix is "consistently ordered". Finally, we demonstrate the parallel performance of the PSOR method on four different message passing multiprocessors (a KSR1, the Intel Delta, an Intel Paragon and an IBM SP2), along with a comparison with the point Red-Black and four-color SOR methods.
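
    For orientation, the sequential baseline that this abstract compares against, SOR with row-wise ordering on the 5-point Poisson stencil, can be sketched as below. This is the textbook sequential method only, not the PSOR method of the paper; the grid size, relaxation factor and iteration count are illustrative assumptions.

```python
def sor_poisson(f, h, omega, iterations):
    """Row-wise SOR sweeps for -laplace(u) = f on a square grid, u = 0 on the boundary.

    f is an n x n list of right-hand-side values at interior points and h is the
    grid spacing. Returns an (n + 2) x (n + 2) grid including the boundary ring.
    """
    n = len(f)
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    for _ in range(iterations):
        for i in range(1, n + 1):          # rows, in order
            for j in range(1, n + 1):      # points within a row, in order
                # Gauss-Seidel value from the 5-point stencil, using the
                # newest available neighbour values.
                gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                             + u[i][j - 1] + u[i][j + 1]
                             + h * h * f[i - 1][j - 1])
                # Over-relax towards the Gauss-Seidel value.
                u[i][j] += omega * (gs - u[i][j])
    return u


# Illustrative run: 7 x 7 interior points on the unit square, f = 1 everywhere.
n, h = 7, 1.0 / 8.0
u = sor_poisson([[1.0] * n for _ in range(n)], h, omega=1.5, iterations=200)
print("value at the centre of the square:", u[4][4])
```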

  5. On the Parallel Elliptic Single/Multigrid Solutions about Aligned and Nonaligned Bodies Using the Virtual Machine for Multiprocessors

    A. Averbuch

    1994-01-01

    Parallel elliptic single/multigrid solutions around an aligned and nonaligned body are presented and implemented on two multi-user and single-user shared memory multiprocessors (Sequent Symmetry and MOS) and on a distributed memory multiprocessor (a Transputer network). Our parallel implementation uses the Virtual Machine for Multi-Processors (VMMP), a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD) multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It is seen to fit the above architectures well when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that when using VMMP, the portability of the algorithms is straightforward and efficient.

  6. Formative Research regarding Kidney Disease Health Information in a Latino American Sample: Associations among Message Frame, Threat, Efficacy, Message Effectiveness, and Behavioral Intention

    Maguire, Katheryn C.; Gardner, Jay; Sopory, Pradeep; Jian, Guowei; Roach, Marcia; Amschlinger, Joe; Moreno, Marcia; Pettey, Gary; Piccone, Gianfranco

    2010-01-01

    Using prospect theory and the extended parallel process model, this study examined the effect of gain/loss message framing on perceptions of severity, susceptibility, response efficacy, and self efficacy (derived from the extended parallel process model), as well as perception of message effectiveness and behavioral intention in a community based…

  7. Parallelization characteristics of the DeCART code

    Cho, J. Y.; Joo, H. G.; Kim, H. Y.; Lee, C. C.; Chang, M. H.; Zee, S. Q.

    2003-12-01

    This report describes the parallelization characteristics of the DeCART code and examines its parallel performance. Parallel computing algorithms are implemented in DeCART to reduce the tremendous computational burden and memory requirement involved in the three-dimensional whole core transport calculation. In the parallelization of the DeCART code, the axial domain decomposition is first realized by using MPI (Message Passing Interface), and then the azimuthal angle domain decomposition by using either MPI or OpenMP. When using MPI for both the axial and the angle domain decomposition, the concept of MPI grouping is employed for convenient communication in each communication world. For the parallel computation, nearly all the computing modules except for the thermal hydraulic module are parallelized. These parallelized computing modules include the MOC ray tracing, CMFD, NEM, region-wise cross section preparation and cell homogenization modules. For the distributed allocation, nearly all the MOC and CMFD/NEM variables are allocated only for the assigned planes, which reduces the required memory by a ratio of the number of the assigned planes to the number of all planes. The parallel performance of the DeCART code is evaluated by solving two problems, a rodded variation of the C5G7 MOX three-dimensional benchmark problem and a simplified three-dimensional SMART PWR core problem. In terms of parallel performance, the DeCART code shows a good speedup of about 40.1 and 22.4 in the ray tracing module and about 37.3 and 20.2 in the total computing time when using 48 CPUs on the IBM Regatta and 24 CPUs on the LINUX cluster, respectively. In the comparison between MPI and OpenMP, OpenMP shows a somewhat better performance than MPI. Therefore, it is concluded that the first priority in the parallel computation of the DeCART code is in the axial domain decomposition by using MPI, and then in the angular domain using OpenMP, and finally the angular
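
    The memory saving described in this abstract follows directly from the axial decomposition: each process stores variables only for its own planes. The sketch below shows one simple way to split planes into contiguous chunks and compute the resulting memory fraction per process. It is not taken from DeCART; the function names and the plane and process counts are hypothetical.

```python
def assign_planes(num_planes, num_ranks):
    """Split plane indices 0..num_planes-1 into contiguous, near-equal chunks."""
    base, extra = divmod(num_planes, num_ranks)
    chunks, start = [], 0
    for rank in range(num_ranks):
        # The first `extra` ranks take one additional plane each.
        size = base + (1 if rank < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def memory_fraction(assigned_planes, num_planes):
    """Fraction of plane-wise storage held by one process: assigned / all planes."""
    return len(assigned_planes) / num_planes


# Hypothetical case: 10 axial planes distributed over 4 processes.
for rank, planes in enumerate(assign_planes(10, 4)):
    print(rank, planes, memory_fraction(planes, 10))
```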

  8. Public-channel cryptography based on mutual chaos pass filters.

    Klein, Einat; Gross, Noam; Kopelowitz, Evi; Rosenbluh, Michael; Khaykovich, Lev; Kinzel, Wolfgang; Kanter, Ido

    2006-10-01

    We study the mutual coupling of chaotic lasers and observe both experimentally and in numeric simulations that there exists a regime of parameters for which two mutually coupled chaotic lasers establish isochronal synchronization, while a third laser coupled unidirectionally to one of the pair does not synchronize. We then propose a cryptographic scheme, based on the advantage of mutual coupling over unidirectional coupling, where all the parameters of the system are public knowledge. We numerically demonstrate that in such a scheme the two communicating lasers can add a message signal (compressed binary message) to the transmitted coupling signal and recover the message in both directions with high fidelity by using a mutual chaos pass filter procedure. An attacker, however, fails to recover an errorless message even if he amplifies the coupling signal.

  9. Control rod drop transient analysis with the coupled parallel code pCTF-PARCSv2.7

    Ramos, Enrique; Roman, Jose E.; Abarca, Agustín; Miró, Rafael; Bermejo, Juan A.

    2016-01-01

    Highlights: • An MPI parallel version of the thermal–hydraulic subchannel code COBRA-TF has been developed. • The parallel code has been coupled to the 3D neutron diffusion code PARCSv2.7. • The new codes are validated with a control rod drop transient. - Abstract: In order to reduce the response time when simulating large reactors in detail, a parallel version of the thermal–hydraulic subchannel code COBRA-TF (CTF) has been developed using the standard Message Passing Interface (MPI). The parallelization is oriented to reactor cells, so it is best suited for models consisting of many cells. The generation of the Jacobian matrix is parallelized in such a way that each processor is in charge of generating the data associated with a subset of cells. Also, the solution of the linear system of equations is done in parallel, using the PETSc toolkit. With the goal of creating a powerful tool to simulate the reactor core behavior during asymmetrical transients, the 3D neutron diffusion code PARCSv2.7 (PARCS) has been coupled with the parallel version of CTF (pCTF) using the Parallel Virtual Machine (PVM) technology. In order to validate the correctness of the parallel coupled code, a control rod drop transient has been simulated and the results compared with experimental measurements acquired during a real NPP test.

  10. Introduction: Mirrors of Passing

    Seebach, Sophie Hooge; Willerslev, Rane

    How is death, time, and materiality interconnected? How to approach an understanding of the world of the dead? In this introduction, we seek to understand how the experience of material decay, of the death of those around us, makes us aware of the passing of time. Through the literary lens of Neil...... Gaiman’s The Graveyard Book, we explore how the world of the dead and the world of the living can intersect; how time and materiality shifts and changes depending on who experiences it. These revelations, based on fiction, provide a mirror through which the reader can experience the varied chapters...

  11. Optimization approaches to MPI and area merging-based parallel buffer algorithm

    Junfu Fan

    In buffer zone construction, the rasterization-based dilation method inevitably introduces errors, and the double-sided parallel line method involves a series of complex operations. In this paper, we propose a parallel buffer algorithm based on area merging and MPI (Message Passing Interface) to improve the performance of buffer analyses on large datasets. Experimental results reveal three major performance bottlenecks which significantly impact the serial and parallel buffer construction efficiencies: the area merging strategy, the task load balance method and the MPI inter-process results merging strategy. Corresponding optimization approaches, involving a tree-like area merging strategy, a vertex-number-oriented parallel task partition method and an inter-process results merging strategy, are suggested to overcome these bottlenecks. Experiments were carried out to examine the performance of the optimized parallel algorithm. The results suggest that the optimization approaches could provide high performance and processing ability for buffer construction in a cluster parallel environment. Our method could provide insights into the parallelization of spatial analysis algorithms.

  12. Load-balancing techniques for a parallel electromagnetic particle-in-cell code

    PLIMPTON,STEVEN J.; SEIDEL,DAVID B.; PASIK,MICHAEL F.; COATS,REBECCA S.

    2000-01-01

    QUICKSILVER is a 3-d electromagnetic particle-in-cell simulation code developed and used at Sandia to model relativistic charged particle transport. It models the time-response of electromagnetic fields and low-density-plasmas in a self-consistent manner: the fields push the plasma particles and the plasma current modifies the fields. Through an LDRD project a new parallel version of QUICKSILVER was created to enable large-scale plasma simulations to be run on massively-parallel distributed-memory supercomputers with thousands of processors, such as the Intel Tflops and DEC CPlant machines at Sandia. The new parallel code implements nearly all the features of the original serial QUICKSILVER and can be run on any platform which supports the message-passing interface (MPI) standard as well as on single-processor workstations. This report describes basic strategies useful for parallelizing and load-balancing particle-in-cell codes, outlines the parallel algorithms used in this implementation, and provides a summary of the modifications made to QUICKSILVER. It also highlights a series of benchmark simulations which have been run with the new code that illustrate its performance and parallel efficiency. These calculations have up to a billion grid cells and particles and were run on thousands of processors. This report also serves as a user manual for people wishing to run parallel QUICKSILVER.

  13. Xyce parallel electronic simulator users guide, version 6.0.

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

    2013-08-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  14. Xyce parallel electronic simulator users' guide, Version 6.0.1.

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

    2014-01-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  15. Xyce parallel electronic simulator users guide, version 6.1

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  16. Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.

    Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Aadithya, Karthik V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Schiek, Richard L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation

    2016-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.

  17. Load-balancing techniques for a parallel electromagnetic particle-in-cell code

    Plimpton, Steven J.; Seidel, David B.; Pasik, Michael F.; Coats, Rebecca S.

    2000-01-01

    QUICKSILVER is a 3-d electromagnetic particle-in-cell simulation code developed and used at Sandia to model relativistic charged particle transport. It models the time-response of electromagnetic fields and low-density-plasmas in a self-consistent manner: the fields push the plasma particles and the plasma current modifies the fields. Through an LDRD project a new parallel version of QUICKSILVER was created to enable large-scale plasma simulations to be run on massively-parallel distributed-memory supercomputers with thousands of processors, such as the Intel Tflops and DEC CPlant machines at Sandia. The new parallel code implements nearly all the features of the original serial QUICKSILVER and can be run on any platform which supports the message-passing interface (MPI) standard as well as on single-processor workstations. This report describes basic strategies useful for parallelizing and load-balancing particle-in-cell codes, outlines the parallel algorithms used in this implementation, and provides a summary of the modifications made to QUICKSILVER. It also highlights a series of benchmark simulations which have been run with the new code that illustrate its performance and parallel efficiency. These calculations have up to a billion grid cells and particles and were run on thousands of processors. This report also serves as a user manual for people wishing to run parallel QUICKSILVER

  18. Xyce Parallel Electronic Simulator Users' Guide Version 6.8

    Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-10-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  19. Unified Internet Messaging

    Healy, Paul; Barber, Declan

    2015-01-01

    As telephony services, mobile services and internet services continue to converge, the prospect of providing Unified Messaging and even Unified Communications becomes increasingly achievable. This paper discusses the growing importance of IP-based networks to Unified Messaging developments and examines some of the key services and protocols that are likely to make Unified Messaging more widely available. In this initial paper, we limit ourselves initially to the unification of text-based mess...

  20. Passing the baton

    2011-01-01

    It was not only in South Korea that batons were being passed last week. While the cream of the world’s athletes were competing in the World Athletics Championships, the cream of the world’s accelerator scientists were on their way to San Sebastian in Spain for the International Particle Accelerator Conference. One of them was carrying a rather special baton for a handover of a different kind. When Fermilab’s Vladimir Shiltsev handed the high-energy frontier baton to CERN’s Mike Lamont on Tuesday, it marked the end of an era: a time to look back on the phenomenal contribution the Tevatron has made to particle physics over its 25-year operational lifetime, and the great contribution Fermilab has made over that period to global collaboration in particle physics. There’s always a lot of emotion involved in passing the baton. In athletics, it’s the triumph of winning or the heartbreak of losing. But for this special baton, the...

  1. TRT Barrel milestones passed

    Ogren, H

    2004-01-01

    The barrel TRT detector passed three significant milestones this spring. The Barrel Support Structure (BSS) was completed and moved to the SR-1 building on February 24th. On March 12th the first module passed the quality assurance testing in Building 154 and was transported to the assembly site in the SR-1 building for barrel assembly. Then on April 21st the final production module that had been scanned at Hampton University was shipped to CERN. TRT Barrel Module Production The production of the full complement of barrel modules (96 plus 9 total spares) is now complete. This has been a five-year effort by Duke University, Hampton University, and Indiana University. Actual construction of the modules in the United States was completed in the first part of 2004. The production crews at each of the sites in the United States have now completed their missions. They are shown in the following pictures. Duke University: Production crew with the final completed module. Indiana University: Module producti...

  2. Overview of Parallel Platforms for Common High Performance Computing

    T. Fryza

    2012-04-01

    The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, methods that exploit multicore central processing units, such as the Message Passing Interface and OpenMP, are taken into account. The properties of the programming methods are demonstrated experimentally on a fast Fourier transform and a discrete cosine transform, and they are compared with MATLAB's built-in functions and with Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementations were compared with CPU-based computing methods and with the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and an implementation of new, fast routines are proposed as well.

  3. Parallel SN algorithms in shared- and distributed-memory environments

    Haghighat, Alireza; Hunter, Melissa A.; Mattis, Ronald E.

    1995-01-01

    Different 2-D spatial domain partitioning Sn transport theory algorithms have been developed on the basis of the Block-Jacobi iterative scheme. These algorithms have been incorporated into TWOTRAN-II, and tested on a shared-memory CRAY Y-MP C90 and a distributed-memory IBM SP1. For a series of fixed source r-z geometry homogeneous problems, parallel efficiencies in a range of 50-90% are achieved on the C90 with 6 processors, and lower values (20-60%) are obtained on the SP1. It is demonstrated that better performance is attainable if one addresses issues such as convergence rate, load-balancing, and granularity for both architectures, as well as message passing (network bandwidth and latency) for SP1. (author). 17 refs, 4 figs

  4. Deuterium pass through target

    Alger, D.L.

    1975-01-01

    A neutron emitting target is described for use in neutron generating apparatus including a deuteron source and an accelerator vacuum chamber. The target consists of a tritium-containing target layer, a deuteron accumulation layer, and a target support containing passages providing communication between the accumulation layer and portions of the surface of the support exposed to the accelerator vacuum chamber. With this arrangement, deuterons passing through the target layer and implanting in and diffusing through the accumulation layer, diffuse into the communicating passages and are returned to the accelerator vacuum chamber. The invention allows the continuous removal of deuterons from the target in conventional water cooled neutron generating apparatus. Preferably, the target is provided with thin barrier layers to prevent undesirable tritium diffusion out of the target layer, as well as deuteron diffusion into the target layer

  5. Solution of finite element problems using hybrid parallelization with MPI and OpenMP

    José Miguel Vargas-Félix

    2012-11-01

    The Finite Element Method (FEM) is used to solve problems like solid deformation and heat diffusion in domains with complex geometries. Such geometries require discretization with millions of elements; this is equivalent to solving systems of equations with sparse matrices and tens or hundreds of millions of variables. The aim is to use computer clusters to solve these systems. The solution method used is Schur substructuring. With it, a large system of equations can be divided into many small ones that are solved more efficiently. This method allows parallelization. MPI (Message Passing Interface) is used to distribute the systems of equations so that each one is solved on one computer of a cluster. Each system of equations is solved using a solver implemented to use OpenMP as a local parallelization method.
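
    As a rough illustration of the substructuring idea, the textbook two-block form of the Schur complement is sketched below. The partition into interior unknowns (subscript I) and interface unknowns (subscript B) and all symbols are assumptions made for this sketch; the paper's own formulation may differ.

```latex
% Partition the unknowns into subdomain interiors (I) and the shared interface (B):
\begin{pmatrix} A_{II} & A_{IB} \\ A_{BI} & A_{BB} \end{pmatrix}
\begin{pmatrix} x_I \\ x_B \end{pmatrix}
=
\begin{pmatrix} f_I \\ f_B \end{pmatrix}

% Eliminating x_I gives a smaller system on the interface only:
S = A_{BB} - A_{BI} A_{II}^{-1} A_{IB}, \qquad
S\, x_B = f_B - A_{BI} A_{II}^{-1} f_I

% The interior unknowns then follow by back-substitution:
x_I = A_{II}^{-1} \left( f_I - A_{IB}\, x_B \right)
```

    When the interiors of different subdomains do not couple directly, A_{II} is block diagonal, so each block can be factored independently. That independence is what makes a one-subdomain-per-node distribution natural.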

  6. Parallelizing AT with MatlabMPI

    2011-01-01

    The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems dealing with computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations, rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, which were developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, which set up the necessary pre-requisites for multithread processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting the efficacy of modern-day parallel computing, we were able to demonstrate incredibly efficient speed increments per processor in AT's beam-tracking functions. Extrapolating from prediction, we can expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well-understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass, cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness. Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while

  7. Parallel decomposition of the tight-binding fictitious Lagrangian algorithm for molecular dynamics simulations of semiconductors

    Yeh, M.; Kim, J.; Khan, F.S.

    1995-01-01

    We present a parallel decomposition of the tight-binding fictitious Lagrangian algorithm for the Intel iPSC/860 and the Intel Paragon parallel computers. We show that it is possible to perform long simulations, of the order of 10 000 time steps, on semiconducting clusters consisting of as many as 512 atoms, on a time scale of the order of 20 h or less. We have made a very careful timing analysis of all parts of our code, and have identified the bottlenecks. We have also derived formulas which can predict the timing of our code, based on the number of processors, message passing bandwidth, floating point performance of each node, and the set up time for message passing, appropriate to the machine being used. The time of the simulation scales as the square of the number of particles, if the number of processors is made to scale linearly with the number of particles. We show that for a system as large as 512 atoms, the main bottleneck of the computation is the orthogonalization of the wave functions, which consumes about 90% of the total time of the simulation
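
    The abstract does not give the timing formulas themselves. The sketch below uses a generic latency-bandwidth cost model built from the same four ingredients (processor count, message passing bandwidth, per-node floating point rate, message setup time). The class, the function, and every number are hypothetical and are not the authors' formulas.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical machine description; none of these values come from the paper."""
    flops_per_node: float  # sustained floating point rate per node, flop/s
    bandwidth: float       # message passing bandwidth, bytes/s
    setup_time: float      # fixed setup cost per message, s


def predicted_time(machine: Machine, n_procs: int, total_flops: float,
                   n_messages: int, bytes_per_message: float) -> float:
    """Estimate run time as perfectly divided compute plus serialized communication."""
    compute = total_flops / (n_procs * machine.flops_per_node)
    per_message = machine.setup_time + bytes_per_message / machine.bandwidth
    return compute + n_messages * per_message


# Illustrative numbers only.
toy = Machine(flops_per_node=1e6, bandwidth=1e6, setup_time=1e-3)
estimate = predicted_time(toy, n_procs=4, total_flops=4e6,
                          n_messages=10, bytes_per_message=1000)
print(f"predicted time: {estimate:.2f} s")
```

    A model of this shape makes it easy to see when the setup term dominates (many small messages) and when the bandwidth term does (few large ones).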

  8. Parallelism at Cern: real-time and off-line applications in the GP-MIMD2 project

    Calafiura, P.

    1997-01-01

    A wide range of general purpose high-energy physics applications, ranging from Monte Carlo simulation to data acquisition, from interactive data analysis to on-line filtering, have been ported, or developed, and run in parallel on IBM SP-2 and Meiko CS-2 CERN large multi-processor machines. The ESPRIT project GP-MIMD2 has been a catalyst for the interest in parallel computing at CERN. The project provided the 128 processor Meiko CS-2 system that is now successfully integrated in the CERN computing environment. The CERN experiment NA48 has been involved in the GP-MIMD2 project since the beginning. NA48 physicists run, as part of their day-to-day work, simulation and analysis programs parallelized using the message passing interface MPI. The CS-2 is also a vital component of the experiment data acquisition system and will be used to calibrate in real-time the 13000-channel liquid krypton calorimeter. (orig.)

  9. Bistatic scattering from a three-dimensional object above a two-dimensional randomly rough surface modeled with the parallel FDTD approach.

    Guo, L-X; Li, J; Zeng, H

    2009-11-01

    We present an investigation of the electromagnetic scattering from a three-dimensional (3-D) object above a two-dimensional (2-D) randomly rough surface. A Message Passing Interface-based parallel finite-difference time-domain (FDTD) approach is used, and the uniaxial perfectly matched layer (UPML) medium is adopted for truncation of the FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different number of processors is illustrated for one rough surface realization and shows that the computation time of our parallel FDTD algorithm is dramatically reduced relative to a single-processor implementation. Finally, the composite scattering coefficients versus scattered and azimuthal angle are presented and analyzed for different conditions, including the surface roughness, the dielectric constants, the polarization, and the size of the 3-D object.

  10. Parallel Object-Oriented Computation Applied to a Finite Element Problem

    Jon B. Weissman

    1993-01-01

    The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.

  11. Kemari: A Portable High Performance Fortran System for Distributed Memory Parallel Processors

    T. Kamachi

    1997-01-01

    We have developed a compilation system which extends High Performance Fortran (HPF) in various aspects. We support the parallelization of well-structured problems with loop distribution and alignment directives similar to HPF's data distribution directives. Such directives both give additional control to the user and simplify the compilation process. For the support of unstructured problems, we provide directives for dynamic data distribution through user-defined mappings. The compiler also allows integration of message-passing interface (MPI) primitives. The system is part of a complete programming environment which also comprises a parallel debugger and a performance monitor and analyzer. After an overview of the compiler, we describe the language extensions and related compilation mechanisms in detail. Performance measurements demonstrate the compiler's applicability to a variety of application classes.

  12. Studies of parallel algorithms for the solution of a Fokker-Planck equation

    Deck, D.; Samba, G.

    1995-11-01

    The study of laser-created plasmas often requires the use of a kinetic model rather than a hydrodynamic one. This model change occurs, for example, in the hot spot formation in an ICF experiment or during the relaxation of colliding plasmas. When the gradient scale lengths or the size of a given system are not small compared to the characteristic mean-free-path, we have to deal with non-equilibrium situations, which can be described by the distribution functions of every species in the system. We present here a numerical method in plane or spherical 1-D geometry, for the solution of a Fokker-Planck equation that describes the evolution of such functions in the phase space. The size and the time scale of kinetic simulations require the use of Massively Parallel Computers (MPP). We have adopted a message-passing strategy using Parallel Virtual Machine (PVM).

  13. SBML-PET-MPI: a parallel parameter estimation tool for Systems Biology Markup Language based models.

    Zi, Zhike

    2011-04-01

    Parameter estimation is crucial for the modeling and dynamic analysis of biological systems. However, implementing parameter estimation is time consuming and computationally demanding. Here, we introduced a parallel parameter estimation tool for Systems Biology Markup Language (SBML)-based models (SBML-PET-MPI). SBML-PET-MPI allows the user to perform parameter estimation and parameter uncertainty analysis by collectively fitting multiple experimental datasets. The tool is developed and parallelized using the message passing interface (MPI) protocol, which provides good scalability with the number of processors. SBML-PET-MPI is freely available for non-commercial use at http://www.bioss.uni-freiburg.de/cms/sbml-pet-mpi.html or http://sites.google.com/site/sbmlpetmpi/.

  14. Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators

    Qiang, J.; Ryne, R.D.; Habib, S.; Decyk, V.

    1999-01-01

    In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message passing programming paradigm along with a dynamic load balancing. Implementing object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure based code. This also helps to encapsulate the details of communications syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Some important features of this code also include employing symplectic integration with linear maps of external focusing elements and using z as the independent variable, typical in accelerators. A successful application was done to simulate beam transport through three superconducting sections in the APT linac design

  15. Parallel file system with metadata distributed across partitioned key-value store

    Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron

    2017-09-19

    Improved techniques are provided for storing metadata associated with a plurality of sub-files associated with a single shared file in a parallel file system. The shared file is generated by a plurality of applications executing on a plurality of compute nodes. A compute node implements a Parallel Log Structured File System (PLFS) library to store at least one portion of the shared file generated by an application executing on the compute node and metadata for the at least one portion of the shared file on one or more object storage servers. The compute node is also configured to implement a partitioned data store for storing a partition of the metadata for the shared file, wherein the partitioned data store communicates with partitioned data stores on other compute nodes using a message passing interface. The partitioned data store can be implemented, for example, using Multidimensional Data Hashing Indexing Middleware (MDHIM).

  16. Actors with Multi-Headed Message Receive Patterns

    Sulzmann, Martin; Lam, Edmund Soon Lee; Van Weert, Peter

    2008-01-01

    The actor model provides high-level concurrency abstractions to coordinate simultaneous computations by message passing. Languages implementing the actor model such as Erlang commonly only support single-headed pattern matching over received messages. We propose and design an extension of Erlang style actors with receive clauses containing multi-headed message patterns. Patterns may be non-linear and constrained by guards. We provide a number of examples to show the usefulness of the extension. We also explore the design space for multi-headed message matching semantics, for example first-match and rule priority-match semantics. The various semantics are inspired by the multi-set constraint matching semantics found in Constraint Handling Rules. This provides us with a formal model to study actors with multi-headed message receive patterns. The system can be implemented efficiently and we have...
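
    To make the idea of a multi-headed receive concrete, here is a minimal Python sketch that consumes several mailbox messages at once. It is not the paper's Erlang extension: the function names, the tuple message format, and the search order (one simple reading of "first match") are all assumptions made for illustration. The guard stands in for a non-linear pattern by requiring two heads to agree on a field.

```python
from itertools import permutations


def match_rule(mailbox, heads, guard=lambda *msgs: True):
    """Return mailbox indices of the first message combination matching all heads.

    `heads` is a sequence of message tags, one per head. Candidate combinations
    are tried in lexicographic index order (an assumed search order).
    """
    for combo in permutations(range(len(mailbox)), len(heads)):
        msgs = [mailbox[i] for i in combo]
        tags_ok = all(msg[0] == tag for msg, tag in zip(msgs, heads))
        if tags_ok and guard(*msgs):
            return combo
    return None


def receive(mailbox, heads, guard=lambda *msgs: True):
    """Remove and return the matched messages, or return None if no match exists."""
    combo = match_rule(mailbox, heads, guard)
    if combo is None:
        return None
    matched = [mailbox[i] for i in combo]
    for i in sorted(combo, reverse=True):  # delete from the back to keep indices valid
        del mailbox[i]
    return matched


# A two-headed rule: pair a "buy" with a "sell" for the same item.
mailbox = [("sell", "apple"), ("buy", "pear"), ("buy", "apple")]
matched = receive(mailbox, heads=("buy", "sell"),
                  guard=lambda buy, sell: buy[1] == sell[1])
# matched == [("buy", "apple"), ("sell", "apple")]; ("buy", "pear") stays queued
```

    The exhaustive search over permutations is only for clarity. The abstract states that the authors' system can be implemented efficiently, which this sketch does not attempt.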

  17. Parallel rendering

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  18. Real-time trajectory optimization on parallel processors

    Psiaki, Mark L.

    1993-01-01

    A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm that is suitable to do real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on 3 example problems, the Goddard problem, the acceleration-limited, planar minimum-time to the origin problem, and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32-nodes instead of 1-node to solve a 64-stage Goddard problem.
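
    The reported speed-up can be turned into a parallel efficiency with one division. The short sketch below does that arithmetic for the 64-stage Goddard case (speed-up 7.2 on 32 nodes); the function names are illustrative and not taken from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speed-up is the serial run time divided by the parallel run time."""
    return t_serial / t_parallel


def parallel_efficiency(speedup_factor: float, n_procs: int) -> float:
    """Efficiency is the speed-up per processor; 1.0 would be ideal scaling."""
    return speedup_factor / n_procs


# Figures quoted in the abstract: speed-up of 7.2 using 32 nodes instead of 1.
eff = parallel_efficiency(7.2, 32)
print(f"parallel efficiency: {eff:.3f}")  # prints "parallel efficiency: 0.225"
```

    An efficiency of about 22.5% is consistent with a problem whose parallelism along the time axis (64 stages) is small relative to the 32 nodes used.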

  19. Parallel file system performances in fusion data storage

    Iannone, F.; Podda, S.; Bracco, G.; Manduchi, G.; Maslennikov, A.; Migliori, S.; Wolkersdorfer, K.

    2012-01-01

    High I/O flow rates, up to 10 GB/s, are required in large fusion Tokamak experiments like ITER where hundreds of nodes store simultaneously large amounts of data acquired during the plasma discharges. Typical network topologies such as linear arrays (systolic), rings, meshes (2-D arrays), tori (3-D arrays), trees, butterfly, hypercube in combination with high speed data transports like Infiniband or 10G-Ethernet, are the main areas in which the effort to overcome the so-called parallel I/O bottlenecks is most focused. The high I/O flow rates were modelled in an emulated testbed based on the parallel file systems such as Lustre and GPFS, commonly used in High Performance Computing. The test runs on High Performance Computing–For Fusion (8640 cores) and ENEA CRESCO (3392 cores) supercomputers. Message Passing Interface based applications were developed to emulate parallel I/O on Lustre and GPFS using data archival and access solutions like MDSPLUS and Universal Access Layer. These methods of data storage organization are widely diffused in nuclear fusion experiments and are being developed within the EFDA Integrated Tokamak Modelling – Task Force; the authors tried to evaluate their behaviour in a realistic emulation setup.

  20. Parallel file system performances in fusion data storage

    Iannone, F., E-mail: francesco.iannone@enea.it [Associazione EURATOM-ENEA sulla Fusione, C.R.ENEA Frascati, via E.Fermi, 45 - 00044 Frascati, Rome (Italy); Podda, S.; Bracco, G. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Manduchi, G. [Associazione EURATOM-ENEA sulla Fusione, Consorzio RFX, Corso Stati Uniti, 4 - 35127 Padua (Italy); Maslennikov, A. [CASPUR Inter-University Consortium for the Application of Super-Computing for Research, via dei Tizii, 6b - 00185 Rome (Italy); Migliori, S. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Wolkersdorfer, K. [Juelich Supercomputing Centre-FZJ, D-52425 Juelich (Germany)

    2012-12-15

    High I/O flow rates, up to 10 GB/s, are required in large fusion Tokamak experiments like ITER where hundreds of nodes store simultaneously large amounts of data acquired during the plasma discharges. Typical network topologies such as linear arrays (systolic), rings, meshes (2-D arrays), tori (3-D arrays), trees, butterfly, hypercube in combination with high speed data transports like Infiniband or 10G-Ethernet, are the main areas in which the effort to overcome the so-called parallel I/O bottlenecks is most focused. The high I/O flow rates were modelled in an emulated testbed based on the parallel file systems such as Lustre and GPFS, commonly used in High Performance Computing. The test runs on High Performance Computing-For Fusion (8640 cores) and ENEA CRESCO (3392 cores) supercomputers. Message Passing Interface based applications were developed to emulate parallel I/O on Lustre and GPFS using data archival and access solutions like MDSPLUS and Universal Access Layer. These methods of data storage organization are widely diffused in nuclear fusion experiments and are being developed within the EFDA Integrated Tokamak Modelling - Task Force; the authors tried to evaluate their behaviour in a realistic emulation setup.

  1. Parallel computations

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  2. Managing internode data communications for an uninitialized process in a parallel computer

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

    2014-05-20

    A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.
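    The buffer handling described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the fixed capacity, and the policy of draining the whole MU buffer at once are assumptions made only for illustration.

```python
from collections import deque


class MessagingUnitBuffer:
    """Fixed-capacity queue standing in for an MU message buffer (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.messages = deque()

    def is_full(self):
        return len(self.messages) >= self.capacity

    def receive(self, message):
        if self.is_full():
            raise OverflowError("MU message buffer is full")
        self.messages.append(message)


class ApplicationAgent:
    """Spills a full MU buffer into a temporary buffer in main memory."""

    def __init__(self):
        self.temporary = {}  # process id -> list of spilled messages

    def service(self, pid, mu_buffer, initialized):
        # Act only while the process is uninitialized and its buffer is full.
        if initialized or not mu_buffer.is_full():
            return False
        temp = self.temporary.setdefault(pid, [])
        while mu_buffer.messages:
            temp.append(mu_buffer.messages.popleft())  # preserve arrival order
        return True


def deliver(pid, messages, capacity=4):
    """Feed messages to an uninitialized process; return (spilled, still_in_mu)."""
    mu = MessagingUnitBuffer(capacity)
    agent = ApplicationAgent()
    for message in messages:
        agent.service(pid, mu, initialized=False)
        mu.receive(message)
    return agent.temporary.get(pid, []), list(mu.messages)
```

    Calling deliver(7, list(range(10)), capacity=4) leaves the eight oldest messages in the temporary buffer and the two newest in the MU buffer, so arrival order is preserved across the two stores.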

  3. I spy with my little eye: cognitive processing of framed physical activity messages.

    Bassett-Gunter, Rebecca L; Latimer-Cheung, Amy E; Martin Ginis, Kathleen A; Castelhano, Monica

    2014-01-01

    The primary purpose was to examine the relative cognitive processing of gain-framed versus loss-framed physical activity messages following exposure to health risk information. Guided by the Extended Parallel Process Model, the secondary purpose was to examine the relation between dwell time, message recall, and message-relevant thoughts, as well as perceived risk, personal relevance, and fear arousal. Baseline measures of perceived risk for inactivity-related disease and health problems were administered to 77 undergraduate students. Participants read population-specific health risk information while wearing a head-mounted eye tracker, which measured dwell time on message content. Perceived risk was then reassessed. Next, participants read PA messages while the eye tracker measured dwell time on message content. Immediately following message exposure, recall, thought-listing, fear arousal, and personal relevance were measured. Dwell time on gain-framed messages was significantly greater than loss-framed messages. However, message recall and thought-listing did not differ by message frame. Dwell time was not significantly related to recall or thought-listing. Consistent with the Extended Parallel Process Model, fear arousal was significantly related to recall, thought-listing, and personal relevance. In conclusion, gain-framed messages may evoke greater dwell time than loss-framed messages. However, dwell time alone may be insufficient for evoking further cognitive processing.

  4. Instant Messaging by SIP

    Muhi, Daniel; Dulai, Tibor; Jaskó, Szilárd

    2008-11-01

    SIP is a general-purpose application layer protocol which is able to establish sessions between two or more parties. These sessions are mainly telephone calls and multimedia conferences. However it can be used for other purposes like instant messaging and presence service. SIP has a very important role in mobile communication as more and more communicating applications are going mobile. In this paper we would like to show how SIP can be used for instant messaging purposes.

  5. Analysis of parallel computing performance of the code MCNP

    Wang Lei; Wang Kan; Yu Ganglin

    2006-01-01

    Parallel computing can reduce the running time of the code MCNP effectively. With MPI message-passing software, MCNP5 can run in parallel on a PC cluster running the Windows operating system. Parallel computing performance of MCNP is influenced by factors such as the type, the complexity level and the parameter configuration of the computing problem. This paper analyzes the parallel computing performance of MCNP with regard to these factors and gives measures to improve it. (authors)

  6. P3T+: A Performance Estimator for Distributed and Parallel Programs

    T. Fahringer

    2000-01-01

    Developing distributed and parallel programs on today's multiprocessor architectures is still a challenging task. Particularly distressing is the lack of effective performance tools that support the programmer in evaluating changes in code, problem and machine sizes, and target architectures. In this paper we introduce P3T+, which is a performance estimator for mostly regular HPF (High Performance Fortran) programs but also partially covers message passing programs (MPI). P3T+ is unique in modeling programs, compiler code transformations, and parallel and distributed architectures. It computes at compile-time a variety of performance parameters including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. Several novel technologies are employed to compute these parameters: loop iteration spaces, array access patterns, and data distributions are modeled by employing highly effective symbolic analysis. Communication is estimated by simulating the behavior of a communication library used by the underlying compiler. Computation times are predicted through pre-measured kernels on every target architecture of interest. We carefully model most critical architecture specific factors such as cache line sizes, number of cache lines available, startup times, message transfer time per byte, etc. P3T+ has been implemented and is closely integrated with the Vienna High Performance Compiler (VFC) to support programmers in developing parallel and distributed applications. Experimental results for realistic kernel codes taken from real-world applications are presented to demonstrate both accuracy and usefulness of P3T+.

  7. HPC parallel programming model for gyrokinetic MHD simulation

    Naitou, Hiroshi; Yamada, Yusuke; Tokuda, Shinji; Ishii, Yasutomo; Yagi, Masatoshi

    2011-01-01

    The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over a large number of logical cores, the total simulation domain in cylindrical geometry was broken up into N_DD-r × N_DD-z (number of radial decompositions times number of axial decompositions) small domains including approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. N_RP replicas (copies) of each decomposed domain were used (“particle decomposition”). The hybrid parallelization model of multi-threads and multi-processes was employed: threads were parallelized by the auto-parallelization and N_DD-r × N_DD-z × N_RP processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for the medium size system of N_r × N_θ × N_z = 1025 × 128 × 128 mesh with 4.196 or 8.192 billion particles. The highest speed for the fixed number of logical cores was obtained for two threads, the maximum number of N_DD-z, and optimum combination of N_DD-r and N_RP. The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)
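    The process layout in this abstract (N_DD-r × N_DD-z domains, each with N_RP replicas, several threads per process) can be made concrete with a small sketch. The rank ordering (radial index fastest, replica slowest) and the function names are assumptions for illustration. The abstract does not state how Gpic-MHD numbers its processes, and the 4 × 128 × 8 split used in the note below is a hypothetical example, not a reported configuration.

```python
def decompose_rank(rank, n_dd_r, n_dd_z, n_rp):
    """Map a flat process rank to (radial domain, axial domain, replica).

    Assumed ordering: radial index varies fastest, replica index slowest.
    """
    total = n_dd_r * n_dd_z * n_rp
    if not 0 <= rank < total:
        raise ValueError(f"rank {rank} outside 0..{total - 1}")
    replica, rest = divmod(rank, n_dd_r * n_dd_z)
    axial, radial = divmod(rest, n_dd_r)
    return radial, axial, replica


def logical_cores(n_dd_r, n_dd_z, n_rp, threads_per_process):
    """Cores used by the hybrid model: MPI processes times threads per process."""
    return n_dd_r * n_dd_z * n_rp * threads_per_process
```

    With two threads per process, a hypothetical 4 × 128 × 8 layout gives logical_cores(4, 128, 8, 2) = 8192, and the last rank, 4095, maps to domain (3, 127) in replica 7.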

  8. Xyce parallel electronic simulator : users' guide. Version 5.1.

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-11-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a

  9. Xyce Parallel Electronic Simulator : users' guide, version 4.1.

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-02-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a

  10. A System for Exchanging Control and Status Messages in the NOvA Data Acquisition

    Biery, K.A.; Cooper, R.G.; Foulkes, S.C.; Guglielmo, G.M.; Piccoli, L.P.; Votava, M.E.V.; Fermilab

    2007-01-01

    In preparation for NOvA, a future neutrino experiment at Fermilab, we are developing a system for passing control and status messages in the data acquisition system. The DAQ system will consist of applications running on approximately 450 nodes. The message passing system will use a publish-subscribe model and will provide support for sending messages and receiving the associated replies. Additional features of the system include a layered architecture with custom APIs tailored to the needs of a DAQ system, the use of an open source messaging system for handling the reliable delivery of messages, the ability to send broadcasts to groups of applications, and APIs in Java, C++, and Python. Our choice for the open source system to deliver messages is EPICS. We will discuss the architecture of the system, our experience with EPICS, and preliminary test results.

  11. A System for Exchanging Control and Status Messages in the NOvA Data Acquisition

    Biery, K.A.; Cooper, R.G.; Foulkes, S.C.; Guglielmo, G.M.; Piccoli, L.P.; Votava, M.E.V.; Fermilab

    2007-04-01

    In preparation for NOvA, a future neutrino experiment at Fermilab, we are developing a system for passing control and status messages in the data acquisition system. The DAQ system will consist of applications running on approximately 450 nodes. The message passing system will use a publish-subscribe model and will provide support for sending messages and receiving the associated replies. Additional features of the system include a layered architecture with custom APIs tailored to the needs of a DAQ system, the use of an open source messaging system for handling the reliable delivery of messages, the ability to send broadcasts to groups of applications, and APIs in Java, C++, and Python. Our choice for the open source system to deliver messages is EPICS. We will discuss the architecture of the system, our experience with EPICS, and preliminary test results.
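    The publish-subscribe model with replies mentioned in this abstract can be sketched in a few lines. This is an in-process toy, not the NOvA system and not the EPICS API: the MessageBus class, the topic name, and the handlers are hypothetical.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process publish-subscribe bus that also collects replies."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        """Send payload to every subscriber of topic; return their replies."""
        replies = []
        for handler in self._subscribers.get(topic, []):
            reply = handler(payload)
            if reply is not None:  # a handler may choose not to reply
                replies.append(reply)
        return replies


def make_node(name):
    """Build a handler that acknowledges each control message it receives."""
    return lambda command: f"{name}: ack {command}"


bus = MessageBus()
for node in ("node-1", "node-2"):
    bus.subscribe("run_control", make_node(node))
replies = bus.publish("run_control", "start")
```

    A broadcast to a group of applications then amounts to one publish call on a shared topic. A real DAQ deployment would put a network transport and reliable delivery behind the same interface.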

  12. Increasing the Operational Value of Event Messages

    Li, Zhenping; Savkli, Cetin; Smith, Dan

    2003-01-01

    Assessing the health of a space mission has traditionally been performed using telemetry analysis tools. Parameter values are compared to known operational limits and are plotted over various time periods. This presentation begins with the notion that there is an incredible amount of untapped information contained within the mission's event message logs. Through creative advancements in message handling tools, the event message logs can be used to better assess spacecraft and ground system status and to highlight and report on conditions not readily apparent when messages are evaluated one-at-a-time during a real-time pass. Work in this area is being funded as part of a larger NASA effort at the Goddard Space Flight Center to create a component-based, middleware-based, standards-based general purpose ground system architecture referred to as GMSEC - the GSFC Mission Services Evolution Center. The new capabilities and operational concepts for event display, event data analyses and data mining are being developed by Lockheed Martin and the new subsystem has been named GREAT - the GMSEC Reusable Event Analysis Toolkit. Planned for use on existing and future missions, GREAT has the potential to increase operational efficiency in areas of problem detection and analysis, general status reporting, and real-time situational awareness.

  13. Parallel algorithms

    Casanova, Henri; Robert, Yves

    2008-01-01

    ""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  14. Parallel multiscale simulations of a brain aneurysm

    Grinberg, Leopold [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States); Fedosov, Dmitry A. [Institute of Complex Systems and Institute for Advanced Simulation, Forschungszentrum Jülich, Jülich 52425 (Germany); Karniadakis, George Em, E-mail: george_karniadakis@brown.edu [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States)

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in

  15. Parallel multiscale simulations of a brain aneurysm

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-01-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in

  16. Distributed Memory Parallel Computing with SEAWAT

    Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

    2017-12-01

    Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model ( 10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources
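    Recursive Coordinate Bisection, named in this abstract as the load-balancing step, can be sketched generically. The version below is a textbook form that balances by point count and cuts along the axis of largest extent. Whether PKS weights cells or chooses its cut axis differently is not stated in the abstract, so those choices are assumptions.

```python
def rcb(points, n_parts):
    """Split points into n_parts groups of near-equal size by recursive bisection.

    points is a non-empty list of equal-length coordinate tuples and is assumed
    to contain at least n_parts entries.
    """
    if n_parts == 1:
        return [points]
    dims = len(points[0])

    def extent(axis):
        values = [p[axis] for p in points]
        return max(values) - min(values)

    # Cut perpendicular to the coordinate axis with the largest extent.
    axis = max(range(dims), key=extent)
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = n_parts // 2
    # Place the cut so each side receives points in proportion to its parts.
    cut = len(ordered) * left_parts // n_parts
    return rcb(ordered[:cut], left_parts) + rcb(ordered[cut:], n_parts - left_parts)


# Example: an 8 x 4 grid of cell centres split over four subdomains.
cells = [(x, y) for x in range(8) for y in range(4)]
subdomains = rcb(cells, 4)
```

    Each subdomain in the example receives eight cells. In a distributed-memory solver each such group would be owned by one process, with only the cells near the cuts exchanged through MPI.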

  17. Development of whole core thermal-hydraulic analysis program ACT. 4. Simplified fuel assembly model and parallelization by MPI

    Ohshima, Hiroyuki

    2001-10-01

    A whole core thermal-hydraulic analysis program ACT is being developed for the purpose of evaluating detailed in-core thermal hydraulic phenomena of fast reactors including the effect of the flow between wrapper-tube walls (inter-wrapper flow) under various reactor operation conditions. As appropriate boundary conditions in addition to a detailed modeling of the core are essential for accurate simulations of in-core thermal hydraulics, ACT consists of not only fuel assembly and inter-wrapper flow analysis modules but also a heat transport system analysis module that gives response of the plant dynamics to the core model. This report describes incorporation of a simplified model into the fuel assembly analysis module and program parallelization by a message passing method toward large-scale simulations. ACT has a fuel assembly analysis module which can simulate a whole fuel pin bundle in each fuel assembly of the core; however, it may take much CPU time for a large-scale core simulation. Therefore, a simplified fuel assembly model that is thermal-hydraulically equivalent to the detailed one has been incorporated in order to save the simulation time and resources. This simplified model is applied to several parts of fuel assemblies in a core where the detailed simulation results are not required. With regard to the program parallelization, the calculation load and the data flow of ACT were analyzed and the optimum parallelization has been done including the improvement of the numerical simulation algorithm of ACT. Message Passing Interface (MPI) is applied to data communication between processes and synchronization in parallel calculations. Parallelized ACT was verified through a comparison simulation with the original one. In addition to the above works, input manuals of the core analysis module and the heat transport system analysis module have been prepared. (author)

  18. Double-pass quantum volume hologram

    Vasilyev, Denis V.; Sokolov, Ivan V.

    2011-01-01

    We propose a scheme for parallel, spatially multimode quantum memory for light. The scheme is based on the propagation in different directions of a quantum signal wave and strong classical reference wave, like in a classical volume hologram and the previously proposed quantum volume hologram [D. V. Vasilyev et al., Phys. Rev. A 81, 020302(R) (2010)]. The medium for the hologram consists of a spatially extended ensemble of cold spin-polarized atoms. In the absence of the collective spin rotation during the interaction, two passes of light for both storage and retrieval are required, and therefore the present scheme can be called a double-pass quantum volume hologram. The scheme is less sensitive to diffraction and therefore is capable of achieving a higher density of storage of spatial modes as compared to the previously proposed thin quantum hologram [D. V. Vasilyev et al., Phys. Rev. A 77, 020302(R) (2008)], which also requires two passes of light for both storage and retrieval. However, the present scheme allows one to achieve a good memory performance with a lower optical depth of the atomic sample as compared to the quantum volume hologram. A quantum hologram capable of storing entangled images can become an important ingredient in quantum information processing and quantum imaging.

  19. Parallel computing in cluster of GPU applied to a problem of nuclear engineering

    Moraes, Sergio Ricardo S.; Heimlich, Adino; Resende, Pedro

    2013-01-01

    Cluster computing has been widely used as a low cost alternative for parallel processing in scientific applications. With the use of the Message-Passing Interface (MPI) protocol, development became even more accessible and widespread in the scientific community. A more recent trend is the use of the Graphic Processing Unit (GPU), which is a powerful co-processor able to perform hundreds of instructions in parallel, reaching a capacity of hundreds of times the processing of a CPU. However, a standard PC does not allow, in general, more than two GPUs. Hence, this work proposes the development and evaluation of a hybrid low cost parallel approach to the solution of a typical nuclear engineering problem. The idea is to use cluster parallelism technology (MPI) together with GPU programming techniques (CUDA - Compute Unified Device Architecture) to simulate neutron transport through a slab using the Monte Carlo method. Programs using MPI and CUDA technologies were developed on a cluster comprising four quad-core computers with 2 GPUs each. Experiments applying different configurations, from 1 to 8 GPUs, have been performed and results were compared with the sequential (non-parallel) version. A speed up of about 2.000 times has been observed when comparing the 8-GPU with the sequential version. The results presented here are discussed and analyzed with the objective of outlining gains and possible limitations of the proposed approach. (author)
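    For orientation, the underlying problem (Monte Carlo neutron transport through a slab) can be sketched sequentially in plain Python. This is not the authors' MPI/CUDA code. The one-speed model, isotropic scattering, the normally incident source, and all parameter names are assumptions chosen to keep the example short.

```python
import math
import random


def slab_transmission(thickness, sigma_t, scatter_prob, n_histories, seed=0):
    """Estimate the fraction of neutrons transmitted through a 1-D slab.

    thickness    -- slab width (cm)
    sigma_t      -- total macroscopic cross section (1/cm)
    scatter_prob -- probability that a collision scatters rather than absorbs
    """
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n_histories):
        x, mu = 0.0, 1.0  # start on the left face, moving straight in
        while True:
            # Distance to the next collision follows an exponential law.
            x += mu * (-math.log(1.0 - rng.random()) / sigma_t)
            if x >= thickness:
                transmitted += 1  # escaped through the right face
                break
            if x < 0.0:
                break  # leaked back out of the left face
            if rng.random() >= scatter_prob:
                break  # absorbed at the collision site
            mu = 2.0 * rng.random() - 1.0  # isotropic scatter: new direction cosine
    return transmitted / n_histories


# Pure absorber check: the exact answer is exp(-sigma_t * thickness).
estimate = slab_transmission(2.0, 1.0, 0.0, 20000)
exact = math.exp(-2.0)
```

    Every history is independent of the others, which is the property that lets such a calculation be split across MPI processes and GPU threads, with only the final tallies combined.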

  20. Parallelization of the MAAP-A code neutronics/thermal hydraulics coupling

    Froehle, P.H.; Wei, T.Y.C.; Weber, D.P.; Henry, R.E.

    1998-01-01

    A major new feature, one-dimensional space-time kinetics, has been added to a developmental version of the MAAP code through the introduction of the DIF3D-K module. This code is referred to as MAAP-A. To reduce the overall job time required, a capability has been provided to run the MAAP-A code in parallel. The parallel version of MAAP-A utilizes two machines running in parallel, with the DIF3D-K module executing on one machine and the rest of the MAAP-A code executing on the other machine. Timing results obtained during the development of the capability indicate that reductions in time of 30–40% are possible. The parallel version can be run on two SPARC 20 (SUN OS 5.5) workstations connected through the ethernet. MPI (Message Passing Interface standard) needs to be implemented on the machines. If necessary the parallel version can also be run on only one machine. The results obtained running in this one-machine mode identically match the results obtained from the serial version of the code.

  1. The simplified spherical harmonics (SPL) methodology with space and moment decomposition in parallel environments

    Gianluca, Longoni; Alireza, Haghighat

    2003-01-01

    In recent years, the SP_L (simplified spherical harmonics) equations have received renewed interest for the simulation of nuclear systems. We have derived the SP_L equations starting from the even-parity form of the S_N equations. The SP_L equations form a system of (L+1)/2 second order partial differential equations that can be solved with standard iterative techniques such as the Conjugate Gradient (CG). We discretized the SP_L equations with the finite-volume approach in a 3-D Cartesian space. We developed a new 3-D general code, Pensp_L (Parallel Environment Neutral-particle SP_L). Pensp_L solves both fixed source and criticality eigenvalue problems. In order to optimize the memory management, we implemented a Compressed Diagonal Storage (CDS) to store the SP_L matrices. Pensp_L includes parallel algorithms for space and moment domain decomposition. The computational load is distributed on different processors, using a mapping function, which maps the 3-D Cartesian space and moments onto processors. The code is written in Fortran 90 using the Message Passing Interface (MPI) libraries for the parallel implementation of the algorithm. The code has been tested on the Pcpen cluster and the parallel performance has been assessed in terms of speed-up and parallel efficiency. (author)
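    Two details of this abstract lend themselves to a short illustration: the (L+1)/2 equation count and Compressed Diagonal Storage. The sketch below is generic Python, not the Fortran 90 code described. The dictionary-of-diagonals layout and the function names are assumptions, as is the restriction to odd L.

```python
def sp_l_equation_count(order):
    """Number of coupled second-order equations for SP_L order L: (L + 1) / 2."""
    if order % 2 == 0:
        raise ValueError("this sketch assumes an odd SP_L order")
    return (order + 1) // 2


def to_cds(dense, offsets):
    """Keep only the listed diagonals; offset k stores entries dense[i][i + k]."""
    n = len(dense)
    return {
        k: [dense[i][i + k] if 0 <= i + k < n else 0.0 for i in range(n)]
        for k in offsets
    }


def cds_matvec(diagonals, x):
    """Compute y = A x using only the stored diagonals of A."""
    n = len(x)
    y = [0.0] * n
    for k, diag in diagonals.items():
        for i in range(n):
            j = i + k
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y


# Example: a 4 x 4 tridiagonal matrix, typical of a 1-D diffusion-like stencil.
dense = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]
stored = to_cds(dense, offsets=(-1, 0, 1))
product = cds_matvec(stored, [1.0, 2.0, 3.0, 4.0])
```

    Storing three diagonals of length n in place of n × n entries is what makes the format attractive for finite-volume matrices, whose nonzeros lie on a handful of diagonals. The matrix-vector product is also the kernel that a CG iteration applies repeatedly.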

  2. Couriers in the Inca Empire: Getting Your Message Across. [Lesson Plan].

    2002

    This lesson shows how the Inca communicated across the vast stretches of their mountain realm, the largest empire of the pre-industrial world. The lesson explains how couriers carried messages along mountain-ridge roads, up and down stone steps, and over chasm-spanning footbridges. It states that couriers could pass a message from Quito (Ecuador)…

  3. The message is the message-maker.

    Chalkley, A B

    1977-03-01

    For those engaged in family planning or other demographic work of an active kind, serious errors can be made and much money and skill wasted unless there is a clear idea of available means of communication. Literacy and media-diffusion figures offer vague parameters, especially in Asia, and the role of spoken communication -- considered key in "illiterate" societies -- is even more difficult to assess. For mass media, the starting point is "diffusion rates" representing numbers of TV sets owned or newspapers sold per 1000 population and so on -- measures of quantity. This article surveys the population growth rates, urban-rural distribution, educational levels, literacy rates, numbers of newspapers bought, radios and TVs owned (per 1000 population) for 12 Asian countries, and discusses their meaning in terms of media use. Chief among the points made are that print media still have an enormous role to play in the developing countries -- newspaper diffusion rates are quite high, even in countries with low urban population (especially India). The quality of electronic media (too often considered the natural "wave of the future" everywhere) varies but is generally not high. Where they are fully developed their role is vital -- but it might be noted that it is the message makers themselves who are most vital. Choosing the right medium and the proper message for it is essential.

  4. An Examination of Adolescent Recall of Anti-Smoking Messages: Attitudes, Message Type, and Message Perceptions.

    Bigsby, Elisabeth; Monahan, Jennifer L; Ewoldsen, David R

    2017-04-01

    Delayed message recall may be influenced by currently held accessible attitudes, the nature of the message, and message perceptions (perception of bias and message elaboration). This study examined the potential of message perceptions to mediate the influence of valenced attitude accessibility and message type on unaided recall of anti-smoking Public Service Announcements (PSAs). In a field experiment, ninth grade students (N = 244) watched three PSAs and responded to items on laptop computers. Twelve weeks later, follow-up telephone surveys were conducted to assess unaided recall. Both valenced attitude accessibility and message type were associated with message perceptions. However, only perception of message bias partially mediated the relationship between message type and unaided recall.

  5. Popular Mobilization Messaging

    James Garrison

    2017-04-01

    This Research Paper examines the Iraqi Popular Mobilization Unit’s (PMU) messaging on the organisation’s website and social media platforms through early January 2017 to develop a more nuanced understanding of the PMU’s outlook, both present and future. After providing an overview of the PMU’s media presence online, the paper discusses how the organisation promotes its core narrative: that it is a cross-confessional and patriotic force for the defence of all Iraqis against a brutal and evil IS. The paper then addresses the PMU’s use of messaging to refute the sectarian portrayal of the organisation in some quarters before turning to the way the PMU approaches regional and international states in its media. Finally, the paper summarises the PMU’s messaging strategy and discusses how this strategy implies a less threatening future for the organisation than is often anticipated.

  6. Increasing available FIFO space to prevent messaging queue deadlocks in a DMA environment

    Blocksome, Michael A [Rochester, MN]; Chen, Dong [Croton On Hudson, NY]; Gooding, Thomas [Rochester, MN]; Heidelberger, Philip [Cortlandt Manor, NY]; Parker, Jeff [Rochester, MN]

    2012-02-07

    Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.
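    The first alternative described above (on a full queue, pause, move every descriptor into a larger queue, resume) can be sketched as a small simulation. This is an illustrative Python model only, not the patented DMA mechanism: the class `BoundedQueue`, the function `inject`, and the choice to double the capacity are all assumptions, and stopping and restarting the DMA engine is represented only by comments.

```python
from collections import deque


class BoundedQueue:
    """Fixed-capacity FIFO standing in for a messaging queue (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def is_full(self):
        return len(self.items) >= self.capacity

    def push(self, descriptor):
        if self.is_full():
            raise OverflowError("queue full")
        self.items.append(descriptor)


def inject(queue, descriptor):
    """Add a descriptor, growing the queue first if it is full.

    Returns the queue to use from now on, which may be a new, larger one.
    """
    if queue.is_full():
        # "Interrupt handler": the DMA would be stopped here (not modeled).
        bigger = BoundedQueue(queue.capacity * 2)  # doubling is an assumption
        while queue.items:
            bigger.push(queue.items.popleft())     # preserve FIFO order
        queue = bigger
        # The DMA would be restarted here (not modeled).
    queue.push(descriptor)
    return queue


q = BoundedQueue(2)
for d in range(5):
    q = inject(q, d)
```

    Because the descriptors are moved in order, the larger queue preserves the original FIFO ordering, and the sender never blocks on a full queue.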

  7. WebPASS Explorer (HR Personnel Management)

    US Agency for International Development — WebPass Explorer (WebPASS Framework): USAID is partnering with DoS in the implementation of their WebPass Post Personnel (PS) Module. WebPassPS does not replace...

  8. Multi-pass spectroscopic ellipsometry

    Stehle, Jean-Louis; Samartzis, Peter C.; Stamataki, Katerina; Piel, Jean-Philippe; Katsoprinakis, George E.; Papadakis, Vassilis; Schimowski, Xavier; Rakitzis, T. Peter; Loppinet, Benoit

    2014-01-01

    Spectroscopic ellipsometry is an established technique, particularly useful for thickness measurements of thin films. It measures polarization rotation after a single reflection of a beam of light on the measured substrate at a given incidence angle. In this paper, we report the development of multi-pass spectroscopic ellipsometry where the light beam reflects multiple times on the sample. We have investigated both theoretically and experimentally the effect of sample reflectivity, number of reflections (passes), angles of incidence and detector dynamic range on ellipsometric observables tanΨ and cosΔ. The multiple pass approach provides increased sensitivity to small changes in Ψ and Δ, opening the way for single measurement determination of optical thickness T, refractive index n and absorption coefficient k of thin films, a significant improvement over the existing techniques. Based on our results, we discuss the strengths, the weaknesses and possible applications of this technique. - Highlights: • We present multi-pass spectroscopic ellipsometry (MPSE), a multi-pass approach to ellipsometry. • Different detectors, samples, angles of incidence and number of passes were tested. • N passes improve polarization ratio sensitivity to the power of N. • N reflections improve phase shift sensitivity by a factor of N. • MPSE can significantly improve thickness measurements in thin films

  9. SMS Messaging Applications

    Pero, Nicola

    2009-01-01

    Cell phones are the most common communication device on the planet, and Short Message Service (SMS) is the chief channel for companies to offer services, accept requests, report news, and download binary files over cell phones. This guide describes the protocols and best practices (things that ensure you won't get sued or lose your right to offer a service) you need to know to make SMS messaging part of an organizational service. Issues such as character sets, differences among vendors, common practices in Europe and North America, and API choices are covered.

  10. Parallelization of applications for networks with homogeneous and heterogeneous processors; Parallelisation d`applications pour des reseaux de processeurs homogenes ou heterogenes

    Colombet, L

    1994-10-07

    The aim of this thesis is to study and develop efficient methods for parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication tools. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate performances of networks with heterogeneous processors. To evaluate such performances, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validate the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs.

  11. Parallel Fortran-MPI software for numerical inversion of the Laplace transform and its application to oscillatory water levels in groundwater environments

    Zhan, X.

    2005-01-01

    A parallel Fortran-MPI (Message Passing Interface) software package for numerical inversion of the Laplace transform based on a Fourier series method is developed to meet the need of solving computationally intensive problems involving the response of oscillatory water levels to hydraulic tests in a groundwater environment. The software is a parallel version of ACM (The Association for Computing Machinery) Transactions on Mathematical Software (TOMS) Algorithm 796. Running 38 test examples indicated that implementing MPI techniques on a distributed-memory architecture speeds up the processing and improves the efficiency. Applications to oscillatory water levels in a well during aquifer tests are presented to illustrate how this package can be applied to solve complicated environmental problems involving differential and integral equations. The package is free and is easy to use for people with little or no previous experience in using MPI but who wish to get off to a quick start in parallel computing. © 2004 Elsevier Ltd. All rights reserved.

  12. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

    2004-09-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 × 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 × 10^8 histories. For a smaller number of histories (1 × 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 × 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
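    The efficiency figure quoted above can be turned back into a speedup using the usual definitions (speedup = serial time / parallel time, efficiency = speedup / number of processors). The short Python sketch below assumes the record uses these standard definitions; the function names are illustrative and the only input taken from the abstract is 83.9% on 24 processors.

```python
def speedup(t_serial, t_parallel):
    """Classical speedup: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(s, n_procs):
    """Parallel efficiency: fraction of ideal (linear) speedup achieved."""
    return s / n_procs


def speedup_from_efficiency(eff, n_procs):
    """Invert the efficiency definition to recover the speedup."""
    return eff * n_procs


# Figures from the abstract: 83.9% efficiency on 24 Athlon processors.
S_ATHLON = speedup_from_efficiency(0.839, 24)
print(round(S_ATHLON, 3))
```

    Under those definitions the 24-processor run is about 20 times faster than the serial run, against an ideal of 24.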

  13. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J.

    2004-01-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 × 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 × 10^8 histories. For a smaller number of histories (1 × 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 × 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.

  14. Efficient parallel implicit methods for rotary-wing aerodynamics calculations

    Wissink, Andrew M.

    Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited by a lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It exhibits a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good

  15. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.

    2009-01-01

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel

  16. Are Instant Messages Speech?

    Baron, Naomi S.

    Instant messaging (IM) is commonly viewed as a “spoken” medium, in light of its reputation for informality, non-standard spelling and punctuation, and use of lexical shortenings and emoticons. However, the actual nature of IM is an empirical issue that bears linguistic analysis.

  17. Microprocessorized message multiplexer

    Ejzman, S.; Guglielmi, L.; Jaeger, J.J.

    1980-07-01

    The 'Microprocessorized Message Multiplexer' is an elementary development tool used to create and debug the software of a target microprocessor (User Module: UM). It connects together four devices: a terminal, a cassette recorder, the target microprocessor and a host computer where macro and editor for the M 6800 microprocessor are resident [fr]

  18. Grounding in Instant Messaging

    Fox Tree, Jean E.; Mayer, Sarah A.; Betts, Teresa E.

    2011-01-01

    In two experiments, we investigated predictions of the "collaborative theory of language use" (Clark, 1996) as applied to instant messaging (IM). This theory describes how the presence and absence of different grounding constraints causes people to interact differently across different communicative media (Clark & Brennan, 1991). In Study 1, we…

  19. The Prodiguer Messaging Platform

    Denvil, S.; Greenslade, M. A.; Carenton, N.; Levavasseur, G.; Raciazek, J.

    2015-12-01

    CONVERGENCE is a French multi-partner national project designed to gather HPC and informatics expertise to innovate in the context of running French global climate models with differing grids and at differing resolutions. Efficient and reliable execution of these models and the management and dissemination of model output are some of the complexities that CONVERGENCE aims to resolve. At any one moment in time, researchers affiliated with the Institut Pierre Simon Laplace (IPSL) climate modeling group are running hundreds of global climate simulations. These simulations execute upon a heterogeneous set of French High Performance Computing (HPC) environments. The IPSL's simulation execution runtime libIGCM (library for IPSL Global Climate Modeling group) has recently been enhanced so as to support hitherto impossible realtime use cases such as simulation monitoring, data publication, metrics collection, simulation control, and visualizations. At the core of this enhancement is Prodiguer: an AMQP (Advanced Message Queuing Protocol) based, event-driven, asynchronous distributed messaging platform. libIGCM now dispatches copious amounts of information, in the form of messages, to the platform for remote processing by Prodiguer software agents at IPSL servers in Paris. Such processing takes several forms: persisting message content to database(s); launching rollback jobs upon simulation failure; notifying downstream applications; and automation of visualization pipelines. We will describe and/or demonstrate the platform's technical implementation; inherent ease of scalability; inherent adaptiveness in respect to supervising simulations; and web portal receiving simulation notifications in realtime.

  20. 3. Secure Messaging

    Resonance – Journal of Science Education; Volume 6; Issue 1. Electronic Commerce - Secure Messaging. V Rajaraman. Series Article Volume 6 Issue 1 January 2001 pp 8-17. Permanent link: https://www.ias.ac.in/article/fulltext/reso/006/01/0008-0017 ...

  1. Passing crisis and emergency risk communications: the effects of communication channel, information type, and repetition.

    Edworthy, Judy; Hellier, Elizabeth; Newbold, Lex; Titchener, Kirsteen

    2015-05-01

    Three experiments explore several factors which influence information transmission when warning messages are passed from person to person. In Experiment 1, messages were passed down chains of participants using five different modes of communication. Written communication channels resulted in more accurate message transmission than verbal. In addition, some elements of the message endured further down the chain than others. Experiment 2 largely replicated these effects and also demonstrated that simple repetition of a message eliminated differences between written and spoken communication. In a final field experiment, chains of participants passed information however they wanted to, with the proviso that half of the chains could not use telephones. Here, the lack of ability to use a telephone did not affect accuracy, but did slow down the speed of transmission from the recipient of the message to the last person in the chain. Implications of the findings for crisis and emergency risk communication are discussed. Copyright © 2015 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  2. A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

    Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

    2002-02-01

    This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
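    The parallelization described above rests on two ideas: split the photon histories evenly across processors, and give each processor its own random number stream. The Python sketch below illustrates both under loud assumptions: it runs serially rather than under MPI, `random.Random` seeded per rank is only a stand-in for SPRNG (seeding by rank offset does not guarantee uncorrelated streams), and the "detection" tally with probability 0.5 is a toy, not SIMIND physics. All names are hypothetical.

```python
import random


def partition_histories(n_histories, n_procs):
    """Split histories as evenly as possible; early ranks absorb the remainder."""
    base, extra = divmod(n_histories, n_procs)
    return [base + (1 if rank < extra else 0) for rank in range(n_procs)]


def simulate_rank(rank, n_local, base_seed=12345):
    """Toy per-rank tally using a rank-specific generator (stand-in for SPRNG)."""
    rng = random.Random(base_seed + rank)
    # Count histories "detected" with probability 0.5 (illustrative only).
    return sum(1 for _ in range(n_local) if rng.random() < 0.5)


def run_all(n_histories, n_procs):
    """Serial stand-in for the parallel run: one call per rank, then a reduction."""
    counts = partition_histories(n_histories, n_procs)
    tallies = [simulate_rank(rank, n) for rank, n in enumerate(counts)]
    return counts, sum(tallies)


COUNTS, TOTAL = run_all(10, 4)
```

    In an MPI code each rank would execute only its own `simulate_rank` call and the final sum would be a reduction across ranks; because the histories are independent, little other communication is needed, which is consistent with the near-linear speed-up the record reports.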

  3. CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION

    Schneider, Evan E.; Robertson, Brant E. [Steward Observatory, University of Arizona, 933 North Cherry Avenue, Tucson, AZ 85721 (United States)

    2015-04-15

    We present Computational Hydrodynamics On ParaLLel Architectures (Cholla), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256^3) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.

  4. CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION

    Schneider, Evan E.; Robertson, Brant E.

    2015-01-01

    We present Computational Hydrodynamics On ParaLLel Architectures (Cholla), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256^3) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.

  5. Parallel implementation of many-body mean-field equations

    Chinn, C.R.; Umar, A.S.; Vallieres, M.; Strayer, M.R.

    1994-01-01

    We describe the numerical methods used to solve the system of stiff, nonlinear partial differential equations resulting from the Hartree-Fock description of many-particle quantum systems, as applied to the structure of the nucleus. The solutions are performed on a three-dimensional Cartesian lattice. Discretization is achieved through the lattice basis-spline collocation method, in which quantum-state vectors and coordinate-space operators are expressed in terms of basis-spline functions on a spatial lattice. All numerical procedures reduce to a series of matrix-vector multiplications and other elementary operations, which we perform on a number of different computing architectures, including the Intel Paragon and the Intel iPSC/860 hypercube. Parallelization is achieved through a combination of mechanisms employing the Gram-Schmidt procedure, broadcasts, global operations, and domain decomposition of state vectors. We discuss the approach to the problems of limited node memory and node-to-node communication overhead inherent in using distributed-memory, multiple-instruction, multiple-data stream parallel computers. An algorithm was developed to reduce the communication overhead by pipelining some of the message-passing procedures.

  6. Does the Screening Status of Message Characters Affect Message Effects?

    Alber, Julia M.; Glanz, Karen

    2018-01-01

    Public health messages can be used to increase awareness about colorectal cancer screenings. Free or inexpensive images for creating health messages are readily available, yet little is known about how a pictured individual's engagement in the behavior of interest affects message outcomes. Participants (N = 360), aged 50 to 75 years, completed an…

  7. Degree sequence in message transfer

    Yamuna, M.

    2017-11-01

    Message encryption is always an issue in current communication scenario. Methods are being devised using various domains. Graphs satisfy numerous unique properties which can be used for message transfer. In this paper, I propose a message encryption method based on degree sequence of graphs.
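    The record does not describe the encryption scheme itself, so the Python sketch below only illustrates the underlying object, the degree sequence of a graph, computed from an edge list. The function name, the vertex numbering, and the example graph are assumptions for illustration and are not taken from the paper.

```python
def degree_sequence(n_vertices, edges):
    """Return the vertex degrees of a simple undirected graph, largest first."""
    degree = [0] * n_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree, reverse=True)


# Hypothetical graph: path 0-1-2-3 with one extra edge between 1 and 3.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 3)]
SEQ = degree_sequence(4, EDGES)
print(SEQ)
```

    One general property worth noting: the degrees always sum to twice the number of edges, which gives a quick consistency check on any sequence a receiver reconstructs.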

  8. Message from Fermilab Director

    2009-01-01

    With this issue’s message, Fermilab Director Pier Oddone opens a new series of occasional exchanges between CERN and other laboratories world-wide. As part of this exchange, CERN Director-General Rolf Heuer wrote a message in Tuesday’s edition of Fermilab Today. Perspectives: Nothing is more important for our worldwide particle physics community than successfully turning on the LHC later this year. The promise for great discoveries is huge, and many of the plans for our future depend on LHC results. Those of us planning national programmes in anticipation of data from the LHC face formidable challenges to develop future facilities that are complementary to the LHC, whatever the physics discoveries may be. At Fermilab, this has led us to move forcefully with a programme at the intensity frontier, where experiments with neutrinos and rare decays open a complementary window into nature. Our ultimate goal for a unified picture of nat...

  9. New adaptive differencing strategy in the PENTRAN 3-d parallel Sn code

    Sjoden, G.E.; Haghighat, A.

    1996-01-01

    It is known that three-dimensional (3-D) discrete ordinates (S_n) transport problems require an immense amount of storage and computational effort to solve. For this reason, parallel codes that offer a capability to completely decompose the angular, energy, and spatial domains among a distributed network of processors are required. One such code recently developed is PENTRAN, which iteratively solves 3-D multi-group, anisotropic S_n problems on distributed-memory platforms, such as the IBM-SP2. Because large problems typically contain several different material zones with various properties, available differencing schemes should automatically adapt to the transport physics in each material zone. To minimize the memory and message-passing overhead required for massively parallel S_n applications, available differencing schemes in an adaptive strategy should also offer reasonable accuracy and positivity, yet require only the zeroth spatial moment of the transport equation; differencing schemes based on higher spatial moments, in spite of their greater accuracy, require at least twice the amount of storage and communication cost for implementation in a massively parallel transport code. This paper discusses a new adaptive differencing strategy that uses increasingly accurate schemes with low parallel memory and communication overhead. This strategy, implemented in PENTRAN, includes a new scheme, exponential directional averaged (EDA) differencing.

  10. Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

    Leamy, Michael J.; Springer, Adam C.

    In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straightforward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to cluster computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.

  11. High-performance parallel approaches for three-dimensional light detection and ranging point clouds gridding

    Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon

    2017-01-01

    With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the most suited parallel programming model for a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time-demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we discuss the pros and cons of each method in terms of usability, complexity, infrastructure, and platform limitations to give readers a better understanding of utilizing those parallel approaches for gridding purposes.

  12. Application of parallel computing to seismic damage process simulation of an arch dam

    Zhong Hong; Lin Gao; Li Jianbo

    2010-01-01

    The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires considerable computational capacity. Conventional serial computing falls short of that, and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams utilizing a damage model that considers the heterogeneity of concrete. Written in Fortran, the code uses a master/slave mode for programming, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication and solvers from the AZTEC library for solution of large-scale equations. A speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of the shaking table test, indicating that the proposed procedure and parallel code PDPAD have good potential in simulating the seismic damage mode of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.

  13. Experiences in the parallelization of the discrete ordinates method using OpenMP and MPI

    Pautz, A. [TUV Hannover/Sachsen-Anhalt e.V. (Germany); Langenbuch, S. [Gesellschaft fur Anlagen- und Reaktorsicherheit (GRS) mbH (Germany)

    2003-07-01

    The method of Discrete Ordinates is in principle parallelizable to a high degree, since the transport 'mesh sweeps' are mutually independent for all angular directions. However, in the well-known production code Dort such a type of angular domain decomposition has to be done on a spatial line-by-line basis, causing the parallelism in the code to be very fine-grained. The construction of scalar fluxes and moments requires a large effort for inter-thread or inter-process communication. We have implemented two different parallelization approaches in Dort: firstly, we have used a shared-memory model suitable for SMP (Symmetric Multiprocessor) machines based on the standard OpenMP. The second approach uses the well-known Message Passing Interface (MPI) to establish communication between parallel processes running in a distributed-memory environment. We investigate the benefits and drawbacks of both models and show first results on performance and scaling behaviour of the parallel Dort code. (authors)

  14. Regional-scale calculation of the LS factor using parallel processing

    Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

    2015-05-01

    With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the characteristics of the algorithms, including a decomposition method for maintaining the integrity of the results, an optimized workflow for reducing the time taken for exporting unnecessary intermediate data, and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
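
    The record above does not spell out its decomposition method. As a minimal sketch of one common way to keep results intact at partition edges, the Python function below splits a raster into contiguous row strips and gives each strip a small buffer ("halo") of neighbouring rows, which a local algorithm such as a 3x3-window slope calculation would need. The function name, the strip layout and the one-row halo are illustrative assumptions, not the authors' scheme.

```python
def row_strips(n_rows, n_procs, halo=1):
    """Split rows 0..n_rows-1 into n_procs contiguous strips (hypothetical helper).

    Returns one (own_start, own_stop, read_start, read_stop) tuple per process:
    rows [own_start, own_stop) are the rows the process writes, and rows
    [read_start, read_stop) are the rows it must read, including `halo`
    buffer rows borrowed from each neighbouring strip.
    """
    base, extra = divmod(n_rows, n_procs)
    strips = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional row so sizes differ by at most 1.
        size = base + (1 if rank < extra else 0)
        stop = start + size
        strips.append((start, stop, max(0, start - halo), min(n_rows, stop + halo)))
        start = stop
    return strips


# Example: 10 raster rows shared by 3 processes, one buffer row per side.
layout = row_strips(10, 3)
```

    Global algorithms such as flow accumulation depend on cells far outside a fixed halo, so a sketch like this would only cover the "local" class described in the abstract.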

  15. Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

    Vladimiras Dolgopolovas

    2015-01-01

    The aim of this study is to present an approach to introducing pipeline and parallel computing, using a model of the multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. Modern computer science and engineering education requires a comprehensive curriculum, so an introduction to pipeline and parallel computing is an essential topic to include. At the same time, the topic is among the most motivating tasks due to the comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel model-centered framework and develops the relevant learning objects. It allows implementing an educational platform for a constructivist learning process, thus enabling learners’ experimentation with the provided programming models, developing learners’ competences in modern scientific research and computational thinking, and capturing the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The C programming language was chosen for developing the programming models, and the message passing interface (MPI) and OpenMP parallelization tools were chosen for implementation.

  16. Experiences in the parallelization of the discrete ordinates method using OpenMP and MPI

    Pautz, A.; Langenbuch, S.

    2003-01-01

    The method of Discrete Ordinates is in principle parallelizable to a high degree, since the transport 'mesh sweeps' are mutually independent for all angular directions. However, in the well-known production code Dort such a type of angular domain decomposition has to be done on a spatial line-by-line basis, causing the parallelism in the code to be very fine-grained. The construction of scalar fluxes and moments requires a large effort for inter-thread or inter-process communication. We have implemented two different parallelization approaches in Dort: firstly, we have used a shared-memory model suitable for SMP (Symmetric Multiprocessor) machines based on the standard OpenMP. The second approach uses the well-known Message Passing Interface (MPI) to establish communication between parallel processes running in a distributed-memory environment. We investigate the benefits and drawbacks of both models and show first results on performance and scaling behaviour of the parallel Dort code. (authors)

  17. Energy and exergy analysis in double-pass solar air heater

    P VELMURUGAN

    mesh) in the second pass, and also by mounting longitudinal fins in the back side of the absorber plate ( ... energy sources. ... indoor solar simulator test facility photographically shown ..... El-khawajah et al [19] who employed multiple parallel.

  18. A Parallel Butterfly Algorithm

    Poulson, Jack; Demanet, Laurent; Maxwell, Nicholas; Ying, Lexing

    2014-01-01

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.
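
    The efficiency figures quoted above are strong-scaling efficiencies: the problem size is held fixed while the process count grows. As a small sketch of how such a number is usually computed, the Python function below compares a run against a reference run. The function name and the timings in the example are hypothetical and are not taken from the paper.

```python
def strong_scaling_efficiency(t_ref, p_ref, t, p):
    """Efficiency of a run on p processes relative to a reference run on p_ref.

    A value of 1.0 means the runtime dropped exactly in proportion to the
    added processes; values below 1.0 reflect communication and other overheads.
    """
    return (t_ref * p_ref) / (t * p)


# Hypothetical timings: 100 s on 1 process and 31.25 s on 4 processes.
eff = strong_scaling_efficiency(100.0, 1, 31.25, 4)
```

    In the abstract's setting the reference run would be the 16-core, single-node case rather than a one-process run.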

  19. A Parallel Butterfly Algorithm

    Poulson, Jack

    2014-02-04

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.

  20. High-speed parallel counter

    Gus'kov, B.N.; Kalinnikov, V.A.; Krastev, V.R.; Maksimov, A.N.; Nikityuk, N.M.

    1985-01-01

    This paper describes a high-speed parallel counter that contains 31 inputs and 15 outputs and is implemented by integrated circuits of series 500. The counter is designed for fast sampling of events according to the number of particles that pass simultaneously through the hodoscopic plane of the detector. The minimum delay of the output signals relative to the input is 43 nsec. The duration of the output signals can be varied from 75 to 120 nsec.

  1. Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications

    Hoisie, A.; Lubeck, O.; Wasserman, H.

    1998-01-01

    The authors develop a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message passing environment. The model, based on a LogGP machine parameterization, combines the separate contributions of computation and communication wavefronts. They validate the model on three important supercomputer systems, on up to 500 processors. They use data from a deterministic particle transport application taken from the ASCI workload, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. They also use the validated model to make estimates of performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to be in existence within the next decade as part of the ASCI program and elsewhere. In this context, they analyze two problem sizes. The model shows that on the largest such problem (1 billion cells), inter-processor communication performance is not the bottleneck. Single-node efficiency is the dominant factor
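
    To make the idea of a two-dimensional wavefront concrete, here is a minimal Python sketch. It assumes the usual sweep dependency in which cell (i, j) needs cells (i-1, j) and (i, j-1), so all cells on one anti-diagonal i + j = k can be processed concurrently. The function name is hypothetical, and this illustrates only the ordering, not the authors' LogGP performance model.

```python
def wavefronts(nx, ny):
    """Group the cells of an nx-by-ny grid into anti-diagonals i + j = k.

    Front k holds every cell whose upstream neighbours (i-1, j) and (i, j-1)
    lie on front k-1, so the cells within one front are mutually independent.
    """
    return [
        [(i, k - i) for i in range(max(0, k - ny + 1), min(nx, k + 1))]
        for k in range(nx + ny - 1)
    ]


# Example: a 3-by-2 grid needs nx + ny - 1 = 4 sequential fronts.
fronts = wavefronts(3, 2)
```

    The number of fronts, nx + ny - 1, is why a wavefront sweep has a pipeline fill and drain phase: processors at the far corner of a 2-D processor grid stay idle until the front reaches them.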

  2. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
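
    For readers unfamiliar with the method, the following is a minimal serial sketch of the textbook CG iteration in plain Python, with the sparse matrix-vector multiply kept as a separate function because it dominates the cost of each iteration. The names, the row-list sparse layout and the 2x2 test system are illustrative assumptions. The orderings, partitioning and parallel paradigms studied in the paper are not reproduced here.

```python
def spmv(rows, x):
    """Sparse matrix-vector product; rows[i] is a list of (column, value) pairs."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)  # the one matrix-vector product per iteration
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 2x2 SPD system: [[4, 1], [1, 3]] x = [1, 2], exact solution [1/11, 7/11].
A_rows = [[(0, 4.0), (1, 1.0)], [(0, 1.0), (1, 3.0)]]
x = conjugate_gradient(lambda v: spmv(A_rows, v), [1.0, 2.0])
```

    In a distributed version the dot products become global reductions and `spmv` needs entries of `p` owned by other processes, which is where ordering and partitioning affect communication and cache reuse.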

  3. Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications

    Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid

    1999-01-01

    The ability to dynamically adapt an unstructured grid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the viewpoint of portability on various multiprocessor platforms. We address this problem by developing PLUM, an automatic and architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with the goal of providing a global view of system loads across processors. Experiments on an SP2 and an Origin2000 demonstrate the portability of our approach, which achieves superb load balance at the cost of minimal extra overhead.

  4. An Optimized Parallel FDTD Topology for Challenging Electromagnetic Simulations on Supercomputers

    Shugang Jiang

    2015-01-01

    It may not be a challenge to run a Finite-Difference Time-Domain (FDTD) code for electromagnetic simulations on a supercomputer with more than ten thousand CPU cores; however, making FDTD code work with the highest efficiency is a challenge. In this paper, the performance of parallel FDTD is optimized through MPI (message passing interface) virtual topology, based on which a communication model is established. The general rules of optimal topology are presented according to the model. The performance of the method is tested and analyzed on three high performance computing platforms with different architectures in China. Simulations including an airplane with a 700-wavelength wingspan and a complex microstrip antenna array with nearly 2000 elements are performed very efficiently using a maximum of 10240 CPU cores.
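
    The abstract does not give its communication model, so the Python sketch below substitutes a deliberately simple stand-in: it enumerates every 3-D process grid (px, py, pz) whose product equals the process count and picks the one whose subdomain has the smallest surface area, used here as a rough proxy for halo-exchange volume. The function name and the cost measure are assumptions for illustration, not the paper's rules.

```python
def best_process_grid(n_procs, cells):
    """Pick (px, py, pz) with px * py * pz == n_procs that minimizes the
    surface area of one subdomain of a box with cells = (nx, ny, nz).

    Surface area is a simplified proxy for the data exchanged with
    neighbouring processes; physical boundary faces are not treated specially.
    """
    nx, ny, nz = cells
    best_area, best_grid = None, None
    for px in range(1, n_procs + 1):
        if n_procs % px:
            continue
        rest = n_procs // px
        for py in range(1, rest + 1):
            if rest % py:
                continue
            pz = rest // py
            lx, ly, lz = nx / px, ny / py, nz / pz  # subdomain edge lengths
            area = 2.0 * (lx * ly + ly * lz + lx * lz)
            if best_area is None or area < best_area:
                best_area, best_grid = area, (px, py, pz)
    return best_grid


# Example: an elongated 400 x 200 x 100 box on 8 processes.
grid = best_process_grid(8, (400, 200, 100))
```

    Under this proxy the preferred grid follows the shape of the domain, so that each subdomain is as close to a cube as possible. A grid chosen this way could then be handed to an MPI Cartesian communicator, though a real model would also weigh network and node architecture.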

  5. A massively parallel algorithm for the collision probability calculations in the Apollo-II code using the PVM library

    Stankovski, Z.

    1995-01-01

    The collision probability method in neutron transport, as applied to 2D geometries, consumes a great amount of computer time: for a typical 2D assembly calculation, about 90% of the computing time is consumed in the collision probability evaluations. Consequently, RZ or 3D calculations become prohibitive. In this paper we present a simple but efficient parallel algorithm based on the message passing host/node programming model. Parallelization was applied to the energy group treatment. Such an approach permits parallelization of the existing code, requiring only limited modifications. Sequential/parallel computer portability is preserved, which is a necessary condition for an industrial code. Sequential performance is also preserved. The algorithm is implemented on a CRAY 90 coupled to a 128 processor T3D computer, a 16 processor IBM SP1 and a network of workstations, using the Public Domain PVM library. The tests were executed for a 2D geometry with the standard 99-group library. All results were very satisfactory, the best ones with IBM SP1. Because of the heterogeneity of the workstation network, we did not ask for high performance on this architecture. The same source code was used for all computers. A more impressive advantage of this algorithm will appear in the calculations of the SAPHYR project (with the future fine multigroup library of about 8000 groups) with a massively parallel computer, using several hundreds of processors. (author). 5 refs., 6 figs., 2 tabs

  6. A massively parallel algorithm for the collision probability calculations in the Apollo-II code using the PVM library

    Stankovski, Z.

    1995-01-01

    The collision probability method in neutron transport, as applied to 2D geometries, consumes a great amount of computer time: for a typical 2D assembly calculation, about 90% of the computing time is consumed in the collision probability evaluations. Consequently, RZ or 3D calculations become prohibitive. In this paper the author presents a simple but efficient parallel algorithm based on the message passing host/node programming model. Parallelization was applied to the energy group treatment. Such an approach permits parallelization of the existing code, requiring only limited modifications. Sequential/parallel computer portability is preserved, which is a necessary condition for an industrial code. Sequential performance is also preserved. The algorithm is implemented on a CRAY 90 coupled to a 128 processor T3D computer, a 16 processor IBM SP1 and a network of workstations, using the Public Domain PVM library. The tests were executed for a 2D geometry with the standard 99-group library. All results were very satisfactory, the best ones with IBM SP1. Because of the heterogeneity of the workstation network, the author did not ask for high performance on this architecture. The same source code was used for all computers. A more impressive advantage of this algorithm will appear in the calculations of the SAPHYR project (with the future fine multigroup library of about 8000 groups) with a massively parallel computer, using several hundreds of processors.

  7. The simplified spherical harmonics (SP_L) methodology with space and moment decomposition in parallel environments

    Gianluca, Longoni; Alireza, Haghighat [Florida University, Nuclear and Radiological Engineering Department, Gainesville, FL (United States)

    2003-07-01

    In recent years, the SP_L (simplified spherical harmonics) equations have received renewed interest for the simulation of nuclear systems. We have derived the SP_L equations starting from the even-parity form of the S_N equations. The SP_L equations form a system of (L+1)/2 second order partial differential equations that can be solved with standard iterative techniques such as the Conjugate Gradient (CG). We discretized the SP_L equations with the finite-volume approach in a 3-D Cartesian space. We developed a new 3-D general code, Pensp_L (Parallel Environment Neutral-particle SP_L). Pensp_L solves both fixed source and criticality eigenvalue problems. In order to optimize the memory management, we implemented a Compressed Diagonal Storage (CDS) to store the SP_L matrices. Pensp_L includes parallel algorithms for space and moment domain decomposition. The computational load is distributed on different processors, using a mapping function, which maps the 3-D Cartesian space and moments onto processors. The code is written in Fortran 90 using the Message Passing Interface (MPI) libraries for the parallel implementation of the algorithm. The code has been tested on the Pcpen cluster and the parallel performance has been assessed in terms of speed-up and parallel efficiency. (author)

  8. Parallel R

    McCallum, Ethan

    2011-01-01

    It's tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  9. North Texas Sediment Budget: Sabine Pass to San Luis Pass

    2006-09-01

    concrete units have been placed over sand-filled fabric tube . .......................................33 Figure 28. Sand-filled fabric tubes protecting...system UTM Zone 15, NAD 83 Longshore drift directions King (in preparation) Based on wave hindcast statistics and limited buoy data Rollover Pass...along with descriptions of the jetties and limited geographic coordinate data1 (Figure 18). The original velum or Mylar sheets from which the report

  10. Domain decomposition parallel computing for transient two-phase flow of nuclear reactors

    Lee, Jae Ryong; Yoon, Han Young [KAERI, Daejeon (Korea, Republic of); Choi, Hyoung Gwon [Seoul National University, Seoul (Korea, Republic of)

    2016-05-15

    KAERI (Korea Atomic Energy Research Institute) has been developing a multi-dimensional two-phase flow code named CUPID for multi-physics and multi-scale thermal hydraulics analysis of Light water reactors (LWRs). The CUPID code has been validated against a set of conceptual problems and experimental data. In this work, the CUPID code has been parallelized based on the domain decomposition method with Message passing interface (MPI) library. For domain decomposition, the CUPID code provides both manual and automatic methods with METIS library. For the effective memory management, the Compressed sparse row (CSR) format is adopted, which is one of the methods to represent the sparse asymmetric matrix. CSR format saves only non-zero value and its position (row and column). By performing the verification for the fundamental problem set, the parallelization of the CUPID has been successfully confirmed. Since the scalability of a parallel simulation is generally known to be better for fine mesh system, three different scales of mesh system are considered: 40000 meshes for coarse mesh system, 320000 meshes for mid-size mesh system, and 2560000 meshes for fine mesh system. In the given geometry, both single- and two-phase calculations were conducted. In addition, two types of preconditioners for a matrix solver were compared: Diagonal and incomplete LU preconditioner. In terms of enhancement of the parallel performance, the OpenMP and MPI hybrid parallel computing for a pressure solver was examined. It is revealed that the scalability of hybrid calculation was enhanced for the multi-core parallel computation.
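
    As a small illustration of the Compressed sparse row (CSR) layout mentioned above, the Python sketch below builds the three standard CSR arrays from a dense matrix and uses them for a matrix-vector product. In the usual CSR convention the column of each non-zero is stored explicitly, while the row is implied by a row-pointer array rather than stored per entry. The function names and the sample matrix are hypothetical and unrelated to the CUPID code.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays.

    values  : the non-zero entries, row by row
    col_idx : the column index of each stored entry
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a vector, touching only the stored non-zeros."""
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


# Small asymmetric example with four non-zeros out of nine entries.
values, col_idx, row_ptr = to_csr([[4, 0, 0], [0, 0, 2], [1, 0, 3]])
y = csr_matvec(values, col_idx, row_ptr, [1, 2, 3])
```

    Because CSR makes no symmetry assumption, it suits the asymmetric matrices the abstract refers to, and each row's entries are contiguous, which helps when rows are divided among processes.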

  11. A 3D gyrokinetic particle-in-cell simulation of fusion plasma microturbulence on parallel computers

    Williams, T. J.

    1992-12-01

    One of the grand challenge problems now supported by HPCC is the Numerical Tokamak Project. A goal of this project is the study of low-frequency micro-instabilities in tokamak plasmas, which are believed to cause energy loss via turbulent thermal transport across the magnetic field lines. An important tool in this study is gyrokinetic particle-in-cell (PIC) simulation. Gyrokinetic, as opposed to fully-kinetic, methods are particularly well suited to the task because they are optimized to study the frequency and wavelength domain of the microinstabilities. Furthermore, many researchers now employ low-noise delta(f) methods to greatly reduce statistical noise by modelling only the perturbation of the gyrokinetic distribution function from a fixed background, not the entire distribution function. In spite of the increased efficiency of these improved algorithms over conventional PIC algorithms, gyrokinetic PIC simulations of tokamak micro-turbulence are still highly demanding of computer power--even fully-vectorized codes on vector supercomputers. For this reason, we have worked for several years to redevelop these codes on massively parallel computers. We have developed 3D gyrokinetic PIC simulation codes for SIMD and MIMD parallel processors, using control-parallel, data-parallel, and domain-decomposition message-passing (DDMP) programming paradigms. This poster summarizes our earlier work on codes for the Connection Machine and BBN TC2000 and our development of a generic DDMP code for distributed-memory parallel machines. We discuss the memory-access issues which are of key importance in writing parallel PIC codes, with special emphasis on issues peculiar to gyrokinetic PIC. We outline the domain decompositions in our new DDMP code and discuss the interplay of different domain decompositions suited for the particle-pushing and field-solution components of the PIC algorithm.

  12. Parallel Lines

    James G. Worner

    2017-05-01

    James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship. ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.

  13. Parallel definition of tear film maps on distributed-memory clusters for the support of dry eye diagnosis.

    González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J

    2017-02-01

    The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of the use of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has been previously successfully employed by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous only-multithreaded approach to less than ten seconds using the hybrid method. The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of the dry eye syndrome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  14. A Messaging Infrastructure for WLCG

    Casey, James; Cons, Lionel; Lapka, Wojciech; Paladin, Massimo; Skaburskas, Konstantin

    2011-01-01

    During the EGEE-III project operational tools such as SAM, Nagios, Gridview, the regional Dashboard and GGUS moved to a communication architecture based on ActiveMQ, an open-source enterprise messaging solution. LHC experiments, in particular ATLAS, developed prototypes of systems using the same messaging infrastructure, validating the system for their use-cases. In this paper we describe the WLCG messaging use cases and outline an improved messaging architecture based on the experience gained during the EGEE-III period. We show how this provides a solid basis for many applications, including the grid middleware, to improve their resilience and reliability.

  15. Survey of Instant Messaging Applications Encryption Methods

    Kabakuş, Abdullah; Kara, Resul

    2015-01-01

    Instant messaging applications have already taken the place of the traditional Short Messaging Service (SMS) and Multimedia Messaging Service (MMS) due to their popularity and the ease of use they provide. Users of instant messaging applications are able to send text and audio messages and different types of attachments, such as photos, videos, and contact information, to their contacts in real time. Because instant messaging applications use the internet instead of Short Message Service Technical Reali...

  16. What motivates consumers to re-tweet brand content? The impact of information, emotion, and traceability on pass-along behavior

    Araujo, T.; Neijens, P.; Vliegenthart, R.

    2015-01-01

    How do certain cues influence pass-along behavior (re-Tweeting) of brand messages on Twitter? Analyzing 19,343 global brand messages over a three-year period, the authors of this article found that informational cues were predictors of higher levels of re-Tweeting, particularly product details and

  17. Getting Your Message Across: Mobile Phone Text Messaging

    Beecher, Constance C.; Hayungs, Lori

    2017-01-01

    Want to send a message that 99% of your audience will read? Many Extension professionals are familiar with using social media tools to enhance Extension programming. Extension professionals may be less familiar with the use of mobile phone text-based marketing tools. The purpose of this article is to introduce SMS (short message system) marketing…

  18. Automatic Migration from PARMACS to MPI in Parallel Fortran Applications

    Rolf Hempel

    1999-01-01

    The PARMACS message passing interface has been in widespread use by application projects, especially in Europe. With the new MPI standard for message passing, many projects face the problem of replacing PARMACS with MPI. An automatic translation tool has been developed which replaces all PARMACS 6.0 calls in an application program with their corresponding MPI calls. In this paper we describe the mapping of the PARMACS programming model onto MPI. We then present some implementation details of the converter tool.

  19. Chaos-pass filtering in injection-locked semiconductor lasers

    Murakami, Atsushi; Shore, K. Alan

    2005-01-01

    Chaos-pass filtering (CPF) of semiconductor lasers has been studied theoretically. CPF is a phenomenon which occurs in laser chaos synchronization by injection locking and is a fundamental technique for the extraction of messages at the receiver laser in chaotic communications systems. We employ a simple theory based on driven damped oscillators to clarify the physical background of CPF. The receiver laser is optically driven by injection from the transmitter laser. We have numerically investigated the response characteristics of the receiver when it is driven by periodic (message) and chaotic (carrier) signals. It is thereby revealed that the response of the receiver laser in the two cases is quite different. For the periodic drive, the receiver exhibits a response depending on the signal frequency, while the chaotic drive provides a frequency-independent synchronous response to the receiver laser. We verify that the periodic and chaotic drives occur independently in the CPF response, and, consequently, CPF can be clearly understood in the difference of the two drives. Message extraction using CPF is also examined, and the validity of our theoretical explanation for the physical mechanism underlying CPF is thus verified

  20. A high performance image processing platform based on CPU-GPU heterogeneous cluster with parallel image reconstructions for micro-CT

    Ding Yu; Qi Yujin; Zhang Xuezhu; Zhao Cuilan

    2011-01-01

    In this paper, we report the development of a high-performance image processing platform based on a CPU-GPU heterogeneous cluster. Currently, it consists of Dell Precision T7500 and HP XW8600 workstations with a parallel programming and runtime environment using the message-passing interface (MPI) and CUDA (Compute Unified Device Architecture). We succeeded in developing parallel image processing techniques for 3D image reconstruction of X-ray micro-CT imaging. The results show that a GPU computes about 194 times faster than a single CPU, and the CPU-GPU cluster computes about 46 times faster than the CPU cluster. These meet the requirements of rapid 3D image reconstruction and real-time image display. In conclusion, a CPU-GPU heterogeneous cluster is an effective way to build a high-performance image processing platform. (authors)

  1. Study of MPI based on parallel MOM on PC clusters for EM-beam scattering by 2-D PEC rough surfaces

    Jun, Ma; Li-Xin, Guo; An-Qi, Wang

    2009-01-01

    This paper first applies finite impulse response (FIR) filter theory combined with the fast Fourier transform (FFT) method to generate a two-dimensional Gaussian rough surface. Using the electric field integral equation (EFIE), it introduces the method of moments (MOM) with RWG vector basis functions and Galerkin's method to investigate electromagnetic beam scattering by a two-dimensional PEC Gaussian rough surface on personal computer (PC) clusters. The details of the parallel conjugate gradient method (CGM) for solving the matrix equation are also presented, and the numerical simulations are obtained through the message passing interface (MPI) platform on the PC clusters. The study finds that the parallel MOM supplies a novel technique for solving a two-dimensional rough surface electromagnetic-scattering problem. Finally, the influences of the root-mean-square height, the correlation length and the polarization on the beam scattering characteristics of two-dimensional PEC Gaussian rough surfaces are discussed. (classical areas of phenomenology)

  2. A Visualized Message Interface (VMI) for intelligent messaging services

    Endo, T.; Kasahara, H.; Nakagawa, T.

    1984-01-01

    In CCITT, Message Handling Systems (MHS) have been studied from the viewpoint of communications protocol standardization. In addition to MHS services, Message Processing (MP) services, such as image processing, filing and retrieving services, will come into increasing demand in office automation field. These messaging services, including MHS services, can be thought of as Intelligent Messaging (IM) services. IM services include many basic services, optional user facilities and service parameters. Accordingly, it is necessary to deal with these parameters and MP procedures in as systematic and user-friendly a manner as possible. As one step towards realizing a user-friendly IM services interface, the characteristics of IM service parameters are studied and a Visualized Message Interface (VMI) which resembles a conventional letter exchange format is presented. The concept of VMI formation is discussed using the generic document structure concept as well as a Screen Interface and Protocol Interface conversion package

  3. Methodologies and Tools for Tuning Parallel Programs: 80% Art, 20% Science, and 10% Luck

    Yan, Jerry C.; Bailey, David (Technical Monitor)

    1996-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. In the past few years, the ubiquitous introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance instrumentation/monitor/tuning technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g. AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.

  4. An iterative algorithm for solving the multidimensional neutron diffusion nodal method equations on parallel computers

    Kirk, B.L.; Azmy, Y.Y.

    1992-01-01

    In this paper the one-group, steady-state neutron diffusion equation in two-dimensional Cartesian geometry is solved using the nodal integral method. The discrete variable equations comprise loosely coupled sets of equations representing the nodal balance of neutrons, as well as neutron current continuity along rows or columns of computational cells. An iterative algorithm that is more suitable for solving large problems concurrently is derived based on the decomposition of the spatial domain and is accelerated using successive overrelaxation. This algorithm is very well suited for parallel computers, especially since the spatial domain decomposition occurs naturally, so that the number of iterations required for convergence does not depend on the number of processors participating in the calculation. Implementation of the authors' algorithm on the Intel iPSC/2 hypercube and Sequent Balance 8000 parallel computer is presented, and measured speedup and efficiency for test problems are reported. The results suggest that the efficiency of the hypercube quickly deteriorates when many processors are used, while the Sequent Balance retains very high efficiency for a comparable number of participating processors. This leads to the conjecture that message-passing parallel computers are not as well suited for this algorithm as shared-memory machines
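
    The abstract above names successive overrelaxation (SOR) as the accelerator for its iterative scheme. As a rough illustration only, the sketch below applies SOR to a plain five-point Laplace problem on a small square grid. It is not the authors' nodal integral method, and the names sor_solve, make_grid and the relaxation factor omega = 1.5 are assumptions chosen for the example.

```python
def sor_solve(grid, omega=1.5, tol=1e-8, max_iters=10_000):
    """Relax the interior of a 2D grid toward the discrete Laplace solution.

    Boundary values in `grid` are held fixed. Returns the number of sweeps used.
    """
    rows, cols = len(grid), len(grid[0])
    for sweep in range(1, max_iters + 1):
        max_change = 0.0
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                # Gauss-Seidel value from the four neighbours (five-point stencil).
                gs = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                             + grid[i][j - 1] + grid[i][j + 1])
                # Overrelax: step past the Gauss-Seidel value by the factor omega.
                new = (1.0 - omega) * grid[i][j] + omega * gs
                max_change = max(max_change, abs(new - grid[i][j]))
                grid[i][j] = new
        if max_change < tol:
            return sweep
    return max_iters


def make_grid(n):
    """n x n grid with the top edge held at 1.0 and everything else at 0.0."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [1.0] * n
    return grid


if __name__ == "__main__":
    sweeps_sor = sor_solve(make_grid(12), omega=1.5)
    sweeps_gs = sor_solve(make_grid(12), omega=1.0)  # omega = 1 is plain Gauss-Seidel
    print("SOR sweeps:", sweeps_sor, "Gauss-Seidel sweeps:", sweeps_gs)
```

    Setting omega to 1.0 recovers plain Gauss-Seidel, which makes it easy to compare sweep counts with and without the acceleration on the same grid.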

  5. A Parallel Supercomputer Implementation of a Biological Inspired Neural Network and its use for Pattern Recognition

    De Ladurantaye, Vincent; Lavoie, Jean; Bergeron, Jocelyn; Parenteau, Maxime; Lu Huizhong; Pichevar, Ramin; Rouat, Jean

    2012-01-01

    A parallel implementation of a large spiking neural network is proposed and evaluated. The neural network implements the binding by synchrony process using the Oscillatory Dynamic Link Matcher (ODLM). Scalability, speed and performance are compared for 2 implementations: Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA) running on clusters of multicore supercomputers and NVIDIA graphical processing units respectively. A global spiking list that represents at each instant the state of the neural network is described. This list indexes each neuron that fires during the current simulation time so that the influence of their spikes is simultaneously processed on all computing units. Our implementation shows a good scalability for very large networks. A complex and large spiking neural network has been implemented in parallel with success, thus paving the road towards real-life applications based on networks of spiking neurons. MPI offers a better scalability than CUDA, while the CUDA implementation on a GeForce GTX 285 gives the best cost to performance ratio. When running the neural network on the GTX 285, the processing speed is comparable to the MPI implementation on RQCHP's Mammouth parallel with 64 nodes (128 cores).

  6. I/O Parallelization for the Goddard Earth Observing System Data Assimilation System (GEOS DAS)

    Lucchesi, Rob; Sawyer, W.; Takacs, L. L.; Lyster, P.; Zero, J.

    1998-01-01

    The National Aeronautics and Space Administration (NASA) Data Assimilation Office (DAO) at the Goddard Space Flight Center (GSFC) has developed the GEOS DAS, a data assimilation system that provides production support for NASA missions and will support NASA's Earth Observing System (EOS) in the coming years. The GEOS DAS will be used to provide background fields of meteorological quantities to EOS satellite instrument teams for use in their data algorithms as well as providing assimilated data sets for climate studies on decadal time scales. The DAO has been involved in prototyping parallel implementations of the GEOS DAS for a number of years and is now embarking on an effort to convert the production version from shared-memory parallelism to distributed-memory parallelism using the portable Message-Passing Interface (MPI). The GEOS DAS consists of two main components, an atmospheric General Circulation Model (GCM) and a Physical-space Statistical Analysis System (PSAS). The GCM operates on data that are stored on a regular grid while PSAS works with observational data that are scattered irregularly throughout the atmosphere. As a result, the two components have different data decompositions. The GCM is decomposed horizontally as a checkerboard with all vertical levels of each box existing on the same processing element (PE). The dynamical core of the GCM can also operate on a rotated grid, which requires communication-intensive grid transformations during GCM integration. PSAS groups observations on PEs in a more irregular and dynamic fashion.
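
    The checkerboard decomposition described above can be pictured as splitting the two horizontal index ranges into contiguous blocks, one box per PE, with each box keeping its full vertical column. The sketch below is a minimal illustration under assumed conventions: the function names, the row-major rank numbering and the grid sizes are hypothetical and are not taken from the GEOS DAS code.

```python
def block_bounds(n_points, n_blocks, block):
    """Half-open index range [start, stop) of `block` when n_points are split
    as evenly as possible into n_blocks contiguous pieces."""
    base, extra = divmod(n_points, n_blocks)
    start = block * base + min(block, extra)
    stop = start + base + (1 if block < extra else 0)
    return start, stop


def checkerboard(n_lon, n_lat, pe_x, pe_y):
    """Map each PE rank to the horizontal box it owns.

    Every box keeps all vertical levels, so only the two horizontal
    index ranges are recorded.
    """
    boxes = {}
    for py in range(pe_y):
        for px in range(pe_x):
            rank = py * pe_x + px  # row-major rank numbering (an assumption)
            boxes[rank] = (block_bounds(n_lon, pe_x, px),
                           block_bounds(n_lat, pe_y, py))
    return boxes


if __name__ == "__main__":
    # Hypothetical 144 x 91 horizontal grid spread over a 4 x 3 PE layout.
    for rank, (lon, lat) in checkerboard(144, 91, 4, 3).items():
        print(rank, "lon", lon, "lat", lat)
```

    When the grid size does not divide evenly, the leftover points go to the lowest-numbered blocks, so box sizes differ by at most one point in each direction.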

  7. Connecticut church passes genetics resolution.

    Culliton, B J

    1984-11-09

    The Connecticut Conference of the United Church of Christ, which represents the largest Protestant denomination in the state, has passed a resolution affirming an ethical duty to do research on human gene therapy and is planning to form local church groups to study the scientific and ethical issues involved. The resolution is intended to counter an earlier one proposed by Jeremy Rifkin to ban all efforts at engineering specific traits into the human germline. The Rifkin proposal had been endorsed by a large number of religious leaders, including the head of the U.S. United Church of Christ, but was subsequently characterized by many of the church leaders as overly restrictive.

  8. Xyce Parallel Electronic Simulator - User's Guide, Version 1.0

    HUTCHINSON, SCOTT A; KEITER, ERIC R.; HOEKSTRA, ROBERT J.; WATERS, LON J.; RUSSO, THOMAS V.; RANKIN, ERIC LAMONT; WIX, STEVEN D.

    2002-11-01

    This manual describes the use of the Xyce Parallel Electronic Simulator code for simulating electrical circuits at a variety of abstraction levels. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on improving the capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows. Another feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models

  9. Parallelization of the Physical-Space Statistical Analysis System (PSAS)

    Larson, J. W.; Guo, J.; Lyster, P. M.

    1999-01-01

    Atmospheric data assimilation is a method of combining observations with model forecasts to produce a more accurate description of the atmosphere than the observations or forecast alone can provide. Data assimilation plays an increasingly important role in the study of climate and atmospheric chemistry. The NASA Data Assimilation Office (DAO) has developed the Goddard Earth Observing System Data Assimilation System (GEOS DAS) to create assimilated datasets. The core computational components of the GEOS DAS include the GEOS General Circulation Model (GCM) and the Physical-space Statistical Analysis System (PSAS). The need for timely validation of scientific enhancements to the data assimilation system poses computational demands that are best met by distributed parallel software. PSAS is implemented in Fortran 90 using object-based design principles. The analysis portions of the code solve two equations. The first of these is the "innovation" equation, which is solved on the unstructured observation grid using a preconditioned conjugate gradient (CG) method. The "analysis" equation is a transformation from the observation grid back to a structured grid, and is solved by a direct matrix-vector multiplication. Use of a factored-operator formulation reduces the computational complexity of both the CG solver and the matrix-vector multiplication, rendering the matrix-vector multiplications as a successive product of operators on a vector. Sparsity is introduced to these operators by partitioning the observations using an icosahedral decomposition scheme. PSAS builds a large (approx. 128MB) run-time database of parameters used in the calculation of these operators. Implementing a message passing parallel computing paradigm into an existing yet developing computational system as complex as PSAS is nontrivial. One of the technical challenges is balancing the requirements for computational reproducibility with the need for high performance. The problem of computational
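
    The abstract above says the innovation equation is solved with a preconditioned conjugate gradient method. The sketch below shows the general shape of that iteration on a tiny dense system, using a Jacobi (diagonal) preconditioner as a stand-in. The preconditioner choice, the function names and the test matrix are assumptions for illustration and are not PSAS's factored-operator formulation.

```python
def matvec(a, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(a, b, tol=1e-10, max_iters=1000):
    """Solve a x = b for a symmetric positive definite matrix `a`
    using conjugate gradients with a Jacobi preconditioner."""
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi: M^-1 = 1 / diag(a)
    x = [0.0] * n
    r = list(b)                                   # residual b - a x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iters):
        if dot(r, r) ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0]
    print(pcg(a, b))
```

    In a message-passing setting the matrix-vector product and the dot products are the steps that require communication, which is one reason the order of floating-point summation, and therefore bitwise reproducibility, can depend on the number of processes.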

  10. Parallel discrete ordinates algorithms on distributed and common memory systems

    Wienke, B.R.; Hiromoto, R.E.; Brickner, R.G.

    1987-01-01

    The S_n algorithm employs iterative techniques in solving the linear Boltzmann equation. These methods, both ordered and chaotic, were compared on both the Denelcor HEP and the Intel hypercube. Strategies are linked to the organization and accessibility of memory (common memory versus distributed memory architectures), with common concern for acquisition of global information. Apart from this, the inherent parallelism of the algorithm maps directly onto the two architectures. Results comparing execution times, speedup, and efficiency are based on a representative 16-group (full upscatter and downscatter) sample problem. Calculations were performed on both the Los Alamos National Laboratory (LANL) Denelcor HEP and the LANL Intel hypercube. The Denelcor HEP is a 64-bit multiple-instruction, multiple-data (MIMD) machine consisting of up to 16 process execution modules (PEMs), each capable of executing 64 processes concurrently. Each PEM can cooperate on a job, or run several unrelated jobs, and share a common global memory through a crossbar switch. The Intel hypercube, on the other hand, is a distributed memory system composed of 128 processing elements, each with its own local memory. Processing elements are connected in a nearest-neighbor hypercube configuration and sharing of data among processors requires execution of explicit message-passing constructs

  11. GPU-Accelerated Parallel FDTD on Distributed Heterogeneous Platform

    Ronglin Jiang

    2014-01-01

    This paper introduces a finite difference time domain (FDTD) code written in Fortran and CUDA for realistic electromagnetic calculations with parallelization methods of Message Passing Interface (MPI) and Open Multiprocessing (OpenMP). Since both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) resources are utilized, a faster execution speed can be reached compared to a traditional pure GPU code. In our experiments, 64 NVIDIA TESLA K20m GPUs and 64 INTEL XEON E5-2670 CPUs are used to carry out the pure CPU, pure GPU, and CPU + GPU tests. Relative to the pure CPU calculations for the same problems, the speedup ratio achieved by CPU + GPU calculations is around 14. Compared to the pure GPU calculations for the same problems, the CPU + GPU calculations have 7.6%–13.2% performance improvement. Because of the small memory size of GPUs, the FDTD problem size is usually very small. However, this code can enlarge the maximum problem size by 25% without reducing the performance of traditional pure GPU code. Finally, using this code, a microstrip antenna array with 16×18 elements is calculated and the radiation patterns are compared with those of MoM. Results show that there is good agreement between them.

  12. Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

    Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

    2017-12-01

    We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domains of several million square kilometers, like the Arctic region.

  13. Development of Parallel Computing Framework to Enhance Radiation Transport Code Capabilities for Rare Isotope Beam Facility Design

    Kostin, Mikhail [Michigan State Univ., East Lansing, MI (United States)]; Mokhov, Nikolai [Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)]; Niita, Koji [Research Organization for Information Science and Technology, Ibaraki-ken (Japan)]

    2013-09-25

    A parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran77, Fortran 90 or C. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA and MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.
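
    The checkpoint facility mentioned above can be illustrated with a toy single-process Monte Carlo tally that periodically saves enough state to resume exactly where it stopped. This is only a sketch of the general idea. The function name run, the JSON file layout and the fixed seed are assumptions, and the framework's actual C++/MPI checkpoint format is not described in the abstract.

```python
import json
import os
import random


def run(histories, checkpoint_path, every=1000, seed=12345):
    """Accumulate a toy Monte Carlo tally, saving state every `every` histories.

    If `checkpoint_path` already exists, the run resumes from it.
    """
    rng = random.Random(seed)
    done, tally = 0, 0.0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)
        done, tally = state["done"], state["tally"]
        # JSON stores tuples as lists; random.setstate needs the tuple back.
        version, internal, gauss = state["rng"]
        rng.setstate((version, tuple(internal), gauss))
    while done < histories:
        tally += rng.random()  # stand-in for transporting one particle history
        done += 1
        if done % every == 0 or done == histories:
            state = {"done": done, "tally": tally, "rng": rng.getstate()}
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as fh:
                json.dump(state, fh)
            os.replace(tmp, checkpoint_path)  # swap in the complete file at once
    return tally / histories


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        run(2000, path)         # first part of the calculation
        print(run(5000, path))  # resumes at history 2000 and finishes
```

    Saving the random number generator state alongside the tally is what lets a restarted run reproduce the uninterrupted result. A parallel version would additionally need one such state per process.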

  14. A parallel solver for huge dense linear systems

    Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

    2011-11-01

    HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems for scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems O(100.000). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summary. Program title: Huge Dense System Solver (HDSS); Catalogue identifier: AEHU_v1_1; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html; No. of lines in distributed program, including test data, etc.: 87 062; No. of bytes in distributed program, including test data, etc.: 1 069 110; Distribution format: tar.gz; Programming language: Fortran90, C; Computer: Parallel architectures: multiprocessors, computer clusters; Operating system

  15. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summary. Program title: SWsolver; Catalogue identifier: AEGY_v1_0; Program summary URL

  16. The Swedish Blood Pass project.

    Berglund, B; Ekblom, B; Ekblom, E; Berglund, L; Kallner, A; Reinebo, P; Lindeberg, S

    2007-06-01

    Manipulation of the blood's oxygen carrying capacity (CaO(2)) through reinfusion of red blood cells, injections of recombinant erythropoietin or by other means results in an increased maximal oxygen uptake and concomitantly enhanced endurance performance. Therefore, there is a need to establish a system--"A Blood Pass"--through which such illegal and unethical methods can be detected. Venous blood samples were taken under standardized conditions from 47 male and female Swedish national and international elite endurance athletes four times during the athletic year of the individual sport (beginning and end of the preparation period and at the beginning and during peak performance in the competition period). In these samples, different hematological values were determined. ON(hes) and OFF(hre) values were calculated according to the formula of Gore et al. A questionnaire regarding training at altitude, alcohol use and other important factors for hematological status was answered by the athletes. There were some individual variations comparing hematological values obtained at different times of the athletic year or at the same time in the athletic year but in different years. However, the median values of all individual hematological, ON(hes) and OFF(hre), values taken at the beginning and the end of the preparation or at the beginning and the end of the competition period, respectively, as well as median values for the preparation and competition periods in the respective sport, were all within the 95% confidence limit (CI) of each comparison. It must be mentioned that there was no gender difference in this respect. This study shows that even if there are some individual variations in different hematological values between different sampling times in the athletic year, median values of important hematological factors are stable over time. It must be emphasized that for each blood sample, the 95% CI in each athlete will be increasingly narrower. 
The conclusion is that

  17. Link failure detection in a parallel computer

    Archer, Charles J.; Blocksome, Michael A.; Megerian, Mark G.; Smith, Brian E.

    2010-11-09

    Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.
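
    The procedure in the abstract above (colour the mesh so that neighbours fall in different groups, let the first group send test messages, let the second group check what arrived) can be simulated in a few lines. The sketch below is an illustration under assumptions: the parity colouring, the function names and the set of simulated broken links are hypothetical, and only the first-group-to-second-group direction described in the abstract is covered.

```python
def neighbours(x, y, width, height):
    """Adjacent nodes in a rectangular mesh (no wrap-around)."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield nx, ny


def group_of(x, y):
    """Parity colouring: adjacent mesh nodes always land in different groups."""
    return (x + y) % 2


def detect_failed_links(width, height, broken):
    """Return the directed links (sender, receiver) whose test message never arrived.

    `broken` is a set of directed links that silently drop messages. It stands
    in for real hardware faults in this simulation.
    """
    received = set()
    # Step 1: every first-group node sends a test message to each neighbour.
    for x in range(width):
        for y in range(height):
            if group_of(x, y) == 0:
                for nb in neighbours(x, y, width, height):
                    if ((x, y), nb) not in broken:
                        received.add(((x, y), nb))
    # Step 2: every second-group node checks each neighbour it expected to hear from.
    missing = []
    for x in range(width):
        for y in range(height):
            if group_of(x, y) == 1:
                for nb in neighbours(x, y, width, height):
                    if (nb, (x, y)) not in received:
                        missing.append((nb, (x, y)))
    return sorted(missing)


if __name__ == "__main__":
    faults = {((0, 0), (1, 0)), ((2, 2), (2, 1))}
    for sender, receiver in detect_failed_links(4, 3, faults):
        print("no test message on link", sender, "->", receiver)
```

    Because every link joins a first-group node to a second-group node, a single round of sends exercises each link in that direction without any node having to send and receive at the same time.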

  18. The PASS project architectural model

    Day, C.T.; Loken, S.; Macfarlane, J.F.

    1994-01-01

    The PASS project has as its goal the implementation of solutions to the foreseen data access problems of the next generation of scientific experiments. The architectural model results from an evaluation of the operational and technical requirements and is described in terms of an abstract reference model, an implementation model and a discussion of some design aspects. The abstract reference model describes a system that matches the requirements in terms of its components and the mechanisms by which they communicate, but does not discuss policy or design issues that would be necessary to match the model to an actual implementation. Some of these issues are discussed, but more detailed design and simulation work will be necessary before choices can be made

  19. Parallelization and implementation of approximate root isolation for nonlinear system by Monte Carlo

    Khosravi, Ebrahim

    1998-12-01

    This dissertation addresses the fundamental problem of isolating the real roots of nonlinear systems of equations using the Monte Carlo method published by Bush Jones. This algorithm requires only function values and can be applied readily to complicated systems of transcendental functions. The implementation of this sequential algorithm provides scientists with the means to utilize function analysis in mathematics or other fields of science. The algorithm, however, is so computationally intensive that it is limited to a very small set of variables, which makes it infeasible for large systems of equations. A computational technique was also needed to prevent the algorithm from converging to the same root along different paths of computation. The research provides techniques for improving the efficiency and correctness of the algorithm. The sequential algorithm was corrected and a parallel algorithm is presented. This parallel method has been formally analyzed and is compared with other known methods of root isolation. The effectiveness, efficiency, and enhanced overall performance of the parallel program in comparison to sequential processing are discussed. The message passing model was used for this parallel processing, and it is presented and implemented on the Intel/860 MIMD architecture. The parallel processing proposed in this research has been implemented in an ongoing high energy physics experiment: the algorithm has been used to track neutrinos in the Super-K detector. This experiment is located in Japan, and data can be processed on-line or off-line, locally or remotely.

  20. Lemon : An MPI parallel I/O library for data encapsulation using LIME

    Deuzeman, Albert; Reker, Siebren; Urbach, Carsten

    We introduce Lemon, an MPI parallel I/O library that provides efficient parallel I/O of both binary and metadata on massively parallel architectures. Motivated by the demands of the lattice Quantum Chromodynamics community, the data is stored in the SciDAC Lattice QCD Interchange Message

  1. WebPASS ICASS (HR Personnel Management)

    US Agency for International Development — WebPASS Joint Administrative Support Platforms Post Administrative Software Suite - U.S. Department of State Executive Officers application suite. Web.PASS is the...

  2. Education and training in radiation protection: a challenge in passing on a difficult and intricate message

    Sabol, Jozef; Hudzietzova, Jana; Rosina, Jozef, E-mail: j.sabol44@gmail.com, E-mail: hudzijan@fbmi.cvut.cz, E-mail: rosina@fbmi.cvut.cz [Faculty of Biomedical Engineering, Czech Technical University in Prague (Czech Republic)

    2013-07-01

    Current radiation protection is a very sophisticated and elaborate domain where, once the information about the exposure of persons is known in terms of the quantity of the effective dose, we can predict resulting radiological consequences related to the stochastic risk to the health of the exposed persons without a need for other details. In fact, the effective dose contains all pertinent information including the average organ dose distribution and relevant radiation and tissue weighting factors which take into account the specific effects of different types of radiation and selected tissue radiosensitivity. Since the effective dose cannot be measured directly, one has to rely on the monitoring of other appropriate measurable quantities and then do some conversions. The current structure of radiation protection quantities includes too many quantities, the definitions of some of which are not easy to understand and interpret. Moreover, there are numerous quantities based on the dose equivalent, such as the equivalent dose, effective dose, committed equivalent dose, committed effective dose, collective equivalent dose, collective effective dose, personal dose equivalent, ambient dose equivalent and directional dose equivalent, where the only unit of Sv is used. There are a number of cases in open literature reflecting the difficulties and mistakes in the use of radiation protection quantities. Even more complicated situations are encountered in the field, where the staff responsible for personal and workplace monitoring is confused because of so many different quantities and where the staff may not be qualified and experienced enough to be able to make the relevant conversions and interpretations.
The paper summarizes our experience in teaching students and lecturing in various training courses addressing radiation protection where the primary task was to ensure that all radiation protection personnel understood the quantities and units used in radiation protection in the correct way consistent with their latest definitions and ICRP recommendations. (author)

  3. A Survey of Rollback-Recovery Protocols in Message-Passing Systems

    1999-06-01

    and M.A. Castillo. "Checkpointing through garbage collection." Technical report. Departamento de Ciencia de la Computation, Escuela de Ingenieria ...between consecutive checkpoints. It can be implemented by using the dirty-bit of the memory protection hardware or by emulating a dirty-bit in software [4...compare the program’s state with the previous checkpoint in software , and writing the difference in a new checkpoint [46]. The required storage and

  4. Message-Passing Receiver for OFDM Systems over Highly Delay-Dispersive Channels

    Barbu, Oana-Elena; Manchón, Carles Navarro; Rom, Christian

    2017-01-01

    Propagation channels with maximum excess delay exceeding the duration of the cyclic prefix (CP) in OFDM systems cause intercarrier and intersymbol interference which, unless accounted for, degrade the receiver performance. Using tools from Bayesian inference and sparse signal reconstruction, we derive an iterative algorithm that estimates an approximate representation of the channel impulse response and the noise variance, estimates and cancels the intrinsic interference and decodes the data over a block of symbols. Simulation results show that the receiver employing our algorithm outperforms...

  5. A model based message passing approach for flexible and scalable home automation controllers

    Bienhaus, D. [INNIAS GmbH und Co. KG, Frankenberg (Germany); David, K.; Klein, N.; Kroll, D. [ComTec Kassel Univ., SE Kassel Univ. (Germany); Heerdegen, F.; Jubeh, R.; Zuendorf, A. [Kassel Univ. (Germany). FG Software Engineering; Hofmann, J. [BSC Computer GmbH, Allendorf (Germany)

    2012-07-01

    There is a large variety of home automation systems, most of them proprietary systems from different vendors. In addition, the configuration and administration of home automation systems is frequently a very complex task, especially if more complex functionality is to be achieved. Therefore, an open model for home automation was developed that is especially designed for easy integration of various home automation systems. This solution also provides a simple modeling approach that is inspired by typical home automation components like switches, timers, etc. In addition, a model based technology to achieve rich functionality and usability was implemented. (orig.)

  6. Shared memory and message passing revisited in the many-core era

    CERN. Geneva

    2016-01-01

    In the 70s, Edsger Dijkstra, Per Brinch Hansen and C.A.R. Hoare introduced the fundamental concepts for concurrent computing. It was clear that concrete communication mechanisms were required in order to achieve effective concurrency. Whether you're developing a multithreaded program running on a single node, or a distributed system spanning hundreds of thousands of cores, the choice of the communication mechanism for your system must be made intelligently because of the implicit programmability, performance and scalability trade-offs. With the emergence of many-core computing architectures, many assumptions may no longer hold. In this talk we will try to provide insight into the characteristics of these communication models by providing basic theoretical background and then focus on concrete practical examples based on indicative use case scenarios. The case studies of this presentation cover popular programming models, operating systems and concurrency frameworks in the context of many-core processors.

  7. Education and training in radiation protection: a challenge in passing on a difficult and intricate message

    Sabol, Jozef; Hudzietzova, Jana; Rosina, Jozef

    2013-01-01

    Current radiation protection is a very sophisticated and elaborate domain where, once the information about the exposure of persons is known in terms of the quantity of the effective dose, we can predict resulting radiological consequences related to the stochastic risk to the health of the exposed persons without a need for other details. In fact, the effective dose contains all pertinent information including the average organ dose distribution and relevant radiation and tissue weighting factors which take into account the specific effects of different types of radiation and selected tissue radiosensitivity. Since the effective dose cannot be measured directly, one has to rely on the monitoring of other appropriate measurable quantities and then do some conversions. The current structure of radiation protection quantities includes too many quantities, the definitions of some of which are not easy to understand and interpret. Moreover, there are numerous quantities based on the dose equivalent, such as the equivalent dose, effective dose, committed equivalent dose, committed effective dose, collective equivalent dose, collective effective dose, personal dose equivalent, ambient dose equivalent and directional dose equivalent, where the only unit of Sv is used. There are a number of cases in open literature reflecting the difficulties and mistakes in the use of radiation protection quantities. Even more complicated situations are encountered in the field, where the staff responsible for personal and workplace monitoring is confused because of so many different quantities and where the staff may not be qualified and experienced enough to be able to make the relevant conversions and interpretations.
The paper summarizes our experience in teaching students and lecturing in various training courses addressing radiation protection where the primary task was to ensure that all radiation protection personnel understood the quantities and units used in radiation protection in the correct way consistent with their latest definitions and ICRP recommendations. (author)

  8. Xyce Parallel Electronic Simulator : users' guide, version 2.0.

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

    2004-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas:
    • Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers.
    • Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques.
    • Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices.
    • A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI).
    • Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future.
    Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce

  9. Parallel keyed hash function construction based on chaotic maps

    Xiao Di; Liao Xiaofeng; Deng Shaojiang

    2008-01-01

    Recently, a variety of chaos-based hash functions have been proposed. Nevertheless, none of them works efficiently in parallel computing environment. In this Letter, an algorithm for parallel keyed hash function construction is proposed, whose structure can ensure the uniform sensitivity of hash value to the message. By means of the mechanism of both changeable-parameter and self-synchronization, the keystream establishes a close relation with the algorithm key, the content and the order of each message block. The entire message is modulated into the chaotic iteration orbit, and the coarse-graining trajectory is extracted as the hash value. Theoretical analysis and computer simulation indicate that the proposed algorithm can satisfy the performance requirements of hash function. It is simple, efficient, practicable, and reliable. These properties make it a good choice for hash on parallel computing platform

  10. Distributed parallel cooperative coevolutionary multi-objective large-scale immune algorithm for deployment of wireless sensor networks

    Cao, Bin; Zhao, Jianwei; Yang, Po

    2018-01-01

    Using immune algorithms is generally a time-intensive process especially for problems with a large number of variables. In this paper, we propose a distributed parallel cooperative coevolutionary multi-objective large-scale immune algorithm that is implemented using the message passing interface (MPI). The proposed algorithm is composed of three layers: objective, group and individual layers. First, for each objective in the multi-objective problem to be addressed, a subpopulation is used for optimization, and an archive population is used to optimize all the objectives. Second, the large... ...-objective evolutionary algorithms the Cooperative Coevolutionary Generalized Differential Evolution 3, the Cooperative Multi-objective Differential Evolution and the Nondominated Sorting Genetic Algorithm III, the proposed algorithm addresses the deployment optimization problem efficiently and effectively.

  11. Optimal data replication: A new approach to optimizing parallel EM algorithms on a mesh-connected multiprocessor for 3D PET image reconstruction

    Chen, C.M.; Lee, S.Y.

    1995-01-01

    The EM algorithm promises an estimated image with the maximal likelihood for 3D PET image reconstruction. However, due to its long computation time, the EM algorithm has not been widely used in practice. While several parallel implementations of the EM algorithm have been developed to make the EM algorithm feasible, they do not guarantee an optimal parallelization efficiency. In this paper, the authors propose a new parallel EM algorithm which maximizes the performance by optimizing data replication on a mesh-connected message-passing multiprocessor. To optimize data replication, the authors have formally derived the optimal allocation of shared data, group sizes, integration and broadcasting of replicated data as well as the scheduling of shared data accesses. The proposed parallel EM algorithm has been implemented on an iPSC/860 with 16 PEs. The experimental and theoretical results, which are consistent with each other, have shown that the proposed parallel EM algorithm could improve performance substantially over those using unoptimized data replication

  12. Parallel algorithms for nuclear reactor analysis via domain decomposition method

    Kim, Yong Hee

    1995-02-01

    In this thesis, the neutron diffusion equation in reactor physics is discretized by the finite difference method and is solved on a parallel computer network which is composed of T-800 transputers. T-800 transputer is a message-passing type MIMD (multiple instruction streams and multiple data streams) architecture. A parallel variant of Schwarz alternating procedure for overlapping subdomains is developed with domain decomposition. The thesis provides convergence analysis and improvement of the convergence of the algorithm. The convergence of the parallel Schwarz algorithms with DN(or ND), DD, NN, and mixed pseudo-boundary conditions(a weighted combination of Dirichlet and Neumann conditions) is analyzed for both continuous and discrete models in two-subdomain case and various underlying features are explored. The analysis shows that the convergence rate of the algorithm highly depends on the pseudo-boundary conditions and the theoretically best one is the mixed boundary conditions(MM conditions). Also it is shown that there may exist a significant discrepancy between continuous model analysis and discrete model analysis. In order to accelerate the convergence of the parallel Schwarz algorithm, relaxation in pseudo-boundary conditions is introduced and the convergence analysis of the algorithm for two-subdomain case is carried out. The analysis shows that under-relaxation of the pseudo-boundary conditions accelerates the convergence of the parallel Schwarz algorithm if the convergence rate without relaxation is negative, and any relaxation(under or over) decelerates convergence if the convergence rate without relaxation is positive. Numerical implementation of the parallel Schwarz algorithm on an MIMD system requires multi-level iterations: two levels for fixed source problems, three levels for eigenvalue problems. Performance of the algorithm turns out to be very sensitive to the iteration strategy. 
In general, multi-level iterations provide good performance when
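
    To make the parallel Schwarz idea with Dirichlet-Dirichlet (DD) pseudo-boundary conditions concrete, here is a minimal sketch on a toy problem. It is an assumption-laden illustration, not the thesis's method: it solves u'' = 0 on [0, 1] with u(0) = 0 and u(1) = 1 rather than the neutron diffusion equation, the overlap endpoints `a` and `b` are arbitrary choices, and each subdomain solve is done exactly (a straight line) instead of by finite differences.

```python
def parallel_schwarz_dd(a=0.4, b=0.6, iterations=60):
    """Parallel (Jacobi-type) Schwarz iteration for u'' = 0, u(0) = 0, u(1) = 1.

    Subdomain 1 is [0, b] and subdomain 2 is [a, 1], overlapping on [a, b].
    Each subdomain solution is a straight line, so only the two
    pseudo-boundary values need to be tracked.
    """
    g_b = 0.0  # current guess for u at x = b (right end of subdomain 1)
    g_a = 0.0  # current guess for u at x = a (left end of subdomain 2)
    history = []
    for _ in range(iterations):
        # Both solves use the previous iterate's data, so they could run concurrently.
        u1_at_a = g_b * a / b                              # line through (0, 0) and (b, g_b)
        u2_at_b = g_a + (1.0 - g_a) * (b - a) / (1.0 - a)  # line through (a, g_a) and (1, 1)
        # Exchange Dirichlet data across the overlap (the message-passing step).
        g_a, g_b = u1_at_a, u2_at_b
        history.append(abs(g_b - b))  # the exact solution is u(x) = x
    return g_a, g_b, history
```

    In this toy setting the error shrinks by the factor (a/b)(1-b)/(1-a) every two sweeps, so a wider overlap converges faster. That mirrors, in a much simpler case, the record's point that convergence depends on how the pseudo-boundary data are chosen.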

  13. GPS Ephemeris Message Broadcast Simulation

    Browne, Nathan J; Light, James J

    2005-01-01

    The warfighter constantly needs increased accuracy from GPS and a means to increasing this accuracy to the decimeter level is a broadcast ephemeris message containing GPS satellite orbit and clock corrections...

  14. Military Message Experiment. Volume II.

    1982-04-01

    elements of the Department of Defense. This resulted in a memorandum from the Director, Telecommunications and Command and Control, OSD, in June 1975...1978 to April 1979 and provides a discussion of the telecommunications interface aspects of the experiment. This Final Report covers the period of...arise in the telecommunication system which require a retransmission of an outgoing message. A "service" message may be created within the

  15. Triplets pass their pressure test

    2007-01-01

    All the LHC inner triplets have now been repaired and are in position. The first ones have passed their pressure tests with flying colours. The repaired inner triplet at LHC Point 1, right side (1R). Ranko Ostojic (on the right), who headed the team responsible for repairing the triplets, shows the magnet to Robert Zimmer, President of the University of Chicago and of Fermi Research Alliance, who visited CERN on 20th August. Three cheers for the triplets! All the LHC inner triplets have now been repaired and are in position in the tunnel. Thanks to the mobilisation of a multidisciplinary team from CERN and Fermilab, assisted by the KEK Laboratory and the Lawrence Berkeley National Laboratory (LBNL), a solution has been found, tested, validated and applied. At the end of March this year, one of the inner triplets at Point 5 failed to withstand a pressure test. A fault was identified in the supports of two out of the three quadrupole magne...

  16. Effect of Message Format and Content on Attitude Accessibility Regarding Sexually Transmitted Infections.

    Jain, Parul; Hoffman, Eric; Beam, Michael; Xu, Shan Susan

    2017-11-01

    Sexually transmitted infections (STIs) are widespread in the United States among people ages 15-24 years and cost almost $16 billion yearly. It is therefore important to understand message design strategies that could help reduce these numbers. Guided by exemplification theory and the extended parallel process model (EPPM), this study examines the influence of message format and the presence versus absence of a graphic image on recipients' accessibility of STI attitudes regarding safe sex. Results of the experiment indicate a significant effect from testimonial messages on increased attitude accessibility regarding STIs compared to statistical messages. Results also indicate a conditional indirect effect of testimonial messages on STI attitude accessibility, though threat is greater when a graphic image is included. Implications and directions for future research are discussed.

  17. Integrating marker passing and problem solving a spreading activation approach to improved choice in planning

    Hendler, James A

    2014-01-01

    A recent area of interest in the Artificial Intelligence community has been the application of massively parallel algorithms to enhance the choice mechanism in traditional AI problems. This volume provides a detailed description of how marker-passing -- a parallel, non-deductive, spreading activation algorithm -- is a powerful approach to refining the choice mechanisms in an AI problem-solving system. The author scrutinizes the design of both the algorithm and the system, and then reviews the current literature and research in planning and marker passing. Also included: a comparison of this

  18. MessageSpace: a messaging system for health research

    Escobar, Rodrigo D.; Akopian, David; Parra-Medina, Deborah; Esparza, Laura

    2013-03-01

    Mobile Health (mHealth) has emerged as a promising direction for delivery of healthcare services via mobile communication devices such as cell phones. Examples include texting-based interventions for chronic disease monitoring, diabetes management, control of hypertension, smoking cessation, monitoring medication adherence, appointment keeping and medical test result delivery; as well as improving patient-provider communication, health information communication, data collection and access to health records. While existing messaging systems very well support bulk messaging and some polling applications, they are not designed for data collection and processing of health research oriented studies. For that reason known studies based on text-messaging campaigns have been constrained in participant numbers. In order to empower healthcare promotion and education research, this paper presents a system dedicated for healthcare research. It is designed for convenient communication with various study groups, feedback collection and automated processing.

  19. Reactions to threatening health messages.

    Ten Hoor, Gill A; Peters, Gjalt-Jorn Y; Kalagi, Janice; de Groot, Lianne; Grootjans, Karlijne; Huschens, Alexander; Köhninger, Constanze; Kölgen, Lizan; Pelssers, Isabelle; Schütt, Toby; Thomas, Sophia; Ruiter, Robert A C; Kok, Gerjo

    2012-11-21

    Threatening health messages that focus on severity are popular, but frequently have no effect or even a counterproductive effect on behavior change. This paradox (i.e. wide application despite low effectiveness) may be partly explained by the intuitive appeal of threatening communication: it may be hard to predict the defensive reactions occurring in response to fear appeals. We examine this hypothesis by using two studies by Brown and colleagues, which provide evidence that threatening health messages in the form of distressing imagery in anti-smoking and anti-alcohol campaigns cause defensive reactions. We simulated both Brown et al. experiments, asking participants to estimate the reactions of the original study subjects to the threatening health information (n = 93). Afterwards, we presented the actual original study outcomes. One week later, we assessed whether this knowledge of the actual study outcomes helped participants to more successfully estimate the effectiveness of the threatening health information (n = 72). Results showed that participants were initially convinced of the effectiveness of threatening health messages and were unable to anticipate the defensive reactions that in fact occurred. Furthermore, these estimates did not improve after participants had been explained the dynamics of threatening communication as well as what the effects of the threatening communication had been in reality. These findings are consistent with the hypothesis that the effectiveness of threatening health messages is intuitively appealing. What is more, providing empirical evidence against the use of threatening health messages has very little effect on this intuitive appeal.

  20. A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS

    Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A., E-mail: bharath@u.northwestern.edu [Center for Interdisciplinary Exploration and Research in Astrophysics, Northwestern University, Evanston, IL (United States)

    2013-02-15

    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ∼ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within ≲ 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10^5, 128 for N = 10^6 and 256 for N = 10^7. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.
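
    The reported speedups can be turned into parallel efficiencies with a line of arithmetic. The snippet below is a sketch that assumes each maximum speedup was measured at the processor count where scaling saturates (64, 128 and 256). The abstract does not state this pairing explicitly, and the speedups are themselves approximate.

```python
# Saturation processor counts and approximate maximum speedups from the
# abstract, keyed by the number of stars N. Pairing them is an assumption.
reported = {
    10**5: (64, 60.0),
    10**6: (128, 100.0),
    10**7: (256, 220.0),
}

def parallel_efficiency(speedup, processors):
    """Efficiency = speedup / processor count (1.0 is ideal linear scaling)."""
    return speedup / processors

efficiencies = {n: parallel_efficiency(s, p) for n, (p, s) in reported.items()}

for n, eff in efficiencies.items():
    print(f"N = {n:>8d}: efficiency = {eff:.2f}")
```

    Under that assumption the efficiencies come out at roughly 0.94, 0.78 and 0.86, which is consistent with the abstract's description of near-linear scaling up to those limits.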

  1. A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS

    Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.

    2013-01-01

    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ∼ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within ≲ 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10^5, 128 for N = 10^6 and 256 for N = 10^7. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.

  2. Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing

    Amooie, M. A.; Moortgat, J.

    2017-12-01

    We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with a fast quad-core 1.2 GHz ARMv8 64-bit processor, 1 GB of RAM, and a 32 GB microSD card for local storage. The cluster therefore has 512 processor cores, a total RAM of 128 GB distributed across the individual nodes, and a flash capacity of 4 TB, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between nodes. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various numbers of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and feasible learning platform for challenging engineering and scientific problems.
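
    The aggregate figures quoted for the cluster follow directly from the per-node specification. The short check below is a sketch. The variable names are illustrative, and it assumes the 4 TB flash figure uses binary units (1 TB = 1024 GB).

```python
# Per-node specification as stated in the abstract.
nodes = 128
cores_per_node = 4    # quad-core processor
ram_gb_per_node = 1
flash_gb_per_node = 32

# Cluster-wide totals.
total_cores = nodes * cores_per_node
total_ram_gb = nodes * ram_gb_per_node
total_flash_tb = nodes * flash_gb_per_node / 1024  # assumes 1 TB = 1024 GB

print(total_cores, total_ram_gb, total_flash_tb)
```

    This reproduces the 512 cores, 128 GB of RAM and 4 TB of flash storage reported in the record.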

  3. Three-Dimensional Induced Polarization Parallel Inversion Using Nonlinear Conjugate Gradients Method

    Huan Ma

    2015-01-01

    Four kinds of array of induced polarization (IP) methods (surface, borehole-surface, surface-borehole, and borehole-borehole) are widely used in resource exploration. However, because of the large number of sources, the inversion takes a long time to complete. In the paper, a new parallel algorithm is described which uses the message passing interface (MPI) and graphics processing unit (GPU) to accelerate 3D inversion of these four methods. The forward finite difference equation is solved by an ILU0 preconditioner and the conjugate gradient (CG) solver. The inverse problem is solved by nonlinear conjugate gradients (NLCG) iteration, which is used to calculate one forward and two “pseudo-forward” modelings and update the direction, space, and model in turn. Because each source is independent in forward and “pseudo-forward” modelings, multiprocess modes are opened by calling the MPI library. The iterative matrix solver within CULA is called in each process. Some tables and synthetic data examples illustrate that this parallel inversion algorithm is effective. Furthermore, we demonstrate that the joint inversion of surface and borehole data produces resistivity and chargeability results that are superior to those obtained from inversions of individual surface data.
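
    Because each source can be modelled independently, the work divides naturally among processes. The sketch below emulates that division in plain Python so it runs without an MPI installation. The round-robin scheme, the function names and the placeholder `forward_model` are all hypothetical and are not taken from the paper. In a real MPI program the `rank` and `size` values would come from the communicator, and the final merge would be a gather operation.

```python
def sources_for_rank(num_sources, rank, size):
    """Round-robin assignment: rank r handles sources r, r + size, r + 2*size, ..."""
    return list(range(rank, num_sources, size))

def forward_model(source_index):
    """Hypothetical stand-in for one forward (or pseudo-forward) solve."""
    return float(source_index) ** 2

def simulate_parallel_forward(num_sources, size):
    """Emulate `size` processes in a loop; each solves only its own sources."""
    partial = []
    for rank in range(size):
        mine = sources_for_rank(num_sources, rank, size)
        partial.append({s: forward_model(s) for s in mine})
    # Gather step: merge the per-rank results into one dictionary.
    gathered = {}
    for chunk in partial:
        gathered.update(chunk)
    return gathered
```

    For example, with 10 sources and 4 processes, rank 1 would handle sources 1, 5 and 9. No communication is needed until the results are combined, which is why independent sources parallelize well.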

  4. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2017-01-01

    In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance the performance of a real-world application, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.

  5. An Efficient Parallel Multi-Scale Segmentation Method for Remote Sensing Imagery

    Haiyan Gu

    2018-04-01

    Remote sensing (RS) image segmentation is an essential step in geographic object-based image analysis (GEOBIA) to ultimately derive “meaningful objects”. While many segmentation methods exist, most of them are not efficient for large data sets. Thus, the goal of this research is to develop an efficient parallel multi-scale segmentation method for RS imagery by combining graph theory and the fractal net evolution approach (FNEA). Specifically, a minimum spanning tree (MST) algorithm from graph theory is combined with the minimum heterogeneity rule (MHR) algorithm used in FNEA. The MST algorithm is used for the initial segmentation, while the MHR algorithm is used for object merging. An efficient implementation of the segmentation strategy is presented using data partitioning and a “reverse searching-forward processing” chain based on message passing interface (MPI) parallel technology. Segmentation results of the proposed method using images from multiple sensors (airborne, SPECIM AISA EAGLE II, WorldView-2, RADARSAT-2) and different selected landscapes (residential/industrial, residential/agriculture) covering four test sites indicated its efficiency in accuracy and speed. We conclude that the proposed method is applicable and efficient for the segmentation of a variety of RS imagery (airborne optical, satellite optical, SAR, high-spectral), while the accuracy is comparable with that of the FNEA method.
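
    To give a feel for MST-based initial segmentation, the sketch below merges neighbouring pixels in Kruskal order with a fixed threshold, which is equivalent to cutting every MST edge heavier than the threshold. It is a generic illustration, not the method of the paper: the function name `segment`, the 4-neighbour grid graph, the absolute-difference edge weight, and the fixed threshold are all assumptions made here.

```python
def segment(pixels, threshold):
    """Label a 2D grid by merging 4-neighbours in Kruskal order.

    Edges are sorted by weight (absolute value difference) and merged
    with union-find until the weight exceeds `threshold`.
    """
    rows, cols = len(pixels), len(pixels[0])
    parent = list(range(rows * cols))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((abs(pixels[r][c] - pixels[r][c + 1]), i, i + 1))
            if r + 1 < rows:
                edges.append((abs(pixels[r][c] - pixels[r + 1][c]), i, i + cols))

    for weight, a, b in sorted(edges):
        if weight > threshold:
            break  # all remaining edges are heavier
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Renumber the union-find roots as 0, 1, 2, ... in row-major order.
    roots = {}
    return [
        [roots.setdefault(find(r * cols + c), len(roots)) for c in range(cols)]
        for r in range(rows)
    ]


image = [
    [10, 11, 50],
    [10, 12, 52],
    [11, 51, 53],
]
labels = segment(image, threshold=3)
print(labels)
```

    In a data-partitioned setting, each process could run a routine like this on its own image block before segments are reconciled across block borders; that reconciliation step is not shown.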

  6. A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

    Bui, Trong T.

    1999-01-01

    A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.

  7. Oil price pass-through into inflation

    Chen, Shiu-Sheng

    2009-01-01

    This paper uses data from 19 industrialized countries to investigate oil price pass-through into inflation across countries and over time. A time-varying pass-through coefficient is estimated and the determinants of the recent declining effects of oil shocks on inflation are investigated. The appreciation of the domestic currency, a more active monetary policy in response to inflation, and a higher degree of trade openness are found to explain the decline in oil price pass-through. (author)

  8. Photoacoustic Soot Spectrometer (PASS) Instrument Handbook

    Dubey, M [Los Alamos National Laboratory; Springston, S [Brookhaven National Laboratory; Koontz, A [Pacific Northwest National Laboratory; Aiken, A [Los Alamos National Laboratory

    2013-01-17

    The photoacoustic soot spectrometer (PASS) measures light absorption by aerosol particles. As the particles pass through a laser beam, the absorbed energy heats the particles and in turn the surrounding air, which sets off a pressure wave that can be detected by a microphone. The PASS instruments deployed by ARM can also simultaneously measure the scattered laser light at three wavelengths and therefore provide a direct measure of the single-scattering albedo. The Operator Manual for the PASS-3100 is included here with the permission of Droplet Measurement Technologies, the instrument’s manufacturer.

  9. Possibilities of the fish pass restoration

    Čubanová, Lea

    2018-03-01

    According to the newly elaborated methodology of the Ministry of Environment of the Slovak Republic, Identification of the appropriate fish pass types according to water body typology (2015), each barrier on a river must be passable. On barriers or structures without fish passes, new ones should be designed and built, and on water structures with existing but nonfunctional fish passes, these must be reconstructed or restored. The assessment should be done in terms of the existing migratory fish fauna and hydraulic conditions. Fish fauna requirements result from ichthyological research of the river section with the barrier. Hydraulic conditions inside the fish pass body must then fulfil these requirements.

  10. Extracting messages masked by chaos

    Perez, G.; Cerdeira, H.A.

    1995-01-01

    We show how to extract messages that are masked by a chaotic signal in a system of two Lorenz oscillators. This mask removal is done for two different modes of transmission, a digital one where a parameter of the sender is switched between two values, and an analog mode, where a small amplitude message is added to the carrier signal. We achieve this without using a second Lorenz oscillator as receiver, and without doing a full reconstruction of the dynamics. This method is robust with respect to transformations that impede the unmasking using a Lorenz receiver, and is not affected by the broad-band noise that is inherent to the synchronization process. We also discuss the limitations of this way of extraction for messages in high frequency bands. (author). 12 refs, 4 figs

  11. EDITORIAL: Message from the Editor

    Thomas, Paul

    2009-01-01

    The end of 2008 cannot pass without remarking that the economic news has repeatedly strengthened the case for nuclear fusion; not perhaps to solve the immediate crises but to offer long-term security of energy supply. Although temporary, the passage of the price of oil through $100 per barrel is a portent of things to come and should bolster our collective determination to develop nuclear fusion into a viable energy source. It is with great pride, therefore, that I can highlight the contributions that the Nuclear Fusion journal has made to the research programme and the consolidation of its position as the lead journal in the field. Of course, the journal would be nothing without its authors and referees and I would like to pass on my sincere thanks to them all for their work in 2008 and look forward to a continuing, successful collaboration in 2009. Refereeing: The Nuclear Fusion Editorial Office understands how much effort is required of our referees. The Editorial Board decided that an expression of thanks to our most loyal referees is appropriate and so, since January 2005, we have been offering the top ten most loyal referees over the past year a personal subscription to Nuclear Fusion with electronic access for one year, free of charge. To select the top referees we have adopted the criterion that a researcher should have acted as a referee or adjudicator for at least two different manuscripts during the period from November 2007 to November 2008 and provided particularly detailed advice to the authors. We have excluded our Board members and those referees who were already listed in the last four years. According to our records the following people met this criterion. Congratulations and many, many thanks! T. Hino (Hokkaido University, Japan) M. Sugihara (ITER Cadarache, France) M. Dreval (Saskatchewan University, Canada) M. Fenstermacher (General Atomics, USA) V.S. Marchenko (Institute for Nuclear Research, Ukraine) G.V. Pereverzev (Max-Planck-Institut fuer

  12. Parallel Programming with Intel Parallel Studio XE

    Blair-Chappell, Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  13. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results are striking for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that these techniques give optimum performance for our parallel algorithm across various input parameter sizes, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
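
    The abstract does not spell out its tile-based method, so the following is only a generic sketch of how tiling exposes parallelism in alignment: the dynamic-programming table is cut into square tiles, and all tiles on the same anti-diagonal depend only on earlier anti-diagonals. The function name `nw_score_tiled`, the Needleman-Wunsch scoring scheme, and the default parameters are assumptions made for illustration; the tiles are filled sequentially here.

```python
def nw_score_tiled(a, b, tile=2, match=1, mismatch=-1, gap=-1):
    """Global alignment score computed tile by tile in wavefront order."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap

    def fill_tile(ti, tj):
        # Fill one tile; it reads only cells above, left, and diagonal.
        for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
            for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                H[i][j] = max(H[i - 1][j - 1] + s,
                              H[i - 1][j] + gap,
                              H[i][j - 1] + gap)

    tile_rows = -(-n // tile)  # ceiling division
    tile_cols = -(-m // tile)
    for d in range(tile_rows + tile_cols - 1):
        # Tiles with ti + tj == d are mutually independent, so threads
        # or processes could fill them concurrently.
        for ti in range(max(0, d - tile_cols + 1), min(d, tile_rows - 1) + 1):
            fill_tile(ti, d - ti)
    return H[n][m]


print(nw_score_tiled("ACGT", "ACGT"))
```

    The tile size trades scheduling overhead against available parallelism: larger tiles mean fewer, coarser tasks per anti-diagonal.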

  14. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results are striking for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that these techniques give optimum performance for our parallel algorithm across various input parameter sizes, such as sequence size and tile size, on a wide variety of multicore architectures.

  15. Xyce Parallel Electronic Simulator Users Guide Version 6.2.

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-09-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright © 2002-2014 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are

  16. Xyce Parallel Electronic Simulator Users Guide Version 6.4

    Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Baur, David Gregory [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2015-12-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are

  17. Functional effectiveness of threat appeals in exercise promotion messages

    Olivier Mairesse

    2010-01-01

    As more than 70% of individuals in Western societies can be categorized as sedentary, and inactivity has been recognized to lead to a series of serious physical and psychological disorders, the importance of physical activity promotion is ever more emphasized. Many social marketing campaigns use threat (or fear) appeals to promote healthy behaviors. Theoretical models, such as the Extended Parallel Process Model, integrate concepts such as 'perceived threat' and 'perceived efficacy' to explain how such messages operate and can cause diverse behavioral reactions. It is, however, still not entirely clear how these different aspects are valuated and combined to determine desired versus undesired response behaviors in individuals. In a functional integration task, threat-appeal based exercise promotion messages varying in psychological threat and efficacy content were shown to sedentary employees in order to assess how they affect their intention to engage in physical exercise. Our results show that individuals can be categorized into 4 different clusters depending on the way they valuate threat and efficacy appeals: individuals sensitive to both types of cues, those sensitive to either the threat or the efficacy component in the message, and those insensitive to either one of them. As different segments of receivers of the message react differently to threat and efficacy combinations, it is concluded that different approaches to designing effective mass media campaigns may be required for effective exercise promotion.

  18. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    Kamatani Naoyuki

    2011-05-01

    Background: Use of missing genotype imputations and haplotype reconstructions are valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results: We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan and Han Chinese in Beijing, and Chinese in the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions: ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

  19. Xyce Parallel Electronic Simulator Users' Guide Version 6.7.

    Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2017 Sandia Corporation. All rights reserved. Trademarks Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of

  20. Radiating Messages: An International Perspective.

    Walker, Janet

    2003-01-01

    Negative messages about the detrimental impacts of divorce on children prompted urgent calls in the United Kingdom for a reinstatement of traditional family values. Suggests that although the effects of divorce are real, care should be taken to avoid exaggeration, thus moving the debate to one centered on providing better support, advice, and…

  1. Re: Design Changing the Message

    Wall, Miranda Wakeman

    2008-01-01

    The advertisements that flood everyone's visual culture are designed to create desire. From the author's experience, most high school students are not aware of the messages that they are bombarded with every day, and if they are, few care or think about them critically. The author's goals for this lesson were to increase students' awareness of the…

  2. Instant Apache Camel message routing

    Ibryam, Bilgin

    2013-01-01

    Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. This short, instruction-based guide shows you how to perform application integration using the industry standard Enterprise Integration Patterns. This book is intended for Java developers who are new to Apache Camel and message-oriented applications.

  3. The Media and the Message.

    Cook, Glenn

    2001-01-01

    The experiences of Columbine and El Cajon high schools with media onslaughts following traumatic shooting incidents underscore the importance of getting the message across and sticking to known facts. In a crisis, speculation can hurt everyone. The most important elements in crisis communications are planning and media relations. (MLH)

  4. Spatial variation in messaging effects

    Warshaw, Christopher

    2018-05-01

    There is large geographic variation in the public's views about climate change in the United States. Research now shows that climate messages can influence public beliefs about the scientific consensus on climate change, particularly in the places that are initially more skeptical.

  5. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks number of processors

  6. Reactions to threatening health messages

    ten Hoor Gill A

    2012-11-01

    Background: Threatening health messages that focus on severity are popular, but frequently have no effect or even a counterproductive effect on behavior change. This paradox (i.e. wide application despite low effectiveness) may be partly explained by the intuitive appeal of threatening communication: it may be hard to predict the defensive reactions occurring in response to fear appeals. We examine this hypothesis by using two studies by Brown and colleagues, which provide evidence that threatening health messages in the form of distressing imagery in anti-smoking and anti-alcohol campaigns cause defensive reactions. Methods: We simulated both Brown et al. experiments, asking participants to estimate the reactions of the original study subjects to the threatening health information (n = 93). Afterwards, we presented the actual original study outcomes. One week later, we assessed whether this knowledge of the actual study outcomes helped participants to more successfully estimate the effectiveness of the threatening health information (n = 72). Results: Participants were initially convinced of the effectiveness of threatening health messages and were unable to anticipate the defensive reactions that in fact occurred. Furthermore, these estimates did not improve after the dynamics of threatening communication, as well as its actual effects, had been explained to participants. Conclusions: These findings are consistent with the hypothesis that the effectiveness of threatening health messages is intuitively appealing. What is more, providing empirical evidence against the use of threatening health messages has very little effect on this intuitive appeal.

  7. Representing culture in interstellar messages

    Vakoch, Douglas A.

    2008-09-01

    As scholars involved with the Search for Extraterrestrial Intelligence (SETI) have contemplated how we might portray humankind in any messages sent to civilizations beyond Earth, one of the challenges they face is adequately representing the diversity of human cultures. For example, in a 2003 workshop in Paris sponsored by the SETI Institute, the International Academy of Astronautics (IAA) SETI Permanent Study Group, the International Society for the Arts, Sciences and Technology (ISAST), and the John Templeton Foundation, a varied group of artists, scientists, and scholars from the humanities considered how to encode notions of altruism in interstellar messages. Though the group represented 10 countries, most were from Europe and North America, leading to the group's recommendation that subsequent discussions on the topic should include more globally representative perspectives. As a result, the IAA Study Group on Interstellar Message Construction and the SETI Institute sponsored a follow-up workshop in Santa Fe, New Mexico, USA in February 2005. The Santa Fe workshop brought together scholars from a range of disciplines including anthropology, archaeology, chemistry, communication science, philosophy, and psychology. Participants included scholars familiar with interstellar message design as well as specialists in cross-cultural research who had participated in the Symposium on Altruism in Cross-cultural Perspective, held just prior to the workshop during the annual conference of the Society for Cross-cultural Research. The workshop included discussion of how cultural understandings of altruism can complement and critique the more biologically based models of altruism proposed for interstellar messages at the 2003 Paris workshop. This paper, written by the chair of both the Paris and Santa Fe workshops, will explore the challenges of communicating concepts of altruism that draw on both biological and cultural models.

  8. "The Medium and the Message."

    Cowan, Andrew

    Radio communications have been as necessary to the development of Canadian territories north of the 60th parallel as roads, schools, medical services, and airstrips. The Canadian Broadcasting Corporation did not pioneer broadcasting in northern Canada, but its Northern Service has been the only broadcasting company north of the 60th parallel for…

  9. Practical parallel computing

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment. Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  10. Parallel sorting algorithms

    Akl, Selim G

    1985-01-01

    Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms for architectures such as linear arrays, mesh-connected computers, and cube-connected computers. Another example where an algorithm can be applied is the shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the

  11. Toward Predicting Popularity of Social Marketing Messages

    Yu, Bei; Chen, Miao; Kwok, Linchi

    The popularity of social marketing messages indicates the effectiveness of the corresponding marketing strategies. This research aims to discover the characteristics of social marketing messages that contribute to different levels of popularity. Using messages posted by a sample of restaurants on Facebook as a case study, we measured message popularity by the number of "likes" voted by fans, and examined the relationship between message popularity and two properties of the messages: (1) content, and (2) media type. Combining a number of text mining and statistical methods, we discovered some interesting patterns correlated with "more popular" and "less popular" social marketing messages. This work lays a foundation for building computational models to predict the popularity of social marketing messages in the future.

  12. Improving Type Error Messages in OCaml

    Charguéraud, Arthur

    2015-01-01

    Cryptic type error messages are a major obstacle to learning OCaml or other ML-based languages. In many cases, error messages cannot be interpreted without a sufficiently-precise model of the type inference algorithm. The problem of improving type error messages in ML has received quite a bit of attention over the past two decades, and many different strategies have been considered. The challenge is not only to produce error messages that are both sufficiently concise ...

  13. Parallel computing works!

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop

  14. 78 FR 52166 - Quantitative Messaging Research

    2013-08-22

    ... COMMODITY FUTURES TRADING COMMISSION Quantitative Messaging Research AGENCY: Commodity Futures... survey will follow qualitative message testing research (for which CFTC received fast-track OMB approval... message testing research (for which CFTC received fast-track OMB approval) and is necessary to identify...

  15. Effects of Text Messaging on Academic Performance

    Barks Amanda; Searight H. Russell; Ratwik Susan

    2011-01-01

    University students frequently send and receive cellular phone text messages during classroom instruction. Cognitive psychology research indicates that multi-tasking is frequently associated with performance cost. However, university students often have considerable experience with electronic multi-tasking and may believe that they can devote necessary attention to a classroom lecture while sending and receiving text messages. In the current study, university students who used text messaging were ...

  16. Message exchange in the building industry

    Vries, de B.; Somers, L.J.A.M.

    1995-01-01

    A process model is described for exchanging information in the building industry. In this model participants send and receive messages. On receipt of a message an activity is executed if all required information is available. Otherwise a message will be sent to another participant to obtain the

  17. How to pass higher English colour

    Bridges, Ann

    2009-01-01

    How to Pass is the Number 1 revision series for Scottish qualifications across the three examination levels of Standard Grade, Intermediate and Higher! Second editions of the books present all of the material in full colour for the first time.

  18. Framing of health information messages.

    Akl, Elie A; Oxman, Andrew D; Herrin, Jeph; Vist, Gunn E; Terrenato, Irene; Sperati, Francesca; Costiniuk, Cecilia; Blank, Diana; Schünemann, Holger

    2011-12-07

    -planned subgroup analyses based on the type of message (screening, prevention, and treatment). The primary outcome was behaviour. We did not assess any adverse outcomes. We included 35 studies involving 16,342 participants (all health consumers) and reporting 51 comparisons. In the context of attribute framing, participants in one included study understood the message better when it was framed negatively than when it was framed positively (1 study; SMD -0.58 (95% confidence interval (CI) -0.94 to -0.22); moderate effect size; low quality evidence). Although positively-framed messages may have led to more positive perception of effectiveness than negatively-framed messages (2 studies; SMD 0.36 (95% CI -0.13 to 0.85); small effect size; low quality evidence), there was little or no difference in persuasiveness (11 studies; SMD 0.07 (95% CI -0.23 to 0.37); low quality evidence) and behavior (1 study; SMD 0.09 (95% CI -0.14 to 0.31); moderate quality evidence). In the context of goal framing, loss messages led to a more positive perception of effectiveness compared to gain messages for screening messages (5 studies; SMD -0.30 (95% CI -0.49 to -0.10); small effect size; moderate quality evidence) and may have been more persuasive for treatment messages (3 studies; SMD -0.50 (95% CI -1.04 to 0.04); moderate effect size; very low quality evidence). There was little or no difference in behavior (16 studies; SMD -0.06 (95% CI -0.15 to 0.03); low quality evidence). No study assessed the effect on understanding. Contrary to commonly held beliefs, the available low to moderate quality evidence suggests that both attribute and goal framing may have little if any consistent effect on health consumers' behaviour. The unexplained heterogeneity between studies suggests the possibility of a framing effect under specific conditions. Future research needs to investigate these conditions.

  19. Evaluation of technologies of parallel computers. Communication networks for a real-time triggering application for a high-energy physics experiment at CERN

    Hoertnagl, Ch.

    1997-12-01

    Experiments at the future Large Hadron Collider (LHC) at CERN will be faced with an extraordinary challenge of event selection in real time. The primary event rate, equal to the bunch crossing frequency of 40 MHz, will have to be reduced by a factor of almost one-in-a-million in order to reveal traces of rare physics processes from an abundant background. This work presents various contributions to ongoing feasibility studies concerning the possible use of commercial technologies from the proximities of parallel computers and their communication networks for the second trigger stage, which faces an average data input rate of 100 kHz. Studies in this thesis apply a combination of methodologies, namely the build-up of lab-scale prototype implementations (including their exposition to test beam runs), algorithm development, technology tracking and benchmarking, as well as discrete event simulation. The main contribution consists of several technology case studies, which are based on the exploration of a set of standard benchmark programs for revealing simple parameters for characterizing delays during communication. Studied technologies include the communication sub-system of the Meiko CS-2, Asynchronous Transfer Mode (ATM), MEMORY CHANNEL, and Scalable Coherent Interface (SCI); all could be considered typical for candidate technologies. The discussion sheds light on the relative benefits and costs associated with different parallel programming models, in general, and with the use of message-passing libraries, such as Message Passing Interface (MPI), in particular. Best observed end-user-to-end-user latencies were ∼ 10 μs, best asymptotic bandwidths were ∼ 70 MByte/s. Typical sub-patterns of communication that have to be applied in the second trigger stage were sustained at ∼ 13 kHz, using today's technologies in realistic embeddings. (author)

  20. Reconfigurable multi-DSP parallel computing architecture based on DSM

    程鑫; 吴华春

    2012-01-01

    A reconfigurable multi-digital signal processor (DSP) parallel computing architecture based on distributed shared memory (DSM) is proposed. Message-passing communication over a user-defined internal bus (IB) implements a shared memory model with unified addressing on physically distributed memory, which decreases the data transmission overhead between DSPs. An online reconfiguration mechanism based on the VME bus redefines the message-passing service, which increases the generality of the parallel architecture. Experiments show that the proposed DSM reduces the overhead of synchronized access to shared data by the parallel DSPs and satisfies the requirements of multi-axis precision synchronous motion control systems.

  1. Reactions to threatening health messages

    ten Hoor, Gill A; Peters, Gjalt-Jorn Y; Kalagi, Janice; de Groot, Lianne; Grootjans, Karlijne; Huschens, Alexander; Köhninger, Constanze; Kölgen, Lizan; Pelssers, Isabelle; Schütt, Toby; Thomas, Sophia; Ruiter, Robert AC; Kok, Gerjo

    2012-01-01

    Background: Threatening health messages that focus on severity are popular, but frequently have no effect or even a counterproductive effect on behavior change. This paradox (i.e. wide application despite low effectiveness) may be partly explained by the intuitive appeal of threatening communication: it may be hard to predict the defensive reactions occurring in response to fear appeals. We examine this hypothesis by using two studies by Brown and colleagues, which provide evidence th...

  2. A message to school girls.

    Akinwande, A

    1993-06-01

    Information, education, and communication (IEC) programs need to be strengthened to appeal to adolescents, who are increasingly contributing to unwanted pregnancy and are using abortion as a means of birth control. Successful IEC programs have the following characteristics: 1) established communication theories that guide development of materials; 2) a multimedia and a mass media approach to information dissemination; and 3) emphasis on visual displays. The primary emphasis should be on presentation of a concise, clear message with the appropriate visual medium. Many communication specialists in developing countries, however, lack the training to design and use effective IEC software. Designing effective messages involves a process of integrating scientific ideas with artistic appeal. The aim is to stimulate the target audience to change its behavior or lifestyle. The message must be convincing and contain practical and useful information. The IEC Software Design Cycle focuses on analysis and diagnosis, design production, pretesting and modification, and distribution and evaluation. Each of these processes is described. Before any attempt is made, it is necessary to obtain data on historical, sociocultural, and demographic characteristics, economic activities, health and social services, communication infrastructure, marriage and family life patterns, and decision making systems. Focus group discussions may be used to collect information about the target group. An example is given of the process of development, in a course through the Center for African Family Studies, of a poster about premarital sex directed to 11-16 year olds. On the basis of focus group discussions, it was decided that the message would be to encourage girls to talk with their mothers about family life and premarital sex. The poster was produced with 2 school girls talking in front of the school. The evaluation yielded modifications such as including a school building that resembled actual

  3. Instant Messaging in Dental Education.

    Khatoon, Binish; Hill, Kirsty B; Walmsley, A Damien

    2015-12-01

    Instant messaging (IM) is when users communicate instantly via their mobile devices, and it has become one of the most preferred choices of tools to communicate amongst health professions students. The aim of this study was to understand how dental students communicate via IM, faculty members' perspectives on using IM to communicate with students, and whether such tools are useful in the learning environment. After free-associating themes on online communication, two draft topic guides for structured interviews were designed that focussed on mobile device-related communication activities. A total of 20 students and six faculty members at the University of Birmingham School of Dentistry agreed to take part in the interviews. Students were selected from years 1-5 representing each year group. The most preferred communication tools were emails, social networking, and IM. Emails were used for more formal messages, and IM and social networking sites were used for shorter messages. WhatsApp was the most used IM app because of its popular features such as being able to check if recipients have read and received messages and group work. The students reported that changes were necessary to improve their communication with faculty members. The faculty members reported having mixed feelings toward the use of IM to communicate with students. The students wished to make such tools a permanent part of their learning environment, but only with the approval of faculty members. The faculty members were willing to accept IM as a communication tool only if it is monitored and maintained by the university and has a positive effect on learning.

  4. A Modular Instant Messaging System

    Mohamad Raad; Zouhair Bazzal; Majd Ghareeb; Hanan Farhat; Semar Bahmad

    2017-01-01

    Instant Messaging (IM) Android applications are a trend nowadays. These applications are categorized according to their features: usability, flexibility, privacy and security. However, IM applications tend to be inflexible in terms of functionality offered. The “Dble-U” system was developed as a solution to this inflexibility, with a focus on privacy as an example use case. “Dble-U” is a configurable modular system consisting of an Android chatting application, a privacy controller applicatio...

  5. On a model of three-dimensional bursting and its parallel implementation

    Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.

    2008-04-01

    A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a linearly-implicit finite difference method, second-order accurate in both space and time, on equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared address space paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results, which emphasizes the most relevant factors that affect the performance of the parallel implementations, is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.
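The PCG step named in this abstract can be illustrated with a minimal serial sketch. The abstract does not say which preconditioner the authors used, so the Jacobi (diagonal) preconditioner below is an assumption, as are the function names and the dense list-of-lists matrix format; a production code would use sparse storage and distribute the matrix-vector product.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def pcg(A: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive definite A.

    Uses a Jacobi preconditioner (assumed here, not taken from the abstract).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    if dot(r, r) ** 0.5 < tol:
        return x
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # M^{-1} for Jacobi
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

In a message-passing version, the dot products become global reductions and the matrix-vector product requires exchanging boundary values between neighbouring subdomains, which is where communication cost enters.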

  6. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

    Cieślik Marcin

    2011-02-01

    Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results: To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions: PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and
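The idea of lazily evaluated, batched map stages can be sketched in plain Python. This is not PaPy's API: the names `batched` and `run_pipeline` are hypothetical, the pipeline is a simple linear chain rather than a directed acyclic graph, and each stage runs serially where a real framework would hand the batch to a worker pool.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Sequence


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield successive lists of at most `size` items, consuming input lazily."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_pipeline(
    items: Iterable[Any],
    stages: Sequence[Callable[[Any], Any]],
    batch_size: int = 2,
) -> Iterator[Any]:
    """Apply each stage to every item, one batch at a time.

    Only one batch is held in memory at once, so a larger batch_size exposes
    more work per step while a smaller one keeps memory use low.
    """
    for batch in batched(items, batch_size):
        for stage in stages:
            batch = [stage(x) for x in batch]  # a pool.map could go here
        yield from batch


def reverse(seq: str) -> str:
    return seq[::-1]


if __name__ == "__main__":
    for out in run_pipeline(["acgt", "ttga", "ccat"], [str.upper, reverse]):
        print(out)
```

Because `run_pipeline` is a generator, nothing is computed until results are requested, which is the lazy-evaluation side of the trade-off the abstract describes.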

  7. Three-pass protocol scheme for bitmap image security by using vernam cipher algorithm

    Rachmawati, D.; Budiman, M. A.; Aulya, L.

    2018-02-01

    Confidentiality, integrity, and efficiency are the crucial aspects of data security. Among digital data, image data is particularly prone to misuse such as duplication and modification. Cryptography is one of several data security techniques. The security of the Vernam Cipher cryptography algorithm depends heavily on the key exchange process: if the key is leaked, the security of the algorithm collapses. Therefore, a method that minimizes key leakage during the exchange of messages is required. The method used is known as the Three-Pass Protocol. This protocol enables message delivery without a key exchange, so messages can reach the receiver safely without fear of key leakage. The system is built using the Java programming language. The materials used for system testing are images of size 200×200, 300×300, 500×500, 800×800 and 1000×1000 pixels. The experimental results showed that the Vernam Cipher algorithm in the Three-Pass Protocol scheme could restore the original image.
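The message flow of a three-pass exchange with an XOR (Vernam-style) cipher can be sketched as follows. The authors' system is written in Java and its details are not given in the abstract; this Python sketch, including the names `xor_bytes` and `three_pass`, is an illustration under the assumption that each party uses a private random key as long as the message.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key byte at the same position."""
    if len(data) != len(key):
        raise ValueError("key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def three_pass(message: bytes) -> bytes:
    """Simulate sender A and receiver B; return what B recovers."""
    key_a = secrets.token_bytes(len(message))  # known only to A
    key_b = secrets.token_bytes(len(message))  # known only to B

    pass1 = xor_bytes(message, key_a)  # A -> B: locked with A's key
    pass2 = xor_bytes(pass1, key_b)    # B -> A: locked with both keys
    pass3 = xor_bytes(pass2, key_a)    # A -> B: A's key removed
    return xor_bytes(pass3, key_b)     # B removes its own key


if __name__ == "__main__":
    pixels = bytes(range(256))         # stand-in for bitmap pixel data
    assert three_pass(pixels) == pixels
```

The scheme works because XOR is commutative and self-inverse, so each party can remove its own key regardless of the order in which keys were applied. In this XOR-only sketch, XOR-ing the three transmitted values together also yields the message, so it should be read as an illustration of the message flow only.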

  8. Low-sensitivity active filter realization using a complex all-pass filter

    Regalia, Phillip A.; Mitra, Sanjit K.

    1987-04-01

    A wide class of continuous-time transfer functions may be implemented as the parallel combination of two all-pass filters, including Butterworth, Chebyshev, and elliptic low-pass approximations of odd order. Here, the realization of even-order low-pass classical approximations is considered, and it is shown that they may be decomposed in terms of complex all-pass functions. A systematic realization approach, based on scattering domain simulation (i.e., wave active filters), allows for a low-sensitivity active filter implementation. Further insight into the low-sensitivity property is gained by connecting the insertion loss of doubly terminated antimetric networks with the imaginary return loss of complex lossless networks.

  9. Nonblocking and orphan free message logging protocols

    Alvisi, Lorenzo; Hoppe, Bruce; Marzullo, Keith

    1992-12-01

    Currently existing message logging protocols demonstrate a classic pessimistic vs. optimistic tradeoff. We show that the optimistic-pessimistic tradeoff is not inherent to the problem of message logging. We construct a message-logging protocol that has the positive features of both optimistic and pessimistic protocols: our protocol prevents orphans and allows simple failure recovery; however, it requires no blocking in failure-free runs. Furthermore, this protocol does not introduce any additional message overhead as compared to one implemented for a system in which messages may be lost but processes do not crash.

  10. Entity-based Classification of Twitter Messages

    Yerva, Surender Reddy; Miklós, Zoltán; Aberer, Karl

    2012-01-01

    Twitter is a popular micro-blogging service on the Web, where people can enter short messages, which then become visible to some other users of the service. While the topics of these messages vary, there are a lot of messages where the users express their opinions about some companies or their products. These messages are a rich source of information for companies for sentiment analysis or opinion mining. There is however a great obstacle for analyzing the messages directly: as the company n...

  11. Parallel Atomistic Simulations

    Heffelfinger, Grant S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.

  12. Asynchronous Message Service Reference Implementation

    Burleigh, Scott C.

    2011-01-01

    This software provides a library of middleware functions with a simple application programming interface, enabling implementation of distributed applications in conformance with the CCSDS AMS (Consultative Committee for Space Data Systems Asynchronous Message Service) specification. The AMS service, and its protocols, implement an architectural concept under which the modules of mission systems may be designed as if they were to operate in isolation, each one producing and consuming mission information without explicit awareness of which other modules are currently operating. Communication relationships among such modules are self-configuring; this tends to minimize complexity in the development and operations of modular data systems. A system built on this model is a society of generally autonomous, inter-operating modules that may fluctuate freely over time in response to changing mission objectives, modules' functional upgrades, and recovery from individual module failure. The purpose of AMS, then, is to reduce mission cost and risk by providing standard, reusable infrastructure for the exchange of information among data system modules in a manner that is simple to use, highly automated, flexible, robust, scalable, and efficient. The implementation is designed to spawn multiple threads of AMS functionality under the control of an AMS application program. These threads enable all members of an AMS-based, distributed application to discover one another in real time, subscribe to messages on specific topics, and to publish messages on specific topics. The query/reply (client/server) communication model is also supported. Message exchange is optionally subject to encryption (to support confidentiality) and authorization. Fault tolerance measures in the discovery protocol minimize the likelihood of overall application failure due to any single operational error anywhere in the system. The multi-threaded design simplifies processing while enabling application nodes to

  13. Performance of a fine-grained parallel model for multi-group nodal-transport calculations in three-dimensional pin-by-pin reactor geometry

    Masahiro, Tatsumi; Akio, Yamamoto

    2003-01-01

    A production code, SCOPE2, was developed based on a fine-grained parallel algorithm using the red/black iterative method, targeting parallel computing environments such as a PC cluster. It can perform a depletion calculation in a few hours using a PC cluster with the model based on a 9-group nodal-SP3 transport method in 3-dimensional pin-by-pin geometry for in-core fuel management of commercial PWRs. The present algorithm guarantees the same convergence process as in serial execution, which is very important from the viewpoint of quality management. The fine-mesh geometry is constructed by hierarchical decomposition with the introduction of an intermediate management layer, a block that is a quarter piece of a fuel assembly in the radial direction. A combination of a mesh division scheme forcing even meshes on each edge and a latency-hiding communication algorithm provided simplicity and efficiency to message passing to enhance parallel performance. Inter-processor communication and parallel I/O access were realized using the MPI functions. Parallel performance was measured for depletion calculations by the 9-group nodal-SP3 transport method in 3-dimensional pin-by-pin geometry with 340 x 340 x 26 meshes for full core geometry and 170 x 170 x 26 for quarter core geometry. A PC cluster that consists of 24 Pentium-4 processors connected by Fast Ethernet was used for the performance measurement. Calculations in full core geometry gave better speedups compared to those in quarter core geometry because of larger granularity. Fine-mesh sweep and feedback calculation parts gave almost perfect scalability since granularity is large enough, while 1-group coarse-mesh diffusion acceleration gave only around 80%. The speedup and parallel efficiency for total computation time were 22.6 and 94%, respectively, for the calculation in full core geometry with 24 processors. (authors)
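The reported speedup and efficiency are related by the usual definition, efficiency = speedup / number of processors. The short sketch below checks the figures quoted in the abstract; the function name `parallel_efficiency` is introduced here for illustration only.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup divided by processor count (1.0 means ideal scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


if __name__ == "__main__":
    # Figures quoted in the abstract: speedup 22.6 on 24 processors.
    eff = parallel_efficiency(22.6, 24)
    print(f"{eff:.0%}")  # prints 94%
```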

  14. Guidelines for designing messages in risk communication

    Takashita, Hirofumi; Horikoshi, Hidehiko

    2004-07-01

    The Risk Communication Study Team (hereafter called the RC team) has designed messages for risk communication based on an analysis of local residents' opinions expressed in several questionnaire surveys. Each message is presented as one slide (PowerPoint format) per content item. This report provides basic guidelines for making messages that are used for risk communication, and does not include the concrete messages that the RC team designed. The RC team has already published a separate report entitled 'Information materials for risk communication' (JNC TN8450 2003-008), which gives the concrete messages. This report shows general cautions and checklists for designing messages, comments on the messages from outside risk communication experts, and opinions from local residents. (author)

  15. Improving the effectiveness of fundraising messages: The impact of charity goal attainment, message framing, and evidence on persuasion

    Das, E.; Kerkhof, P.; Kuiper, J.

    2008-01-01

    This experimental study assessed the effectiveness of fundraising messages. Based on recent findings regarding the effects of message framing and evidence, effective fundraising messages should combine abstract, statistical information with a negative message frame and anecdotal evidence with a

  16. Comparison of 250 MHz R10K Origin 2000 and 400 MHz Origin 2000 Using NAS Parallel Benchmarks

    Turney, Raymond D.; Thigpen, William W. (Technical Monitor)

    2001-01-01

    This report describes results of benchmark tests on Steger, a 250 MHz Origin 2000 system with R10K processors, currently installed at the NASA Ames National Advanced Supercomputing (NAS) facility. For comparison purposes, the tests were also run on Lomax, a 400 MHz Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to measure system performance. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.

  17. A Parallel Non-Overlapping Domain-Decomposition Algorithm for Compressible Fluid Flow Problems on Triangulated Domains

    Barth, Timothy J.; Chan, Tony F.; Tang, Wei-Pai

    1998-01-01

    This paper considers an algebraic preconditioning algorithm for hyperbolic-elliptic fluid flow problems. The algorithm is based on a parallel non-overlapping Schur complement domain-decomposition technique for triangulated domains. In the Schur complement technique, the triangulation is first partitioned into a number of non-overlapping subdomains and interfaces. This suggests a reordering of triangulation vertices which separates subdomain and interface solution unknowns. The reordering induces a natural 2 x 2 block partitioning of the discretization matrix. Exact LU factorization of this block system yields a Schur complement matrix which couples subdomains and the interface together. The remaining sections of this paper present a family of approximate techniques for both constructing and applying the Schur complement as a domain-decomposition preconditioner. The approximate Schur complement serves as an algebraic coarse space operator, thus avoiding the known difficulties associated with the direct formation of a coarse space discretization. In developing Schur complement approximations, particular attention has been given to improving sequential and parallel efficiency of implementations without significantly degrading the quality of the preconditioner. A computer code based on these developments has been tested on the IBM SP2 using MPI message passing protocol. A number of 2-D calculations are presented for both scalar advection-diffusion equations as well as the Euler equations governing compressible fluid flow to demonstrate performance of the preconditioning algorithm.
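
    As an illustration of the 2 x 2 block partitioning and exact block LU factorization mentioned in this abstract, the standard linear-algebra identity can be sketched as follows. The block names A_II, A_IB, A_BI, A_BB (I for subdomain-interior unknowns, B for interface unknowns) and the symbol S are notation assumed here, not taken from the paper.

```latex
\[
A =
\begin{pmatrix} A_{II} & A_{IB} \\ A_{BI} & A_{BB} \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ A_{BI} A_{II}^{-1} & I \end{pmatrix}
\begin{pmatrix} A_{II} & A_{IB} \\ 0 & S \end{pmatrix},
\qquad
S = A_{BB} - A_{BI} A_{II}^{-1} A_{IB}.
\]
```

    With non-overlapping subdomains, A_II is typically block diagonal (one block per subdomain), so the interior solves can proceed independently and only S couples the subdomains through the interface.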

  18. INVESTIGATION OF SINGLE-PASS/DOUBLE-PASS TECHNIQUES ON FRICTION STIR WELDING OF ALUMINIUM

    N.A.A. Sathari

    2014-12-01

    Full Text Available The aim of this research is to study the effects of single-pass/ double-pass techniques on friction stir welding of aluminium. Two pieces of AA1100 with a thickness of 6.0 mm were friction stir welded using a CNC milling machine at rotational speeds of 1400 rpm, 1600 rpm and 1800 rpm respectively for single-pass and double-pass. Microstructure observations of the welded area were studied using an optical microscope. The specimens were tested by using a tensile test and Vickers hardness test to evaluate their mechanical properties. The results indicated that, at low rotational speed, defects such as ‘surface lack of fill’ and tunnels in the welded area contributed to a decrease in mechanical properties. Welded specimens using double-pass techniques show increasing values of tensile strength and hardness. From this investigation it is found that the best parameters of FSW welded aluminium AA1100 plate were those using double-pass techniques that produce mechanically sound joints with a hardness of 56.38 HV and 108 MPa strength at 1800 rpm compared to the single-pass technique. Friction stir welding, single-pass/ double-pass techniques, AA1100, microstructure, mechanical properties.

  19. A Novel Single Pass Authenticated Encryption Stream Cipher for Software Defined Radios

    Khajuria, Samant

    2012-01-01

    to propose cryptographic services such as confidentiality, integrity and authentication. Therefore, integration of security services into SDR devices is essential. Authenticated Encryption schemes denote the class of cryptographic algorithms that are designed for protecting both message confidentiality....... This makes authenticated encryption very attractive for low-cost low-power hardware implementations, as it allows for a substantial decrease in the circuit area and power consumed compared to the traditional schemes. In this thesis, an authenticated encryption scheme is proposed with the focus of achieving...... high throughput and low overhead for SDRs. The thesis is divided into two research topics. One topic is the design of a 1-pass authenticated encryption scheme that can accomplish both message secrecy and authenticity in a single cryptographic primitive. The other topic is the implementation...

  20. Technical Evaluation Report 6: Chat and Instant Messaging Systems

    Jennifer Stein

    2002-04-01

    Full Text Available Text-based conferencing can be both asynchronous (i.e., participants log into the conference at separate times) and synchronous (i.e., interaction takes place in real time). It is thus subject to the same wide variation as the online audio- and video-conferencing methods (see the earlier Reports in this series). Synchronous text-based approaches (e.g., online chat groups and instant messaging systems) are highly popular among online users generally, owing to their ability to bring together special-interest groups from around the world without cost. In distance education (DE), however, synchronous chat methods are less widely used, owing in part to the problems of arranging for working adults in different time zones to join a discussion group simultaneously. Instant text messaging is more popular among DE users in view of the choice it provides between responding to a message immediately (synchronous communication) or after a delay (asynchronous). The different synchronous and asynchronous approaches are likely to become more widely used in parallel with one another, as they are integrated in individual product packages.

  1. Parallelization in Modern C++

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  2. Parallelism in matrix computations

    Gallopoulos, Efstratios; Sameh, Ahmed H

    2016-01-01

    This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...

  3. Effects of Text Messaging on Academic Performance

    Barks Amanda

    2011-12-01

    Full Text Available University students frequently send and receive cellular phone text messages during classroom instruction. Cognitive psychology research indicates that multi-tasking is frequently associated with performance cost. However, university students often have considerable experience with electronic multi-tasking and may believe that they can devote necessary attention to a classroom lecture while sending and receiving text messages. In the current study, university students who used text messaging were randomly assigned to one of two conditions: 1. a group that sent and received text messages during a lecture or, 2. a group that did not engage in text messaging during the lecture. Participants who engaged in text messaging demonstrated significantly poorer performance on a test covering lecture content compared with the group that did not send and receive text messages. Participants exhibiting higher levels of text messaging skill had significantly lower test scores than participants who were less proficient at text messaging. It is hypothesized that in terms of retention of lecture material, more frequent task shifting by those with greater text messaging proficiency contributed to poorer performance. Overall, the findings do not support the view, held by many university students, that this form of multitasking has little effect on the acquisition of lecture content. Results provide empirical support for teachers and professors who ban text messaging in the classroom.

  4. Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications.

    D'Angelo, Gianni; Rampone, Salvatore

    2014-01-01

    the communications between different memories (RAM, Cache, Mass, Virtual) and to achieve efficient I/O performance, we design a mass storage structure able to access its data with a high degree of temporal and spatial locality. Then we develop a parallel implementation of the algorithm. We model it as an SPMD system together with a message-passing programming paradigm. Here, we adopt the high-level message-passing system MPI (Message Passing Interface) in the version for the Java programming language, MPJ. The parallel processing is organized into four stages: partitioning, communication, agglomeration and mapping. The decomposition of the U-BRAIN algorithm determines the necessity of a communication protocol design among the processors involved. Efficient synchronization design is also discussed. In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the INTEL XEON E7xxx and E5xxx family of the CRESCO structure of the Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize both the memory space and the execution time. The test data used in this study are IPDATA (Irvine Primate splice-junction DATA set), a subset of HS3D (Homo Sapiens Splice Sites Dataset) and a subset of COSMIC (the Catalogue of Somatic Mutations in Cancer). The execution time and the speed-up on IPDATA reach the best values within about 90 processors. Then the parallelization advantage is balanced by the greater cost of non-local communications between the processors. A similar behaviour is evident on HS3D, but at a greater number of processors, so evidencing the direct relationship between data size and parallelization gain. This behaviour is

  5. Parallel SOR methods with a parabolic-diffusion acceleration technique for solving an unstructured-grid Poisson equation on 3D arbitrary geometries

    Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.

    2016-05-01

    This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
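
    The multi-colour ordering idea behind parallel successive over-relaxation can be sketched on a much simpler problem than the one treated in this paper. The example below is a minimal two-colour (red-black) SOR for a 1D Poisson problem on a structured grid; the function name `red_black_sor`, the relaxation factor and the sweep count are assumptions made for illustration, and the code runs serially rather than with MPI.

```python
def red_black_sor(f, n, omega=1.5, sweeps=500):
    """Solve -u'' = f on (0, 1) with u(0) = u(1) = 0 on n interior points."""
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)          # includes the two boundary values
    for _ in range(sweeps):
        for colour in (1, 2):    # odd-indexed points first, then even-indexed
            # Points of one colour depend only on points of the other colour,
            # so this inner loop could be split across processes without conflicts.
            for i in range(colour, n + 1, 2):
                gauss_seidel = 0.5 * (u[i - 1] + u[i + 1] + h * h * f(i * h))
                u[i] = (1.0 - omega) * u[i] + omega * gauss_seidel
    return u
```

    Calling `red_black_sor(lambda x: 2.0, n=15)` approximates u(x) = x(1 - x). On unstructured grids more than two colours are generally needed, which is the case the abstract addresses.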

  6. Simplified models of the symmetric single-pass parallel-plate counterflow heat exchanger: a tutorial

    Pickard, William F.; Abraham-Shrauner, Barbara

    2018-03-01

    The heat exchanger is important in practical thermal processes, especially those of (i) the molten-salt storage schemes, (ii) compressed air energy storage schemes and (iii) other load-shifting thermal storage presumed to undergird a Smart Grid. Such devices, although central to the utilization of energy from sustainable (but intermittent) renewable sources, will be unfamiliar to many scientists, who nevertheless need a working knowledge of them. This tutorial paper provides a largely self-contained conceptual introduction for such persons. It begins by modelling a novel quantized exchanger, impractical as a device, but useful for comprehending the underlying thermophysics. It then reviews the one-dimensional steady-state idealization which demonstrates that effectiveness of heat transfer increases monotonically with (device length)/(device throughput). Next, it presents a two-dimensional steady-state idealization for plug flow and from it derives a novel formula for effectiveness of transfer; this formula is then shown to agree well with a finite-difference time-domain solution of the two-dimensional idealization under Hagen-Poiseuille flow. These results are consistent with a conclusion that effectiveness of heat exchange can approach unity, but may involve unwelcome trade-offs among device cost, size and throughput.

  7. Setting pass scores for clinical skills assessment.

    Liu, Min; Liu, Keh-Min

    2008-12-01

    In a clinical skills assessment, the decision to pass or fail an examinee should be based on the test content or on the examinees' performance. The process of deciding a pass score is known as setting a standard of the examination. This requires a properly selected panel of expert judges and a suitable standard setting method, which best fits the purpose of the examination. Six standard setting methods that are often used in clinical skills assessment are described to provide an overview of the standard setting process.

  8. Setting Pass Scores for Clinical Skills Assessment

    Min Liu

    2008-12-01

    Full Text Available In a clinical skills assessment, the decision to pass or fail an examinee should be based on the test content or on the examinees' performance. The process of deciding a pass score is known as setting a standard of the examination. This requires a properly selected panel of expert judges and a suitable standard setting method, which best fits the purpose of the examination. Six standard setting methods that are often used in clinical skills assessment are described to provide an overview of the standard setting process.

  9. Single beam pass migmacell method and apparatus

    Maglich, B.C.; Nering, J.E.; Mazarakis, M.G.; Miller, R.A.

    1976-01-01

    The invention provides improvements in migmacell apparatus and method by dispensing with the need for metastable confinement of injected molecular ions for multiple precession periods. Injected molecular ions undergo a 'single pass' through the reaction volume. By preconditioning the injected beam such that it contains a population distribution of molecules in higher vibrational states than in the case of a normal distribution, injected molecules in the single pass experience collisionless dissociation in the migmacell under magnetic influence, i.e., so-called Lorentz dissociation. Dissociation ions then form atomic migma

  10. A parallel buffer tree

    Sitchinava, Nodar; Zeh, Norbert

    2012-01-01

    We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal O(psortN + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and psortN is the parallel I/O complexity of sorting N elements using P processors....

  11. Parallel MR imaging.

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

    2012-07-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.
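
    The relationship between undersampling and aliasing described in this abstract can be illustrated with a toy 1D, single-coil example. The sketch below is not a SENSE or GRAPPA reconstruction; it only shows that keeping every second k-space sample and inverting the shorter transform folds the two halves of the signal on top of each other. The function names are hypothetical and the signal length is assumed to be divisible by the undersampling factor.

```python
import cmath


def dft(x):
    """Discrete Fourier transform (direct O(n^2) evaluation)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(X):
    """Inverse discrete Fourier transform (direct O(n^2) evaluation)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]


def undersampled_reconstruction(signal, factor=2):
    """Keep every `factor`-th k-space sample and invert the shorter transform."""
    kspace = dft(signal)
    return idft(kspace[::factor])
```

    For the signal 1, 2, ..., 8 and a factor of 2, the result is approximately 6, 8, 10, 12: each output sample is the sum of two input samples half a field of view apart, which is the fold-over that parallel imaging algorithms use coil sensitivity information to undo.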

  12. Parallel Algorithms and Patterns

    Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-06-16

    This is a PowerPoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
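
    Two of the patterns named in this abstract, reductions and prefix scans, can be sketched in a few lines. The code below is a serial illustration, not taken from the presentation: `tree_reduce` and `inclusive_scan` are hypothetical names, the scan uses the well-known Hillis-Steele scheme, and `op` is assumed to be associative. The list comprehension at each level stands in for work that could run concurrently.

```python
def tree_reduce(values, op):
    """Combine values pairwise, level by level; each level's pairs are independent."""
    level = list(values)
    if not level:
        raise ValueError("cannot reduce an empty sequence")
    while len(level) > 1:
        paired = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # carry an unpaired last element upward
            paired.append(level[-1])
        level = paired
    return level[0]


def inclusive_scan(values, op):
    """Hillis-Steele inclusive prefix scan; each step's updates are independent."""
    result = list(values)
    step = 1
    while step < len(result):
        result = [result[i] if i < step else op(result[i - step], result[i])
                  for i in range(len(result))]
        step *= 2
    return result
```

    With addition as `op`, `tree_reduce([1, 2, 3, 4, 5], ...)` gives 15 and `inclusive_scan([1, 2, 3, 4, 5], ...)` gives [1, 3, 6, 10, 15]; both need only about log2(n) levels when each level runs in parallel.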

  13. Ultrastructural evaluation of multiple pass low energy versus single pass high energy radio-frequency treatment.

    Kist, David; Burns, A Jay; Sanner, Roth; Counters, Jeff; Zelickson, Brian

    2006-02-01

    The radio-frequency (RF) device is a system capable of volumetric heating of the mid to deep dermis and selective heating of the fibrous septa strands and fascia layer. Clinically, these effects promote dermal collagen production, and tightening of these deep subcutaneous structures. A new technique of using multiple low energy passes has been described which results in lower patient discomfort and fewer side effects. This technique has also been anecdotally described as giving more reproducible and reliable clinical results of tissue tightening and contouring. This study will compare ultrastructural changes in collagen between a single pass high energy versus up to five passes of a multiple pass lower energy treatment. Three subjects were consented and treated in the preauricular region with the RF device using single or multiple passes (three or five) in the same 1.5 cm² treatment area with a slight delay between passes to allow tissue cooling. Biopsies from each treatment region and a control biopsy were taken immediately, 24 hours or 6 months post treatment for electron microscopic examination of the 0-1 mm and 1-2 mm levels. Sections of tissue 1 mm x 1 mm x 80 nm were examined with an RCA EMU-4 Transmission Electron Microscope. Twenty sections from 6 blocks from each 1 mm depth were examined by 2 blinded observers. The morphology and degree of collagen change in relation to area examined was compared to the control tissue, and estimated using a quantitative scale. Ultrastructural examination of tissue showed an increased amount of collagen fibril changes with increasing passes at energies of 97 J (three passes) and 122 J (five passes), respectively. The changes seen after five multiple passes were similar to those detected after much more painful single pass high-energy treatments. This ultrastructural study shows changes in collagen fibril morphology with an increased effect demonstrated at greater depths of the skin with multiple low-fluence passes

  14. The Physical Therapy and Society Summit (PASS) Meeting: observations and opportunities.

    Kigin, Colleen M; Rodgers, Mary M; Wolf, Steven L

    2010-11-01

    The construct of delivering high-quality and cost-effective health care is in flux, and the profession must strategically plan how to meet the needs of society. In 2006, the House of Delegates of the American Physical Therapy Association passed a motion to convene a summit on "how physical therapists can meet current, evolving, and future societal health care needs." The Physical Therapy and Society Summit (PASS) meeting on February 27-28, 2009, in Leesburg, Virginia, sent a clear message that for physical therapists to be effective and thrive in the health care environment of the future, a paradigm shift is required. During the PASS meeting, participants reframed our traditional focus on the physical therapist and the patient/client (consumer) to one in which physical therapists are an integral part of a collaborative, multidisciplinary health care team with the health care consumer as its focus. The PASS Steering Committee recognized that some of the opportunities that surfaced during the PASS meeting may be disruptive or may not be within the profession's present strategic or tactical plans. Thus, adopting a framework that helps to establish the need for change that is provocative and potentially disruptive to our present care delivery, yet prioritizes opportunities, is a critical and essential step. Each of us in the physical therapy profession must take on post-PASS roles and responsibilities to accomplish the systemic change that is so intimately intertwined with our destiny. This article offers a perspective of the dynamic dialogue and suggestions that emerged from the PASS event, providing further opportunities for discussion and action within our profession.

  15. The WLCG Messaging Service and its Future

    Cons, Lionel; Paladin, Massimo

    2012-01-01

    Enterprise messaging is seen as an attractive mechanism to simplify and extend several portions of the Grid middleware, from low level monitoring to experiments dashboards. The production messaging service currently used by WLCG includes four tightly coupled brokers operated by EGI (running Apache ActiveMQ and designed to host the Grid operational tools such as SAM) as well as two dedicated services for ATLAS-DDM and experiments dashboards (currently also running Apache ActiveMQ). In the future, this service is expected to grow in numbers of applications supported, brokers and technologies. The WLCG Messaging Roadmap identified three areas with room for improvement (security, scalability and availability/reliability) as well as ten practical recommendations to address them. This paper describes a messaging service architecture that is in line with these recommendations as well as a software architecture based on reusable components that ease interactions with the messaging service. These two architectures will support the growth of the WLCG messaging service.

  16. Hand hygiene posters: motivators or mixed messages?

    Jenner, E A; Jones, F; Fletcher, B C; Miller, L; Scott, G M

    2005-07-01

    Poster campaigns regarding hand hygiene are commonly used by infection control teams to improve practice, yet little is known of the extent to which they are based on established theory or research. This study reports on the content analysis of hand hygiene posters (N=69) and their messages (N=75) using message-framing theory. The results showed that posters seldom drew on knowledge about effective ways to frame messages. Frequently, they simply conveyed information 'telling' rather than 'selling' and some of this was confusing. Most posters were not designed to motivate, and some conveyed mixed messages. Few used fear appeals. Hand hygiene posters could have a greater impact if principles of message framing were utilized in their design. Suggestions for gain-framed messages are offered, but these need to be tested empirically.

  17. Message Scheduling and Forwarding in Congested DTNs

    Elwhishi, Ahmed; Ho, Pin-Han; Shihada, Basem

    2012-01-01

    Multi-copy utility-based routing has been considered as one of the most applicable approaches to effective message delivery in Delay Tolerant Networks (DTNs). By allowing multiple message replicas to be launched, the ratio of message delivery or delay can be significantly reduced compared with other counterparts. Such an advantage, nonetheless, is at the expense of taking more buffer space at each node and higher complexity in message forwarding decisions. This paper investigates an efficient message scheduling and dropping policy via an analytical modeling approach, aiming to achieve optimal performance in terms of message delivery delay. Extensive simulation results, based on a synthetic mobility model and real mobility traces, show that the proposed scheduling framework can achieve superb performance against its counterparts in terms of delivery delay.

  18. Diabetes education via mobile text messaging.

    Wangberg, Silje C; Arsand, Eirik; Andersson, Niklas

    2006-01-01

    Living with diabetes makes great educational demands on a family. We have tested the feasibility of using the mobile phone short message service (SMS) for reaching people with diabetes information. We also assessed user satisfaction and perceived pros and cons of the medium through interviews. Eleven parents of children with type 1 diabetes received messages for 11 weeks. The parents were positive about the system and said that they would like to continue to use it. The pop-up reminding effect of SMS messages in busy everyday life was noted as positive. Some parents experienced the messages as somewhat intrusive, arriving too often and at inconvenient times. The parents also noted the potential of the messages to facilitate communication with their adolescent children. The inability to store all of the messages or to print them out were seen as major disadvantages. Overall, the SMS seems to hold promise as means of delivering diabetes information.

  19. CMLOG: A common message logging system

    Chen, J.; Akers, W.; Bickley, M.; Wu, D.; Watson, W. III

    1997-01-01

    The Common Message Logging (CMLOG) system is an object-oriented and distributed system that not only allows applications and systems to log data (messages) of any type into a centralized database but also lets applications view incoming messages in real-time or retrieve stored data from the database according to selection rules. It consists of a concurrent Unix server that handles incoming logging or searching messages, a Motif browser that can view incoming messages in real-time or display stored data in the database, a client daemon that buffers and sends logging messages to the server, and libraries that can be used by applications to send data to or retrieve data from the database via the server. This paper presents the design and implementation of the CMLOG system and also addresses the issue of integrating CMLOG into existing control systems.

  20. Message Scheduling and Forwarding in Congested DTNs

    Elwhishi, Ahmed

    2012-08-19

    Multi-copy utility-based routing has been considered as one of the most applicable approaches to effective message delivery in Delay Tolerant Networks (DTNs). By allowing multiple message replicas to be launched, the ratio of message delivery or delay can be significantly reduced compared with other counterparts. Such an advantage, nonetheless, is at the expense of taking more buffer space at each node and higher complexity in message forwarding decisions. This paper investigates an efficient message scheduling and dropping policy via an analytical modeling approach, aiming to achieve optimal performance in terms of message delivery delay. Extensive simulation results, based on a synthetic mobility model and real mobility traces, show that the proposed scheduling framework can achieve superb performance against its counterparts in terms of delivery delay.

  1. Optimization and parallelization of the thermal–hydraulic subchannel code CTF for high-fidelity multi-physics applications

    Salko, Robert K.; Schmidt, Rodney C.; Avramova, Maria N.

    2015-01-01

    Highlights: • COBRA-TF was adopted by the Consortium for Advanced Simulation of LWRs. • We have improved code performance to support running large-scale LWR simulations. • Code optimization has led to reductions in execution time and memory usage. • An MPI parallelization has reduced full-core simulation time from days to minutes. - Abstract: This paper describes major improvements to the computational infrastructure of the CTF subchannel code so that full-core, pincell-resolved (i.e., one computational subchannel per real bundle flow channel) simulations can now be performed in much shorter run-times, either in stand-alone mode or as part of coupled-code multi-physics calculations. These improvements support the goals of the Department Of Energy Consortium for Advanced Simulation of Light Water Reactors (CASL) Energy Innovation Hub to develop high fidelity multi-physics simulation tools for nuclear energy design and analysis. A set of serial code optimizations—including fixing computational inefficiencies, optimizing the numerical approach, and making smarter data storage choices—are first described and shown to reduce both execution time and memory usage by about a factor of ten. Next, a “single program multiple data” parallelization strategy targeting distributed memory “multiple instruction multiple data” platforms utilizing domain decomposition is presented. In this approach, data communication between processors is accomplished by inserting standard Message-Passing Interface (MPI) calls at strategic points in the code. The domain decomposition approach implemented assigns one MPI process to each fuel assembly, with each domain being represented by its own CTF input file. The creation of CTF input files, both for serial and parallel runs, is also fully automated through use of a pressurized water reactor (PWR) pre-processor utility that uses a greatly simplified set of user input compared with the traditional CTF input. To run CTF in

  2. Generalizing Galileo's Passe-Dix Game

    Hombas, Vassilios

    2012-01-01

    This article shows a generalization of Galileo's "passe-dix" game. The game was born following one of Galileo's [G. Galileo, "Sopra le Scoperte dei Dadi" (Galileo, Opere, Firenze, Barbera, Vol. 8). Translated by E.H. Thorne, 1898, pp. 591-594] explanations on a paradox that occurred in the experiment of tossing three fair "six-sided" dice.…

  3. TREsPASS Book 3: Creative Engagements

    Coles-Kemp, Lizzie; Hall, Peter

    2016-01-01

    In this book we examine the role that creative security engagements have played in the TREsPASS project. These engagements are part of a wider creative securities approach that explores the contributions that social practices make to protection of data and information. Our most popular creative

  4. Passing the Bond Issue (with Related Video)

    Erickson, Paul W.

    2011-01-01

    When a bond referendum comes around for a school district, it often is the culmination of years of planning, strategizing and communicating to the public. Especially in these economic times, passing a building referendum is challenging. Complete transparency among the superintendent, school board and community is essential to communicate the…

  5. MINUIT package parallelization and applications using the RooFit package

    Lazzaro, Alfio; Moneta, Lorenzo

    2010-01-01

    The fitting procedures are based on numerical minimization of functions. The MINUIT package is the most common package used for such procedures in High Energy Physics community. The main algorithm in this package, MIGRAD, searches the minimum of a function using the gradient information. For each minimization iteration, MIGRAD requires the calculation of the derivative for each free parameter of the function to be minimized. Minimization is required for data analysis problems based on the maximum likelihood technique. The calculation of complex likelihood functions, with several free parameters, many independent variables and large data samples, can be very CPU-time consuming. Then, the minimization process requires the calculation of the likelihood functions several times for each minimization iteration. In this paper we will show how MIGRAD algorithm and the likelihood function calculation can be easily parallelized using Message Passing Interface techniques. We will present the speed-up improvements obtained in typical physics applications such as complex maximum likelihood fits using the RooFit package.
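
    The data-parallel idea in the abstract above can be sketched as follows. This is a minimal illustration, not the MINUIT or RooFit implementation: the Gaussian model, the round-robin partition and every function name are assumptions made for the example, and the loop over chunks stands in for what an MPI program would do with one rank per chunk followed by a sum reduction.

```python
import math


def nll_chunk(params, chunk):
    """Negative log-likelihood of one data chunk under a Gaussian model (assumed)."""
    mu, sigma = params
    const = 0.5 * math.log(2.0 * math.pi)
    return sum(0.5 * ((x - mu) / sigma) ** 2 + math.log(sigma) + const for x in chunk)


def partition(data, nranks):
    """Round-robin split of the data sample over hypothetical ranks."""
    return [data[r::nranks] for r in range(nranks)]


def total_nll(params, chunks):
    # Each term is independent work; under MPI every rank would evaluate its
    # own chunk and the partial sums would be combined with a reduction.
    return sum(nll_chunk(params, c) for c in chunks)


def numerical_gradient(params, chunks, eps=1e-6):
    """Central-difference derivative with respect to each free parameter."""
    grad = []
    for k in range(len(params)):
        up = list(params)
        down = list(params)
        up[k] += eps
        down[k] -= eps
        grad.append((total_nll(up, chunks) - total_nll(down, chunks)) / (2.0 * eps))
    return grad
```

    The derivative loop is a second place where work could be distributed, since each free parameter needs its own pair of likelihood evaluations.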

  6. PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment.

    Lartillot, Nicolas; Rodrigue, Nicolas; Stubbs, Daniel; Richer, Jacques

    2013-07-01

    Modeling across site variation of the substitution process is increasingly recognized as important for obtaining more accurate phylogenetic reconstructions. Both finite and infinite mixture models have been proposed and have been shown to significantly improve on classical single-matrix models. Compared with their finite counterparts, infinite mixtures have a greater expressivity. However, they are computationally more challenging. This has resulted in practical compromises in the design of infinite mixture models. In particular, a fast but simplified version of a Dirichlet process model over equilibrium frequency profiles implemented in PhyloBayes has often been used in recent phylogenomics studies, while more refined model structures, more realistic and empirically more fit, have been practically out of reach. We introduce a message passing interface version of PhyloBayes, implementing the Dirichlet process mixture models as well as more classical empirical matrices and finite mixtures. The parallelization is made efficient thanks to the combination of two algorithmic strategies: a partial Gibbs sampling update of the tree topology and the use of a truncated stick-breaking representation for the Dirichlet process prior. The implementation shows close to linear gains in computational speed for up to 64 cores, thus allowing faster phylogenetic reconstruction under complex mixture models. PhyloBayes MPI is freely available from our website www.phylobayes.org.

  7. Message from the Program Chairs

    Sheng, Quan Z.; Wang, Guoren; Jensen, Christian S.

    2012-01-01

    The papers cover contemporary topics in the fields of Web management and World Wide Web related research and applications, such as advanced application of databases, cloud computing, content management, data mining and knowledge discovery, distributed and parallel processing, grid computing, internet...

  8. Message Received: Virtual Ethnography in Online Message Boards

    Kevin F. Steinmetz

    2012-02-01

    As the Internet begins to encapsulate more people within online communities, it is important that the social researcher have well-rounded ethnographic methodologies for observing these phenomena. This article seeks to contribute to methodology by detailing and providing insights into three specific facets of virtual ethnography that need attention: space and time, identity and authenticity, and ethics. Because the Internet is a globalized and instantaneous medium where space and time collapse, identity becomes more playful, and ethics become more tenuous, understanding these aspects is crucial to the study of online social groups. A second focus of this article is to apply these notions to the study of online message boards, a widely used medium for online communication that is frequently overlooked by methodologists.

  9. Nonlinear unknown input sliding mode observer based chaotic system synchronization and message recovery scheme with uncertainty

    Sharma, Vivek; Sharma, B.B.; Nath, R.

    2017-01-01

    In the present manuscript, an observer-based synchronization and message recovery scheme is discussed for a system with uncertainties. LMI conditions are derived analytically, the solution of which gives the observer design matrices. Earlier approaches have used adaptive laws to address the uncertainties; in the present work, however, a decoupling approach is used to make the observer robust against uncertainties. The methodology requires upper bounds on the nonlinearity and the message signal, and estimates for these bounds are generated adaptively. Thus no information about the nature of the nonlinearity and the associated Lipschitz constant is needed in the proposed approach. The message signal is recovered using equivalent output injection, which is a low-pass filtered equivalent of the discontinuous effort required to maintain the sliding motion. Finally, the efficacy of the proposed Nonlinear Unknown Input Sliding Mode Observer (NUISMO) for chaotic communication is verified by conducting simulation studies on two chaotic systems, i.e. a third-order Chua circuit and the Rossler system.

  10. Modeling drivers' passing duration and distance in a virtual environment

    Haneen Farah

    2013-07-01

    The main contribution of this paper is in the empirical models developed for passing duration and distance which highlights the factors that affect drivers' passing behavior and can be used to enhance the passing models in simulation programs.

  11. Parallel discrete event simulation

    Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.

    1991-01-01

    In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation

  12. Parallel reservoir simulator computations

    Hemanth-Kumar, K.; Young, L.C.

    1995-01-01

    The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and relative to scalar calculations it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively on the CRAY C-90

  13. Message framing in social networking sites.

    Kao, Danny Tengti; Chuang, Shih-Chieh; Wang, Sui-Min; Zhang, Lei

    2013-10-01

    Online social networking sites represent significant new opportunities for Internet advertisers. However, results based on the real world cannot be generalized to all virtual worlds. In this research, the moderating effects of need for cognition (NFC) and knowledge were applied to examine the impact of message framing on attitudes toward social networking sites. A total of 216 undergraduates participated in the study. Results reveal that for social networking sites, while high-NFC individuals form more favorable attitudes toward negatively framed messages than positively framed messages, low-NFC individuals form more favorable attitudes toward positively framed messages than negatively framed messages. In addition, low-knowledge individuals demonstrate more favorable attitudes toward negatively framed messages than positively framed messages; however, the framing effect does not differentially affect the attitudes of high-knowledge individuals. Furthermore, the framing effect does not differentially affect the attitudes of high-NFC individuals with high knowledge. In contrast, low-NFC individuals with low knowledge hold more favorable attitudes toward positively framed messages than negatively framed messages.

  14. Evaluation of Sexual Communication Message Strategies

    2011-01-01

    Parent-child communication about sex is an important proximal reproductive health outcome. But while campaigns to promote it such as the Parents Speak Up National Campaign (PSUNC) have been effective, little is known about how messages influence parental cognitions and behavior. This study examines which message features explain responses to sexual communication messages. We content analyzed 4 PSUNC ads to identify specific, measurable message and advertising execution features. We then developed quantitative measures of those features, including message strategies, marketing strategies, and voice and other stylistic features, and merged the resulting data into a dataset drawn from a national media tracking survey of the campaign. Finally, we conducted multivariable logistic regression models to identify relationships between message content and ad reactions/receptivity, and between ad reactions/receptivity and parents' cognitions related to sexual communication included in the campaign's conceptual model. We found that overall parents were highly receptive to the PSUNC ads. We did not find significant associations between message content and ad reactions/receptivity. However, we found that reactions/receptivity to specific PSUNC ads were associated with increased norms, self-efficacy, short- and long-term expectations about parent-child sexual communication, as theorized in the conceptual model. This study extends previous research and methods to analyze message content and reactions/receptivity. The results confirm and extend previous PSUNC campaign evaluation and provide further evidence for the conceptual model. Future research should examine additional message content features and the effects of reactions/receptivity. PMID:21599875

  15. Totally parallel multilevel algorithms

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  16. Parallel computing works

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C³P), a five-year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  17. Massively parallel mathematical sieves

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
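
    A parallel sieve of the kind described above can be sketched as follows. The sketch uses a simple block decomposition, which is an assumption made for brevity and not the scattered decomposition studied in the report, and it emulates the workers with a serial loop. All function names are hypothetical.

```python
import math


def base_primes(limit):
    """Plain serial Sieve of Eratosthenes for primes <= limit."""
    if limit < 2:
        return []
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]


def sieve_block(lo, hi, primes):
    """Sieve the half-open block [lo, hi) using the shared base primes."""
    flags = [True] * (hi - lo)
    for p in primes:
        # First multiple of p inside the block, starting no lower than p * p
        # so that p itself is never crossed out.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi, p):
            flags[m - lo] = False
    return [lo + i for i, is_prime in enumerate(flags) if is_prime and lo + i >= 2]


def blocked_sieve(n, workers):
    """Primes <= n, with the range split into one block per hypothetical worker."""
    primes = base_primes(math.isqrt(n))
    size = -(-(n + 1) // workers)  # ceiling division
    result = []
    for rank in range(workers):  # each iteration is independent work
        lo, hi = rank * size, min((rank + 1) * size, n + 1)
        if lo < hi:
            result.extend(sieve_block(lo, hi, primes))
    return result
```

    Every worker needs only the base primes up to the square root of n, so the blocks can be sieved without any communication until the results are gathered.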

  18. Contact conditions in skin-pass rolling

    Kijima, Hideo; Bay, Niels

    2007-01-01

    The special contact conditions in skin-pass rolling of steel strip are analysed by studying plane strain upsetting of thin sheet with low reduction, applying long narrow tools and dry friction conditions. An extended sticking region is estimated by an elasto-plastic FEM analysis of the plane strain upsetting. This sticking region causes a highly inhomogeneous elasto-plastic deformation with large influence of work-hardening and friction. A numerical analysis of skin-pass rolling shows the same contact conditions, i.e. an extended sticking region around the center of the contact zone. The calculated size of the sticking region with varying contact length and pressure/reduction is experimentally verified by plane strain upsetting tests measuring the local surface deformation of the work pieces after unloading.

  19. Two-pass greedy regular expression parsing

    Grathwohl, Niels Bjørn Bugge; Henglein, Fritz; Nielsen, Lasse

    2013-01-01

    We present new algorithms for producing greedy parses for regular expressions (REs) in a semi-streaming fashion. Our lean-log algorithm executes in time O(mn) for REs of size m and input strings of size n and outputs a compact bit-coded parse tree representation. It improves on previous algorithms by: operating in only 2 passes; using only O(m) words of random-access memory (independent of n); requiring only kn bits of sequentially written and read log storage, where k ... and not requiring it to be stored at all. Previous RE parsing algorithms do not scale linearly with input size, or require substantially more log storage and employ 3 passes where the first consists of reversing the input, or do not or are not known to produce a greedy parse. The performance of our unoptimized C...

  20. 3D streamers simulation in a pin to plane configuration using massively parallel computing

    Plewa, J.-M.; Eichwald, O.; Ducasse, O.; Dessante, P.; Jacobs, C.; Renon, N.; Yousfi, M.

    2018-03-01

    This paper concerns the 3D simulation of corona discharge using high performance computing (HPC) managed with the message passing interface (MPI) library. In the field of finite volume methods applied on non-adaptive mesh grids and in the case of a specific 3D dynamic benchmark test devoted to streamer studies, the great efficiency of the iterative R&B SOR and BiCGSTAB methods versus the direct MUMPS method was clearly demonstrated in solving the Poisson equation using HPC resources. The optimization of the parallelization and the resulting scalability was undertaken as a function of the HPC architecture for a number of mesh cells ranging from 8 to 512 million and a number of cores ranging from 20 to 1600. The R&B SOR method remains at least about four times faster than the BiCGSTAB method and requires significantly less memory for all tested situations. The R&B SOR method was then implemented in a 3D MPI parallelized code that solves the classical first order model of an atmospheric pressure corona discharge in air. The 3D code capabilities were tested by following the development of one, two and four coplanar streamers generated by initial plasma spots for 6 ns. The preliminary results obtained allowed us to follow in detail the formation of the tree structure of a corona discharge and the effects of the mutual interactions between the streamers in terms of streamer velocity, trajectory and diameter. The computing time for 64 million mesh cells distributed over 1000 cores using the MPI procedures is about 30 min ns⁻¹, regardless of the number of streamers.
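
    The red-and-black successive over-relaxation scheme named in the abstract can be sketched as follows. This is a serial 2D illustration under stated assumptions (5-point stencil, fixed Dirichlet boundary, relaxation factor 1.5), not the authors' 3D MPI code, and the function names are hypothetical. Points of one colour depend only on neighbours of the other colour, which is what lets each half-sweep be updated in parallel across subdomains.

```python
def red_black_sor(u, f, h, omega=1.5, sweeps=200):
    """In-place red-black SOR for the 2D Poisson equation laplace(u) = f.

    u is a square grid (list of lists) whose boundary values are held fixed,
    f is the source term on the same grid and h is the mesh spacing.
    """
    n = len(u)
    for _ in range(sweeps):
        for colour in (0, 1):  # 0 = "red" points, 1 = "black" points
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != colour:
                        continue
                    neighbours = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    gauss_seidel = 0.25 * (neighbours - h * h * f[i][j])
                    # Over-relax: move past the Gauss-Seidel value by factor omega.
                    u[i][j] += omega * (gauss_seidel - u[i][j])
    return u


def linear_test_grid(n):
    """Grid whose boundary follows u = i + j and whose interior starts at zero."""
    return [
        [float(i + j) if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
        for i in range(n)
    ]
```

    With a zero source term and the boundary of `linear_test_grid`, the iteration should recover the linear field u = i + j in the interior, which makes a convenient correctness check.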

  1. Expectancy Theory in Media and Message Selection.

    Van Leuven, Jim

    1981-01-01

    Argues for reversing emphasis on uses and gratifications research in favor of an expectancy model which holds that selection of a particular medium depends on (1) the expectation that the choice will be followed by a message of interest and (2) the importance of that message in satisfying user's values. (PD)

  2. Should We Ban Instant Messaging In School?

    Texley, Sharon; DeGennaro, Donna

    2005-01-01

    This article is a brief debate on the pros and cons of allowing students to use instant messaging (IM) in school. On one hand, teenagers' desire to socialize can overcome other priorities and schools may set policies to ban instant messaging. The contrary view is that schools should embrace the IM technology being popularized by youth and find…

  3. 78 FR 64202 - Quantitative Messaging Research

    2013-10-28

    ... COMMODITY FUTURES TRADING COMMISSION Quantitative Messaging Research AGENCY: Commodity Futures... survey will follow qualitative message testing research (for which CFTC received fast-track OMB approval... comments. Please submit your comments using only one method and identify that it is for the "Quantitative...

  4. Undergraduates' Text Messaging Language and Literacy Skills

    Grace, Abbie; Kemp, Nenagh; Martin, Frances Heritage; Parrila, Rauno

    2014-01-01

    Research investigating whether people's literacy skill is being affected by the use of text messaging language has produced largely positive results for children, but mixed results for adults. We asked 150 undergraduate university students in Western Canada and 86 in South Eastern Australia to supply naturalistic text messages and to complete…

  5. Arbitrated quantum signature scheme with message recovery

    Lee, Hwayean; Hong, Changho; Kim, Hyunsang; Lim, Jongin; Yang, Hyung Jin

    2004-01-01

    Two quantum signature schemes with message recovery relying on the availability of an arbitrator are proposed. One scheme uses a public board and the other does not. However, both schemes provide confidentiality of the message and a higher efficiency in transmission.

  6. Messages about Sexuality: An Ecological Perspective

    Boone, Tanya L.

    2015-01-01

    The goal of this two-part study was to identify the perceived influence of sexuality messages from parents, peers, school and the media--four microsystems within the Ecological Model--on emerging adult US college women's sexual attitudes. Findings suggest that parents were the most likely source of the message to "remain abstinent until…

  7. Suspecting Neurological Dysfunction From E Mail Messages ...

    A non-medical person suspected and confirmed neurological dysfunction in an individual, based only on e-mail messages sent by the individual. With e-mail communication becoming rampant, “peculiar” e-mail messages may raise the suspicion of neurological dysfunction. Organic pathology explaining the abnormal email ...

  8. Single Pass Albumin Dialysis in Hepatorenal Syndrome

    Rahman Ebadur

    2008-01-01

    Hepatorenal syndrome (HRS) is the most appalling complication of acute or chronic liver disease, with a 90% mortality rate. Single pass albumin dialysis (SPAD) can be considered as a noble liver support technique in HRS. Here, we present a case of a young healthy patient who developed hyperacute fulminant liver failure that progressed to HRS. The patient was offered SPAD as a bridge to liver transplantation; however, it resulted in an excellent recovery.

  9. Fear appeals in HIV-prevention messages: young people's perceptions in northern Tanzania.

    Bastien, Sheri

    2011-12-01

    The aims of the study were to elicit the perceptions of young people in Tanzania on the role of fear appeals in HIV-prevention messages and to identify important contextual factors that may influence young people's perceptions of HIV-prevention posters. A total of 10 focus groups were conducted to investigate the role of fear appeals using the extended parallel process model (EPPM) as a guide. Young people were shown a series of images (mostly posters) with alternating high and low-threat messages (fear appeals), and then asked questions about their overall beliefs about HIV and AIDS, as well as about their response in terms of perceived susceptibility to HIV infection, the severity of the message, and their perceptions of self-efficacy and response efficacy. The images and messages that specifically targeted young people were highest in inducing perceived susceptibility to HIV infection, while pictorial descriptions of the physical consequences of HIV infection and those messages related to the stigma and discrimination faced by HIV-infected or affected people induced greater perceptions of severity. The information-based posters rated high in inducing response efficacy, while none of the images seemed to convince young people that they had the self-efficacy to perform the recommended health behaviours. The young people expressed a preference for fear-based appeals and a belief that this could work well in HIV-prevention efforts, yet they also stated a desire for more information-based messages about how to protect themselves. Finally, the messages evoking the most emotional responses were those that had been locally conceived rather than ones developed by large-scale donor-funded campaigns. Finding the appropriate balance between fear and efficacy in HIV-prevention messages is imperative. Further research is needed to better understand how and when fear appeals work and do not work in African settings, especially among young people.

  10. The WLCG Messaging Service and its Future

    Cons, Lionel

    2012-01-01

    Enterprise messaging is seen as an attractive mechanism to simplify and extend several portions of the Grid middleware, from low level monitoring to experiments dashboards. The production messaging service currently used by WLCG includes four tightly coupled brokers operated by EGI (running Apache ActiveMQ and designed to host the Grid operational tools such as SAM) as well as two dedicated services for ATLAS-DDM and experiments dashboards (currently also running Apache ActiveMQ). In the future, this service is expected to grow in numbers of applications supported, brokers and technologies. The WLCG Messaging Roadmap identified three areas with room for improvement (security, scalability and availability/reliability) as well as ten practical recommendations to address them. This paper describes a messaging service architecture that is in line with these recommendations as well as a software architecture based on reusable components that ease interactions with the messaging service. These two architectures wil...

  11. AMS: Area Message Service for SLC

    Crane, M.; Mackenzie, R.; Millsom, D.; Zelazny, M.

    1993-04-01

    The Area Message Service (AMS) is a TCP/IP based messaging service currently in use at SLAC. A number of projects under development here at SLAC require an application-level interface to the 4.3BSD UNIX socket level communications functions using TCP/IP over Ethernet. AMS provides connection management, solicited message transfer, unsolicited message transfer, and asynchronous notification of pending messages. AMS is written completely in ANSI C and is currently portable over three hardware/operating system/network manager platforms: VAX/VMS/Multinet, PC/MS-DOS/Pathworks, and VME 68K/pSOS/pNA. The basic architecture is a client-server connection where either end of the interface may be the server. This allows for connections and data flow to be initiated from either end of the interface. Included in the paper are details concerning the connection management, the handling of the multi-platform code, and the implementation process.

  12. Factors influencing message dissemination through social media

    Zheng, Zeyu; Yang, Huancheng; Fu, Yang; Fu, Dianzheng; Podobnik, Boris; Stanley, H. Eugene

    2018-06-01

    Online social networks strongly impact our daily lives. An internet user (a "Netizen") wants messages to be efficiently disseminated. The susceptible-infected-recovered (SIR) dissemination model is the traditional tool for exploring the spreading mechanism of information diffusion. We here test our SIR-based dissemination model on open and real-world data collected from Twitter. We locate and identify phase transitions in the message dissemination process. We find that message content is a stronger factor than the popularity of the sender. We also find that the probability that a message will be forwarded has a threshold that affects its ability to spread, and when the probability is above the threshold the message quickly achieves mass dissemination.
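
    The forwarding threshold mentioned above can be illustrated with the textbook mean-field SIR recursion. This is a minimal sketch under assumed parameters, not the authors' Twitter-calibrated model: `beta` plays the role of a per-step forwarding rate, `gamma` the rate at which users stop forwarding, and both names are hypothetical.

```python
def sir_dissemination(beta, gamma, i0=0.001, steps=500):
    """Discrete-time mean-field SIR; returns the final fraction reached.

    s: users who have not yet seen the message,
    i: users currently forwarding it,
    r: users who have seen it and stopped forwarding.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(steps):
        newly_forwarding = beta * s * i
        newly_stopped = gamma * i
        s -= newly_forwarding
        i += newly_forwarding - newly_stopped
        r += newly_stopped
    return r
```

    In this simplified model the message spreads widely only when `beta` exceeds `gamma`; below that ratio the final fraction stays close to the initial seed.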

  13. Hand hygiene posters: selling the message.

    Jenner, E A; Jones, F; Fletcher, B C; Miller, L; Scott, G M

    2005-02-01

    This literature review was undertaken to determine the established theory and research that might be utilized to inform the construction of persuasive messages on hand hygiene posters. It discusses the principles of message framing and the use of fear appeals. Current theory suggests that the most effective messages for health promotion behaviours should be framed in terms of gains rather than losses for the individual. However, as clinical hand hygiene is largely for the benefit of others (i.e. patients), messages should also invoke a sense of personal responsibility and appeal to altruistic behaviour. The use of repeated minimal fear appeals have their place. Posters that simply convey training messages are not effective persuaders.

  14. Algorithms for parallel computers

    Churchhouse, R.F.

    1985-01-01

    Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)

  15. Parallelism and array processing

    Zacharov, V.

    1983-01-01

    Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)

  16. Messages about appearance, food, weight and exercise in "tween" television.

    Simpson, Courtney C; Kwitowski, Melissa; Boutte, Rachel; Gow, Rachel W; Mazzeo, Suzanne E

    2016-12-01

    Tweens (children ages ~8-14 years) are a relatively recently defined age group, increasingly targeted by marketers. Individuals in this age group are particularly vulnerable to opinions and behaviors presented in media messages, given their level of cognitive and social development. However, little research has examined messages about appearance, food, weight, and exercise in television specifically targeting tweens, despite the popularity of this media type among this age group. This study used a content analytic approach to explore these messages in the five most popular television shows for tweens on the Disney Channel (as of 2015). Using a multiple-pass approach, relevant content in episodes from the most recently completed seasons of each show was coded. Appearance related incidents occurred in every episode; these most frequently mentioned attractiveness/beauty. Food related incidents were also present in every episode; typically, these situations were appearance and weight neutral. Exercise related incidents occurred in 53.3% of episodes; the majority expressed resistance to exercise. Weight related incidents occurred in 40.0% of the episodes; the majority praised the muscular ideal. Women were more likely to initiate appearance incidents, and men were more likely to initiate exercise incidents. These results suggest that programs specifically marketed to tweens reinforce appearance ideals, including stereotypes about female attractiveness and male athleticism, two constructs linked to eating pathology and body dissatisfaction. Given the developmental vulnerability of the target group, these findings are concerning, and highlight potential foci for prevention programming, including media literacy, for tweens.

  17. (Nearly) portable PIC code for parallel computers

    Decyk, V.K.

    1993-01-01

    As part of the Numerical Tokamak Project, the author has developed a (nearly) portable, one-dimensional version of the GCPIC algorithm for particle-in-cell codes on parallel computers. This algorithm uses a spatial domain decomposition for the fields, and passes particles from one domain to another as the particles move spatially. With only minor changes, the code has been run in parallel on the Intel Delta, the Cray C-90, the IBM ES/9000 and a cluster of workstations. After a line by line translation into cmfortran, the code was also run on the CM-200. Impressive speeds have been achieved, both on the Intel Delta and the Cray C-90, around 30 nanoseconds per particle per time step. In addition, the author was able to isolate the data management modules, so that the physics modules were not changed much from their sequential version, and the data management modules can be used as "black boxes."

  18. Parallel magnetic resonance imaging

    Larkman, David J; Nunes, Rita G

    2007-01-01

    Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)

  19. Designing Anti-Binge Drinking Prevention Messages: Message Framing vs. Evidence Type.

    Kang, Hannah; Lee, Moon J

    2017-09-27

    We investigated whether presenting anti-binge drinking health campaign messages in different message framing and evidence types influences college students' intention to avoid binge drinking, based on prospect theory (PT) and exemplification theory. A 2 (message framing: loss-framed message/gain-framed message) X 2 (evidence type: statistical/narrative) between-subjects factorial design with a control group was conducted with 156 college students. College students who were exposed to the loss-framed message condition exhibited a higher level of intention to avoid binge drinking in the near future than those who did not see any messages (the control group). This finding was mainly among non-binge drinkers. Regardless of evidence type, those who were exposed to the messages exhibited a higher level of intention to avoid binge drinking than those in the control group. This is also mainly among non-binge drinkers. We also found the main effects of message framing and evidence type on attitude toward the message and the main effect of message framing on attitude toward drinking.

  20. "Which pass is better?" Novel approaches to assess passing effectiveness in elite soccer.

    Rein, Robert; Raabe, Dominik; Memmert, Daniel

    2017-10-01

    Passing behaviour is a key property of successful performance in team sports. Previous investigations, however, have mainly focused on notational measurements like total passing frequencies, which provide little information about what actually constitutes successful passing behaviour. Consequently, this has hampered the transfer of research findings into applied settings. Here we present two novel approaches to assess passing effectiveness in elite soccer by evaluating their effects on majority situations and space control in front of the goal. Majority situations are assessed by calculating the number of defenders between the ball carrier and the goal. Control of space is estimated using Voronoi-diagrams based on the players' positions on the pitch. Both methods were applied to position data from 103 German First division games from the 2011/2012, 2012/2013 and 2014/2015 seasons using a big data approach. The results show that both measures are significantly related to successful game play with respect to the number of goals scored and to the probability of winning a game. The results further show that on average passes from the mid-field into the attacking area are most effective. The presented passing efficiency measures thereby offer new opportunities for future applications in soccer and other sports disciplines whilst maintaining practical relevance with respect to tactical training regimes or game performance analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
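
    The two measures named in this abstract can be approximated in a few lines. This is a sketch under stated assumptions, not the authors' implementation: "between the ball carrier and the goal" is simplified to a comparison along the pitch's length axis, and Voronoi cell area is approximated by assigning grid points to the nearest player. All function names and coordinates are hypothetical.

```python
import math

def defenders_ahead(ball_x, defenders, goal_x):
    """Count defenders whose x position lies strictly between ball and goal line.

    Assumption: only the coordinate along the pitch length is compared.
    """
    lo, hi = sorted((ball_x, goal_x))
    return sum(1 for (x, _y) in defenders if lo < x < hi)

def space_control(attackers, defenders, region, cells=20):
    """Fraction of a rectangular region controlled by the attacking team.

    A grid point counts as 'controlled' when its nearest player is an
    attacker, which approximates the summed area of the attackers' Voronoi
    cells inside region = (x0, x1, y0, y1).
    """
    x0, x1, y0, y1 = region
    owned = 0
    for i in range(cells):
        for j in range(cells):
            point = (x0 + (i + 0.5) * (x1 - x0) / cells,
                     y0 + (j + 0.5) * (y1 - y0) / cells)
            d_att = min(math.dist(point, a) for a in attackers)
            d_def = min(math.dist(point, d) for d in defenders)
            owned += d_att < d_def
    return owned / (cells * cells)

if __name__ == "__main__":
    defenders = [(70.0, 30.0), (50.0, 20.0), (90.0, 34.0)]
    print(defenders_ahead(60.0, defenders, goal_x=105.0))
    print(space_control([(25.0, 50.0)], [(75.0, 50.0)], (0, 100, 0, 100)))
```

    A pass could then be scored by evaluating both quantities just before and just after it and taking the difference.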

  1. Improving the Effectiveness of Fundraising Messages: The Impact of Charity Goal Attainment, Message Framing, and Evidence on Persuasion

    Das, Enny; Kerkhof, Peter; Kuiper, Joyce

    2008-01-01

    This experimental study assessed the effectiveness of fundraising messages. Based on recent findings regarding the effects of message framing and evidence, effective fundraising messages should combine abstract, statistical information with a negative message frame and anecdotal evidence with a positive message frame. In addition, building on…

  2. Are you Scared Yet?: Evaluating Fear Appeal Messages in Tweets about the Tips Campaign.

    Emery, Sherry L; Szczypka, Glen; Abril, Eulàlia Puig; Kim, Yoonsang; Vera, Lisa

    2014-04-01

    In March 2012, the CDC launched "Tips from Former Smokers," a $54 million national campaign featuring individuals experiencing long-term health consequences of smoking. The campaign approach was based on strong evidence that anti-tobacco ads portraying fear, graphic images, and personal testimonials are associated with attitudinal and behavior change. Yet it was also controversial; critics cited the danger that viewers might reject such intensely graphic messages. Tasked with informing this debate, our study analyzes the corpus of Tips campaign-related tweets obtained via the Twitter Firehose. We provide a novel and rigorous method for media campaign evaluation within the framework of the Extended Parallel Process Model. Among the relevant Tweets, 87% showed evidence of message acceptance, while 7% exhibited message rejection.

  3. Text messages as a learning tool for midwives | Woods | South ...

    The use of cell phone text messaging to improve access to continuing ... with 50 of the message recipients, demonstrated that the text messages were well received by ... services, such as the management of HIV-infected children and adults.

  4. A Message Without a Code?

    Tom Conley

    1981-01-01

    The photographic paradox is said to be that of a message without a code, a communication lacking a relay or gap essential to the process of communication. Tracing the recurrence of Barthes's definition in the essays included in Image/Music/Text and in La Chambre claire, this paper argues that Barthes's definition is platonic in its will to dematerialize the troubling, graphic immediacy of the photograph. He writes of the image in order to flee its signature. As a function of media, his categories are written in order to be insufficient and inadequate; to maintain an ineluctable difference between language heard and letters seen; to protect an idiom of loss which the photograph disallows. The article studies the strategies of his definition in «The Photographic Paradox» as an instrument of abstraction, opposes the notion of code, in an aural sense, to audio-visual markers of closed relay in advertising, and critiques the layout and order of La Chambre claire in respect to Barthes's ideology of absence.

  5. The STAPL Parallel Graph Library

    Harshvardhan,; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence

    2013-01-01

    This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable

  6. Improved Message Authentication and Confidentiality Checking

    Ismail Jabiullah, M.; Abdullah Al-Shamim, M.; Lutfar Rahman, M.

    2005-01-01

    The most confusing areas of secured network communications are message authentication and confidentiality checking. The attacks and the countermeasures have become so convoluted that users in this area begin to account for all contingencies. Two session-key generation techniques are used here to generate two separate session keys, K1 and K2; both the sender and the receiver share these keys for a higher degree of authentication and confidentiality. For this, the message is first encrypted with the key K1, and then the intermediary message authentication code (MAC) is generated by encrypting the encrypted message using the key K2. Then, the encrypted message and the intermediary MAC are again encrypted using K2, concatenated with the encrypted message, and sent to the destination. At the receiving end, first, the received ciphertext is encrypted using key K2 and compared to the received MAC. The received ciphertext is again decrypted with the key K2 and compared with the first decrypted MAC twice by the key K2. The plaintext is obtained by decrypting the received ciphertext first with K2 and then with K1, using the corresponding decryption techniques respectively. The encryption technique with key K2 provides authentication, and the one with key K1 provides confidentiality checking of the transmitted message. The developed technique can be applied to both academic and commercial applications in online or offline electronic transactions for security. (authors)
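
    The division of labour between the two keys (K1 for confidentiality, K2 for authentication) can be shown with a generic encrypt-then-MAC sketch. This is not the authors' scheme: their MAC is built by encrypting with K2, whereas the sketch swaps in HMAC-SHA256 from the Python standard library, and the cipher is a toy keystream used only so the example runs without third-party packages. It must not be used to protect real data. All function names are hypothetical.

```python
import hashlib
import hmac

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher (illustration only, NOT secure): XOR with SHA-256(key || counter)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def protect(k1: bytes, k2: bytes, plaintext: bytes):
    """Encrypt with K1 (confidentiality), then tag the ciphertext with K2 (authentication)."""
    ciphertext = keystream_xor(k1, plaintext)
    tag = hmac.new(k2, ciphertext, hashlib.sha256).digest()
    return ciphertext, tag

def verify_and_open(k1: bytes, k2: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    """Recompute the tag with K2; decrypt with K1 only if the tags match."""
    expected = hmac.new(k2, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    return keystream_xor(k1, ciphertext)   # XOR keystream is its own inverse

if __name__ == "__main__":
    ct, tag = protect(b"key-one", b"key-two", b"transfer 100")
    print(verify_and_open(b"key-one", b"key-two", ct, tag))
```

    Checking the tag before decrypting means a modified ciphertext is rejected without ever being processed with K1.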

  7. Management and Archiving e-mail Messages in Governmental Organization

    Ashraf Mohamed A.Mohsen

    2006-01-01

    The study deals with a major issue in digital preservation, namely e-mail archiving. It covers all aspects of the topic and discusses: the e-mail system, components of an e-mail message, advantages and disadvantages of e-mail, official e-mail messages, management of e-mail messages, organizing and arranging e-mail messages, keeping and deleting messages, archiving e-mail messages, and some related issues such as privacy and security.

  8. Pass-transistor asynchronous sequential circuits

    Whitaker, Sterling R.; Maki, Gary K.

    1989-01-01

    Design methods for asynchronous sequential pass-transistor circuits, which result in circuits that are hazard- and critical-race-free and which have added degrees of freedom for the input signals, are discussed. The design procedures are straightforward and easy to implement. Two single-transition-time state assignment methods are presented, and hardware bounds for each are established. A surprising result is that the hardware realizations for each next-state variable and output variable are identical for a given flow table. Thus, a state machine with N states and M outputs can be constructed using a single layout replicated N + M times.

  9. Evaluation of Sexual Communication Message Strategies

    Khan Munziba

    2011-05-01

    Parent-child communication about sex is an important proximal reproductive health outcome. But while campaigns to promote it such as the Parents Speak Up National Campaign (PSUNC) have been effective, little is known about how messages influence parental cognitions and behavior. This study examines which message features explain responses to sexual communication messages. We content analyzed 4 PSUNC ads to identify specific, measurable message and advertising execution features. We then developed quantitative measures of those features, including message strategies, marketing strategies, and voice and other stylistic features, and merged the resulting data into a dataset drawn from a national media tracking survey of the campaign. Finally, we conducted multivariable logistic regression models to identify relationships between message content and ad reactions/receptivity, and between ad reactions/receptivity and parents' cognitions related to sexual communication included in the campaign's conceptual model. We found that overall parents were highly receptive to the PSUNC ads. We did not find significant associations between message content and ad reactions/receptivity. However, we found that reactions/receptivity to specific PSUNC ads were associated with increased norms, self-efficacy, short- and long-term expectations about parent-child sexual communication, as theorized in the conceptual model. This study extends previous research and methods to analyze message content and reactions/receptivity. The results confirm and extend previous PSUNC campaign evaluation and provide further evidence for the conceptual model. Future research should examine additional message content features and the effects of reactions/receptivity.

  10. When message-frame fits salient cultural-frame, messages feel more persuasive.

    Uskul, Ayse K; Oyserman, Daphna

    2010-03-01

    The present study examines the persuasive effects of tailored health messages comparing those tailored to match (versus not match) both chronic cultural frame and momentarily salient cultural frame. Evidence from two studies (Study 1: n = 72 European Americans; Study 2: n = 48 Asian Americans) supports the hypothesis that message persuasiveness increases when chronic cultural frame, health message tailoring and momentarily salient cultural frame all match. The hypothesis was tested using a message about health risks of caffeine consumption among individuals prescreened to be regular caffeine consumers. After being primed for individualism, European Americans who read a health message that focused on the personal self were more likely to accept the message: they found it more persuasive, believed they were more at risk and engaged in more message-congruent behaviour. These effects were also found among Asian Americans who were primed for collectivism and who read a health message that focused on relational obligations. The findings point to the importance of investigating the role of situational cues in persuasive effects of health messages and suggest that matching content to primed frame consistent with the chronic frame may be a way to know what to match messages to.

  11. Comment ameliorer la selection et le traitement des messages verbaux? (How to Improve the Selection and Processing of Verbal Messages)

    Rivenez, Marie; Darwin, Chris; Guillaume, Anne

    2005-01-01

    The objective of this research is to improve the selection of verbal messages. We seek to determine the factors that influence the processing of a verbal message when attention is directed to another message...

  12. Fabrication of seamless calandria tubes by cold pilgering route using 3-pass and 2-pass schedules

    Saibaba, N.

    2008-12-01

    Calandria tube is a large diameter, extremely thin walled zirconium alloy tube which has diameter to wall thickness ratio as high as 90-95. Such tubes are conventionally produced by the 'welded route', which involves extrusion of slabs followed by a series of hot and cold rolling passes, intermediate anneals, press forming of sheets into circular shape and closing the gap by TIG welding. Though pilgering is a well established process for the fabrication of seamless tubes, production of extremely thin walled tubes offers several challenges during pilgering. Nuclear Fuel Complex (NFC), Hyderabad, has successfully developed a process for the production of Zircaloy-4 calandria tubes by adopting the 'seamless route' which involves hot extrusion of mother blanks followed by three-pass pilgering or two-pass pilgering schedules. This paper deals with standardization of the seamless route processes for fabrication of calandria tubes, comparison between the tubes produced by 2-pass and 3-pass pilgering schedules, role of ultrasonic test charts for control of process parameters, development of new testing methods for burst testing and other properties.

  13. Fabrication of seamless calandria tubes by cold pilgering route using 3-pass and 2-pass schedules

    Saibaba, N.

    2008-01-01

    Calandria tube is a large diameter, extremely thin walled zirconium alloy tube which has diameter to wall thickness ratio as high as 90-95. Such tubes are conventionally produced by the 'welded route', which involves extrusion of slabs followed by a series of hot and cold rolling passes, intermediate anneals, press forming of sheets into circular shape and closing the gap by TIG welding. Though pilgering is a well established process for the fabrication of seamless tubes, production of extremely thin walled tubes offers several challenges during pilgering. Nuclear Fuel Complex (NFC), Hyderabad, has successfully developed a process for the production of Zircaloy-4 calandria tubes by adopting the 'seamless route' which involves hot extrusion of mother blanks followed by three-pass pilgering or two-pass pilgering schedules. This paper deals with standardization of the seamless route processes for fabrication of calandria tubes, comparison between the tubes produced by 2-pass and 3-pass pilgering schedules, role of ultrasonic test charts for control of process parameters, development of new testing methods for burst testing and other properties.

  14. Common pass decentered annular ring resonator

    Holmes, D. A.; Waite, T. R.

    1985-04-30

    An optical resonator having an annular cylindrical gain region for use in a chemical laser or the like in which two ring-shaped mirrors having substantially conical reflecting surfaces are spaced apart along a common axis of revolution of the respective conical surfaces. A central conical mirror reflects incident light directed along said axis radially outwardly to the reflecting surface of a first one of the ring-shaped mirrors. The radial light rays are reflected by the first ring mirror to the second ring mirror within an annular cylindrical volume concentric with said common axis and forming a gain region. Light rays impinging on the second ring mirror are reflected to diametrically opposite points on the same conical mirror surfaces and back to the first ring mirror through the same annular cylindrical volume. The return rays are then reflected by the conical mirror surface of the first ring mirror back to the central conical mirror. The mirror surfaces are angled such that the return rays are reflected back along the common axis by the central mirror in a concentric annular cylindrical volume. A scraper mirror having a central opening centered on said axis and an offset opening reflects all but the rays passing through the two openings in an output beam. The rays passing through the second opening are reflected back through the first opening to provide feedback.

  15. Comparison of cryogenic low-pass filters

    Thalmann, M.; Pernau, H.-F.; Strunk, C.; Scheer, E.; Pietsch, T.

    2017-11-01

    Low-temperature electronic transport measurements with high energy resolution require both effective low-pass filtering of high-frequency input noise and an optimized thermalization of the electronic system of the experiment. In recent years, elaborate filter designs have been developed for cryogenic low-level measurements, driven by the growing interest in fundamental quantum-physical phenomena at energy scales corresponding to temperatures in the few millikelvin regime. However, a single filter concept is often insufficient to thermalize the electronic system to the cryogenic bath and eliminate spurious high frequency noise. Moreover, the available concepts often provide inadequate filtering to operate at temperatures below 10 mK, which are routinely available now in dilution cryogenic systems. Herein we provide a comprehensive analysis of commonly used filter types, introduce a novel compact filter type based on ferrite compounds optimized for the frequency range above 20 GHz, and develop an improved filtering scheme providing adaptable broad-band low-pass characteristic for cryogenic low-level and quantum measurement applications at temperatures down to few millikelvin.

  16. Comparison of cryogenic low-pass filters.

    Thalmann, M; Pernau, H-F; Strunk, C; Scheer, E; Pietsch, T

    2017-11-01

    Low-temperature electronic transport measurements with high energy resolution require both effective low-pass filtering of high-frequency input noise and an optimized thermalization of the electronic system of the experiment. In recent years, elaborate filter designs have been developed for cryogenic low-level measurements, driven by the growing interest in fundamental quantum-physical phenomena at energy scales corresponding to temperatures in the few millikelvin regime. However, a single filter concept is often insufficient to thermalize the electronic system to the cryogenic bath and eliminate spurious high frequency noise. Moreover, the available concepts often provide inadequate filtering to operate at temperatures below 10 mK, which are routinely available now in dilution cryogenic systems. Herein we provide a comprehensive analysis of commonly used filter types, introduce a novel compact filter type based on ferrite compounds optimized for the frequency range above 20 GHz, and develop an improved filtering scheme providing adaptable broad-band low-pass characteristic for cryogenic low-level and quantum measurement applications at temperatures down to few millikelvin.

  17. Massively parallel multicanonical simulations

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10^4 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
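
    The idea of independent walkers that share weight updates can be sketched as follows. This is not the authors' GPU code and does not use the Ising model: the "energy" is simply the number of up spins among non-interacting spins, the walkers run one after another in a plain loop instead of in parallel threads, and the update interval is fixed. The names `run_walker` and `parallel_muca` are hypothetical. The weight update W(E) <- W(E)/H(E) is the standard multicanonical recursion.

```python
import random

def run_walker(weights, n_spins, steps, rng):
    """One walker: single-spin flips accepted with min(1, W(E_new)/W(E_old))."""
    spins = [rng.random() < 0.5 for _ in range(n_spins)]
    energy = sum(spins)                  # toy energy: number of up spins
    hist = [0] * (n_spins + 1)
    for _ in range(steps):
        i = rng.randrange(n_spins)
        new_energy = energy + (-1 if spins[i] else 1)
        if rng.random() < weights[new_energy] / weights[energy]:
            spins[i] = not spins[i]
            energy = new_energy
        hist[energy] += 1
    return hist

def parallel_muca(n_spins=8, walkers=4, steps=2000, iterations=5, seed=1):
    """Iterate: all walkers sample with shared weights, histograms are summed,
    and the summed histogram drives one common weight update."""
    weights = [1.0] * (n_spins + 1)
    total = [0] * (n_spins + 1)
    for it in range(iterations):
        # Walkers are independent within an iteration, so this loop is the
        # part that would be distributed over threads or GPU cores.
        hists = [run_walker(weights, n_spins, steps,
                            random.Random(seed * 1000 + it * walkers + w))
                 for w in range(walkers)]
        total = [sum(column) for column in zip(*hists)]
        # Multicanonical recursion on the merged histogram.
        weights = [w / h if h > 0 else w for w, h in zip(weights, total)]
        top = max(weights)
        weights = [w / top for w in weights]   # rescale to avoid underflow
    return weights, total

if __name__ == "__main__":
    final_weights, last_hist = parallel_muca()
    print(last_hist)
```

    Only the summed histogram and the updated weights are exchanged between iterations, which is why the communication cost stays small as the number of walkers grows.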

  18. An experimental evaluation of multi-pass solar air heaters

    Satcunanathan, S.; Persad, P.

    1980-12-01

    Three collectors of identical dimensions but operating in the single-pass, two-pass and three-pass modes were tested simultaneously under ambient conditions. It was found that the two-pass air heater was consistently better than the single-pass air heater over the day for the range of mass flow rates considered. It was also found that at a mass flow rate of 0.0095 kg s^-1 m^-2, the thermal performances of the two-pass and three-pass collectors were identical, but at higher flow rates the two-pass collector was superior to the three-pass collector, the superiority decreasing with increasing mass flow rate.

  19. SPINning parallel systems software

    Matlin, O.S.; Lusk, E.; McCune, W.

    2002-01-01

    We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic: processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin.

  20. Parallel programming with Python

    Palach, Jan

    2014-01-01

    A fast, easy-to-follow and clear tutorial to help you develop parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most out of this book.

  1. Anxiety, Construct Differentiation, and Message Production.

    Shepherd, Gregory J.; Condra, Mollie B.

    1989-01-01

    Examines the nature of the construct differentiation/anxiety relationship in light of messages produced. Considers recent and complex conceptualizations of social-cognitive development and anxiety. Finds no significant relationship between state anxiety and construct differentiation. (MM)

  2. MORPHOLOGICAL STRATEGIES IN TEXT MESSAGING AMONG ...

    Text messaging is the application of abridged morphological forms in order ... the emergence of the Global System for Mobile Communication (GSM) in the world. ... Our thesis statement is that these morphological patterns as used in SMS are ...

  3. Safety message broadcast in vehicular networks

    Bi, Yuanguo; Zhuang, Weihua; Zhao, Hai

    2017-01-01

    This book presents the current research on safety message dissemination in vehicular networks, covering medium access control and relay selection for multi-hop safety message broadcast. Along with an overall overview of the architecture, characteristics, and applications of vehicular networks, the authors discuss the challenging issues in the research on performance improvement for safety applications, and provide a comprehensive review of the research literature. A cross layer broadcast protocol is included to support efficient safety message broadcast by jointly considering geographical location, physical-layer channel condition, and moving velocity of vehicles in the highway scenario. To further support multi-hop safety message broadcast in a complex road layout, the authors propose an urban multi-hop broadcast protocol that utilizes a novel forwarding node selection scheme. Additionally, a busy tone based medium access control scheme is designed to provide strict priority to safety applications in vehicle...

  4. Photometric requirements for portable changeable message signs.

    2001-09-01

    This project reviewed the performance of portable changeable message signs (PCMSs) and developed photometric standards to establish performance requirements. In addition, researchers developed photometric test methods and recommended them for use in evaluati...

  5. Wyoming CV Pilot Traveler Information Message Sample

    Department of Transportation — This dataset contains a sample of the sanitized Traveler Information Messages (TIM) being generated by the Wyoming Connected Vehicle (CV) Pilot. The full set of TIMs...

  6. A comprehensive study of MPI parallelism in three-dimensional discrete element method (DEM) simulation of complex-shaped granular particles

    Yan, Beichuan; Regueiro, Richard A.

    2018-02-01

    A three-dimensional (3D) DEM code for simulating complex-shaped granular particles is parallelized using message-passing interface (MPI). The concepts of link-block, ghost/border layer, and migration layer are put forward for design of the parallel algorithm, and theoretical scalability function of 3-D DEM scalability and memory usage is derived. Many performance-critical implementation details are managed optimally to achieve high performance and scalability, such as: minimizing communication overhead, maintaining dynamic load balance, handling particle migrations across block borders, transmitting C++ dynamic objects of particles between MPI processes efficiently, eliminating redundant contact information between adjacent MPI processes. The code executes on multiple US Department of Defense (DoD) supercomputers and tests up to 2048 compute nodes for simulating 10 million three-axis ellipsoidal particles. Performance analyses of the code including speedup, efficiency, scalability, and granularity across five orders of magnitude of simulation scale (number of particles) are provided, and they demonstrate high speedup and excellent scalability. It is also discovered that communication time is a decreasing function of the number of compute nodes in strong scaling measurements. The code's capability of simulating a large number of complex-shaped particles on modern supercomputers will be of value in both laboratory studies on micromechanical properties of granular materials and many realistic engineering applications involving granular materials.

  7. Introducing heterogeneity in Monte Carlo models for risk assessments of high-level nuclear waste. A parallel implementation of the MLCRYSTAL code

    Andersson, M.

    1996-09-01

    We have introduced heterogeneity to an existing model as a special feature and simultaneously extended the model from 1D to 3D. Briefly, the code generates stochastic fractures in a given geosphere. These fractures are connected in series to form one pathway for radionuclide transport from the repository to the biosphere. Rock heterogeneity is realized by simulating physical and chemical properties for each fracture, i.e. these properties vary along the transport pathway (which is an ensemble of all fractures serially connected). In this case, each Monte Carlo simulation involves a set of many thousands of realizations, one for each pathway. Each pathway can be formed by approx. 100 fractures. This means that for a Monte Carlo simulation of 1000 realizations, we need to perform a total of 100,000 simulations. Therefore the introduction of heterogeneity has increased the CPU demands by two orders of magnitude. To overcome the demand for CPU, the program, MLCRYSTAL, has been implemented in a parallel workstation environment using the MPI, Message Passing Interface, and later on ported to an IBM-SP2 parallel supercomputer. The program is presented here and a preliminary set of results is given with the conclusions that can be drawn. 3 refs, 12 figs.

  8. Introducing heterogeneity in Monte Carlo models for risk assessments of high-level nuclear waste. A parallel implementation of the MLCRYSTAL code

    Andersson, M.

    1996-09-01

    We have introduced heterogeneity to an existing model as a special feature and simultaneously extended the model from 1D to 3D. Briefly, the code generates stochastic fractures in a given geosphere. These fractures are connected in series to form one pathway for radionuclide transport from the repository to the biosphere. Rock heterogeneity is realized by simulating physical and chemical properties for each fracture, i.e. these properties vary along the transport pathway (which is an ensemble of all fractures serially connected). In this case, each Monte Carlo simulation involves a set of many thousands of realizations, one for each pathway. Each pathway can be formed by approx. 100 fractures. This means that for a Monte Carlo simulation of 1000 realizations, we need to perform a total of 100,000 simulations. Therefore the introduction of heterogeneity has increased the CPU demands by two orders of magnitude. To overcome the demand for CPU, the program, MLCRYSTAL, has been implemented in a parallel workstation environment using the MPI, Message Passing Interface, and later on ported to an IBM-SP2 parallel supercomputer. The program is presented here and a preliminary set of results is given with the conclusions that can be drawn. 3 refs, 12 figs

  9. Programs Lucky and LuckyC - 3D parallel transport codes for the multi-group transport equation solution for XYZ geometry by Pm Sn method

    Moriakov, A.; Vasyukhno, V.; Netecha, M.; Khacheresov, G.

    2003-01-01

    Powerful supercomputers are available today. MBC-1000M is a Russian supercomputer that can be used via remote access. The programs LUCKY and LUCKY C were created to run on multiprocessor systems. They use algorithms designed specifically for such computers and the MPI (Message Passing Interface) service for exchanges between processors. LUCKY can solve shielding problems by the multigroup discrete ordinates method; LUCKY C can solve criticality problems by the same method. Only XYZ orthogonal geometry is available. With small spatial steps in the approximation of the discrete operator, this geometry can serve as a universal one for describing complex geometrical structures. Cross-section libraries are used up to the P8 approximation by Legendre polynomials, for nuclear data in GIT format. The programming language is Fortran-90. 'Vector' processors can be used, which gives a speedup of up to 30 times, but unfortunately MBC-1000M does not have such processors. Nevertheless, a sufficient efficiency of parallel calculations was obtained with 'space' (LUCKY) and 'space and energy' (LUCKY C) parallelization. The AUTOCAD program is used to check the geometry after the input data are processed. The programs have a powerful geometry module, a convenient tool for building any geometry. Output results can be processed by graphics programs on a personal computer. (authors)

  10. Getting the message across: age differences in the positive and negative framing of health care messages.

    Shamaskin, Andrea M; Mikels, Joseph A; Reed, Andrew E

    2010-09-01

    Although valenced health care messages influence impressions, memory, and behavior (Levin, Schneider, & Gaeth, 1998) and the processing of valenced information changes with age (Carstensen & Mikels, 2005), these 2 lines of research have thus far been disconnected. This study examined impressions of, and memory for, positively and negatively framed health care messages that were presented in pamphlets to 25 older adults and 24 younger adults. Older adults relative to younger adults rated positive pamphlets more informative than negative pamphlets and remembered a higher proportion of positive to negative messages. However, older adults misremembered negative messages to be positive. These findings demonstrate the age-related positivity effect in health care messages with promise as to the persuasive nature and lingering effects of positive messages. (c) 2010 APA, all rights reserved.

  11. Fear, threat and efficacy in threat appeals: message involvement as a key mediator to message acceptance.

    Cauberghe, Verolien; De Pelsmacker, Patrick; Janssens, Wim; Dens, Nathalie

    2009-03-01

    In a sample of 170 youngsters, the effect of two versions of a public service announcement (PSA) threat appeal against speeding, placed in four different contexts, on evoked fear, perceived threat (severity and probability of occurrence), perceived response efficacy and self-efficacy, message involvement and anti-speeding attitude and anti-speeding intention is investigated. Evoked fear and perceived threat and efficacy independently influence message involvement. Message involvement is a full mediator between evoked fear, perceived threat and efficacy perception on the one hand, and attitudes towards the message and behavioral intention to accept the message on the other. Speeding experience has a significantly negative impact on anti-speeding attitudes. Message and medium context threat levels and context thematic congruency have a significant effect on evoked fear and to a lesser extent on perceived threat.

  12. Message-driven factors influencing opening and forwarding of mobile advertising messages

    Sanz Blas, Silvia; Ruiz Mafé, Carla; Martí Parreño, José

    2015-01-01

    This work aims to analyse the influence of message-driven factors -informativeness, ubiquity, frequency and personalization- on consumer attitude and behaviour -opening and forwarding- towards mobile advertising messages. A theoretical model was developed and empirically tested using a sample of 355 Spanish teenager mobile users. Findings show that frequency is the dimension accounting the most -and significantly- of the four message-driven factors analysed on attitude toward mobile advertisi...

  13. When message-frame fits salient cultural-frame, messages feel more persuasive

    Uskul, Ayse K.; Oyserman, Daphna

    2010-01-01

    The present study examines the persuasive effects of tailored health messages comparing those tailored to match (versus not match) both chronic cultural frame and momentarily salient cultural frame. Evidence from two studies (Study 1: n = 72 European Americans; Study 2: n = 48 Asian Americans) supports the hypothesis that message persuasiveness increases when chronic cultural frame, health message tailoring and momentarily salient cultural frame all match. The hypothesis was tested using a me...

  14. Persuasive messages. Development of persuasive messages may help increase mothers' compliance of their children's immunization schedule.

    Gore, P; Madhavan, S; Curry, D; McClurg, G; Castiglia, M; Rosenbluth, S A; Smego, R A

    1998-01-01

    Effective immunization campaigns can be designed by determining which persuasion strategy is most effective in attracting the attention of mothers of preschoolers. The authors assess the impact of three persuasional strategies: fear-arousal, motherhood-arousal, and rational messages, on mothers of preschoolers who are late for their immunizations. The fear-arousal message was found to be most effective, followed by the motherhood-arousal, and then the rational message, in attracting mothers' attention to their child's immunization status.

  15. Improving Type Error Messages in OCaml

    Arthur Charguéraud

    2015-12-01

    Cryptic type error messages are a major obstacle to learning OCaml or other ML-based languages. In many cases, error messages cannot be interpreted without a sufficiently precise model of the type inference algorithm. The problem of improving type error messages in ML has received quite a bit of attention over the past two decades, and many different strategies have been considered. The challenge is not only to produce error messages that are both sufficiently concise and systematically useful to the programmer, but also to handle a full-blown programming language and to cope with large programs efficiently. In this work, we present a modification to the traditional ML type inference algorithm implemented in OCaml that, by significantly reducing the left-to-right bias, allows us to report error messages that are more helpful to the programmer. Our algorithm remains fully predictable and continues to produce fairly concise error messages that always help make some progress towards fixing the code. We implemented our approach as a patch to the OCaml compiler in just a few hundred lines of code. We believe that this patch should benefit not just beginners but also experienced programmers developing large-scale OCaml programs.

  16. Recent computer attacks via Instant Messaging

    IT Department

    2008-01-01

    Be cautious of any unexpected messages containing web links even if they appear to come from known contacts. If you happen to click on such a link and if your permission is requested to run or install software, always decline it. Several computers at CERN have recently been broken into by attackers who have tricked users of Instant Messaging applications (e.g. MSN, Yahoo Messenger, etc.) into clicking on web links which appeared to come from known contacts. The links appeared to be photos from ‘friends’ and requested software to be installed. In practice, attacker software was installed and the messages did not come from real contacts. In the past such fake messages were mainly sent by email but now a wider range of applications are being targeted, including Instant Messaging. Cybercriminals are making growing use of fake messages to try to trick you into clicking on Web links which will help them to install malicious software on your computer. Anti-virus software cann...

  17. Gender messages in contemporary popular Malay songs

    Collin Jerome

    2013-07-01

    Gender has been an important area of research in the field of popular music studies. Numerous scholars have found that contemporary popular music functions as a locus of diverse constructions and expressions of gender. While most studies focus on content analyses of popular music, there is still a need for more research on audiences’ perceptions of popular music’s messages. This study examined adult Malay listeners’ perceptions of gender messages in contemporary Malay songs. A total of 16 contemporary Malay songs were analysed using Fairclough’s (1992) method of text analysis. The content of the songs that conveyed messages about gender was the basis for analysis. The results showed that the messages revolve mainly around socially constructed gender roles and expectations in romantic relationships. Gender stereotypes are also used in the songs to reinforce men’s and women’s roles in romantic relationships. The results also showed that, while listeners acknowledge the songs’ messages about gender, their own perceptions of gender and what it means to be a gendered being in today’s world are neither represented nor discussed fully in the songs analysed. It is hoped that the findings from this study, particularly the mismatch between projected and perceived notions of gender, contribute to the field of popular Malay music studies in particular, and to popular music studies in general, where gender messages in popular songs and their influence on listeners’ perceptions of their own gender are concerned.

  18. Quantum signature scheme for known quantum messages

    Kim, Taewan; Lee, Hyang-Sook

    2015-01-01

    When we want to sign a quantum message that we create, we can use arbitrated quantum signature schemes, which can sign not only known quantum messages but also unknown quantum messages. However, since arbitrated quantum signature schemes need the help of a trusted arbitrator in each verification of the signature, they are known to be inconvenient in practical use. If we consider only known quantum messages, as in the above situation, a quantum signature scheme with a more efficient structure can exist. In this paper, we present a new quantum signature scheme for known quantum messages without the help of an arbitrator. Unlike arbitrated quantum signature schemes based on the quantum one-time pad with a symmetric key, our scheme is based on quantum public-key cryptosystems, so the validity of the signature can be verified by a receiver without the help of an arbitrator. Moreover, we show that our scheme provides the functions of quantum message integrity, user authentication and non-repudiation of the origin, as in digital signature schemes. (paper)

  19. Expressing Parallelism with ROOT

    Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab

    2017-11-22

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  20. Expressing Parallelism with ROOT

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.