WorldWideScience

Sample records for hybrid parallel computer

  1. Hybrid parallel computing architecture for multiview phase shifting

    Science.gov (United States)

    Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

    2014-11-01

    The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphic processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained paralleling computing. Experimental results verify that the developed system can perform 50 fps (frame per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using a NVIDIA GT560Ti graphics card rather than a sequential C in a 3.4 GHZ Inter Core i7 3770.

  2. A hybrid method for the parallel computation of Green's functions

    DEFF Research Database (Denmark)

    Petersen, Dan Erik; Li, Song; Stokbro, Kurt

    2009-01-01

    of the large number of times this calculation needs to be performed, this is computationally very expensive even on supercomputers. The classical approach is based on recurrence formulas which cannot be efficiently parallelized. This practically prevents the solution of large problems with hundreds...... of thousands of atoms. We propose new recurrences for a general class of sparse matrices to calculate Green's and lesser Green's function matrices which extend formulas derived by Takahashi and others. We show that these recurrences may lead to a dramatically reduced computational cost because they only...... require computing a small number of entries of the inverse matrix. Then. we propose a parallelization strategy for block tridiagonal matrices which involves a combination of Schur complement calculations and cyclic reduction. It achieves good scalability even on problems of modest size....

  3. A hybrid method for the parallel computation of Green's functions

    International Nuclear Information System (INIS)

    Petersen, Dan Erik; Li Song; Stokbro, Kurt; Sorensen, Hans Henrik B.; Hansen, Per Christian; Skelboe, Stig; Darve, Eric

    2009-01-01

    Quantum transport models for nanodevices using the non-equilibrium Green's function method require the repeated calculation of the block tridiagonal part of the Green's and lesser Green's function matrices. This problem is related to the calculation of the inverse of a sparse matrix. Because of the large number of times this calculation needs to be performed, this is computationally very expensive even on supercomputers. The classical approach is based on recurrence formulas which cannot be efficiently parallelized. This practically prevents the solution of large problems with hundreds of thousands of atoms. We propose new recurrences for a general class of sparse matrices to calculate Green's and lesser Green's function matrices which extend formulas derived by Takahashi and others. We show that these recurrences may lead to a dramatically reduced computational cost because they only require computing a small number of entries of the inverse matrix. Then, we propose a parallelization strategy for block tridiagonal matrices which involves a combination of Schur complement calculations and cyclic reduction. It achieves good scalability even on problems of modest size.

  4. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  5. Parallel computation

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code development or improvement works were done on parts simulating low energy physical phenomena like radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program of neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits in a single pass the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refers to simulations for nuclear medicine applications like, for instance, development of biological probes, evaluation and characterization of the gamma cameras (collimators, crystal thickness) as well as the method for dosimetric calculations. Particularly, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to simulation of the electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of environment

  6. Parallel Computing Characteristics of CUPID code under MPI and Hybrid environment

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Jae Ryong; Yoon, Han Young [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of); Jeon, Byoung Jin; Choi, Hyoung Gwon [Seoul National Univ. of Science and Technology, Seoul (Korea, Republic of)

    2014-05-15

    In this paper, a characteristic of parallel algorithm is presented for solving an elliptic type equation of CUPID via domain decomposition method using the MPI and the parallel performance is estimated in terms of a scalability which shows the speedup ratio. In addition, the time-consuming pattern of major subroutines is studied. Two different grid systems are taken into account: 40,000 meshes for coarse system and 320,000 meshes for fine system. Since the matrix of the CUPID code differs according to whether the flow is single-phase or two-phase, the effect of matrix shape is evaluated. Finally, the effect of the preconditioner for matrix solver is also investigated. Finally, the hybrid (OpenMP+MPI) parallel algorithm is introduced and discussed in detail for solving pressure solver. Component-scale thermal-hydraulics code, CUPID has been developed for two-phase flow analysis, which adopts a three-dimensional, transient, three-field model, and parallelized to fulfill a recent demand for long-transient and highly resolved multi-phase flow behavior. In this study, the parallel performance of the CUPID code was investigated in terms of scalability. The CUPID code was parallelized with domain decomposition method. The MPI library was adopted to communicate the information at the neighboring domain. For managing the sparse matrix effectively, the CSR storage format is used. To take into account the characteristics of the pressure matrix which turns to be asymmetric for two-phase flow, both single-phase and two-phase calculations were run. In addition, the effect of the matrix size and preconditioning was also investigated. The fine mesh calculation shows better scalability than the coarse mesh because the number of coarse mesh does not need to decompose the computational domain excessively. The fine mesh can be present good scalability when dividing geometry with considering the ratio between computation and communication time. For a given mesh, single-phase flow

  7. Hybrid MPI/OpenMP parallelization of the explicit Volterra integral equation solver for multi-core computer architectures

    KAUST Repository

    Al Jarro, Ahmed

    2011-08-01

    A hybrid MPI/OpenMP scheme for efficiently parallelizing the explicit marching-on-in-time (MOT)-based solution of the time-domain volume (Volterra) integral equation (TD-VIE) is presented. The proposed scheme equally distributes tested field values and operations pertinent to the computation of tested fields among the nodes using the MPI standard; while the source field values are stored in all nodes. Within each node, OpenMP standard is used to further accelerate the computation of the tested fields. Numerical results demonstrate that the proposed parallelization scheme scales well for problems involving three million or more spatial discretization elements. © 2011 IEEE.

  8. Parallel computing works!

    CERN Document Server

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully appliedto large-scale scientific computations. This book demonstrates how avariety of applications in physics, biology, mathematics and other scienceswere implemented on real parallel computers to produce new scientificresults. It investigates issues of fine-grained parallelism relevant forfuture supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configuredifferent massively parallel machines, design and implement basic systemsoftware, and develop

  9. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  10. Parallelism in matrix computations

    CERN Document Server

    Gallopoulos, Efstratios; Sameh, Ahmed H

    2016-01-01

    This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded ,Vandermonde ,Toeplitz ,and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...

  11. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  12. Algorithmically specialized parallel computers

    CERN Document Server

    Snyder, Lawrence; Gannon, Dennis B

    1985-01-01

    Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer.This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster

  13. Parallel reservoir simulator computations

    International Nuclear Information System (INIS)

    Hemanth-Kumar, K.; Young, L.C.

    1995-01-01

    The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and relative to scalar calculations it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively on the CRAY C-90

  14. Implementation of a cell-wise block-Gauss-Seidel iterative method for SN transport on a hybrid parallel computer architecture

    International Nuclear Information System (INIS)

    Rosa, Massimiliano; Warsa, James S.; Perks, Michael

    2011-01-01

    We have implemented a cell-wise, block-Gauss-Seidel (bGS) iterative algorithm, for the solution of the S_n transport equations on the Roadrunner hybrid, parallel computer architecture. A compute node of this massively parallel machine comprises AMD Opteron cores that are linked to a Cell Broadband Engine™ (Cell/B.E.)"1. LAPACK routines have been ported to the Cell/B.E. in order to make use of its parallel Synergistic Processing Elements (SPEs). The bGS algorithm is based on the LU factorization and solution of a linear system that couples the fluxes for all S_n angles and energy groups on a mesh cell. For every cell of a mesh that has been parallel decomposed on the higher-level Opteron processors, a linear system is transferred to the Cell/B.E. and the parallel LAPACK routines are used to compute a solution, which is then transferred back to the Opteron, where the rest of the computations for the S_n transport problem take place. Compared to standard parallel machines, a hundred-fold speedup of the bGS was observed on the hybrid Roadrunner architecture. Numerical experiments with strong and weak parallel scaling demonstrate the bGS method is viable and compares favorably to full parallel sweeps (FPS) on two-dimensional, unstructured meshes when it is applied to optically thick, multi-material problems. As expected, however, it is not as efficient as FPS in optically thin problems. (author)

  15. Hybrid MPI/OpenMP parallelization of the explicit Volterra integral equation solver for multi-core computer architectures

    KAUST Repository

    Al Jarro, Ahmed; Bagci, Hakan

    2011-01-01

    A hybrid MPI/OpenMP scheme for efficiently parallelizing the explicit marching-on-in-time (MOT)-based solution of the time-domain volume (Volterra) integral equation (TD-VIE) is presented. The proposed scheme equally distributes tested field values

  16. Algorithms for parallel computers

    International Nuclear Information System (INIS)

    Churchhouse, R.F.

    1985-01-01

    Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)

  17. GENESIS 1.1: A hybrid-parallel molecular dynamics simulator with enhanced sampling algorithms on multiple computational platforms.

    Science.gov (United States)

    Kobayashi, Chigusa; Jung, Jaewoon; Matsunaga, Yasuhiro; Mori, Takaharu; Ando, Tadashi; Tamura, Koichi; Kamiya, Motoshi; Sugita, Yuji

    2017-09-30

    GENeralized-Ensemble SImulation System (GENESIS) is a software package for molecular dynamics (MD) simulation of biological systems. It is designed to extend limitations in system size and accessible time scale by adopting highly parallelized schemes and enhanced conformational sampling algorithms. In this new version, GENESIS 1.1, new functions and advanced algorithms have been added. The all-atom and coarse-grained potential energy functions used in AMBER and GROMACS packages now become available in addition to CHARMM energy functions. The performance of MD simulations has been greatly improved by further optimization, multiple time-step integration, and hybrid (CPU + GPU) computing. The string method and replica-exchange umbrella sampling with flexible collective variable choice are used for finding the minimum free-energy pathway and obtaining free-energy profiles for conformational changes of a macromolecule. These new features increase the usefulness and power of GENESIS for modeling and simulation in biological research. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  18. Parallel Computing in SCALE

    International Nuclear Information System (INIS)

    DeHart, Mark D.; Williams, Mark L.; Bowman, Stephen M.

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement

  19. Applied Parallel Computing Industrial Computation and Optimization

    DEFF Research Database (Denmark)

    Madsen, Kaj; NA NA NA Olesen, Dorte

    Proceedings and the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96)......Proceedings and the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96)...

  20. Massive hybrid parallelism for fully implicit multiphysics

    International Nuclear Information System (INIS)

    Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W.

    2013-01-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided. (authors)

  1. Massive hybrid parallelism for fully implicit multiphysics

    Energy Technology Data Exchange (ETDEWEB)

    Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W. [Idaho National Laboratory, 2525 N. Fremont Ave., Idaho Falls, ID 83415 (United States)

    2013-07-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided. (authors)

  2. MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS

    Energy Technology Data Exchange (ETDEWEB)

    Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston

    2013-05-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided.

  3. Parallel Computing for Brain Simulation.

    Science.gov (United States)

    Pastur-Romay, L A; Porto-Pazos, A B; Cedron, F; Pazos, A

    2017-01-01

    The human brain is the most complex system in the known universe, it is therefore one of the greatest mysteries. It provides human beings with extraordinary abilities. However, until now it has not been understood yet how and why most of these abilities are produced. For decades, researchers have been trying to make computers reproduce these abilities, focusing on both understanding the nervous system and, on processing data in a more efficient way than before. Their aim is to make computers process information similarly to the brain. Important technological developments and vast multidisciplinary projects have allowed creating the first simulation with a number of neurons similar to that of a human brain. This paper presents an up-to-date review about the main research projects that are trying to simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review includes the current applications of these works, as well as future trends. It is focused on various works that look for advanced progress in Neuroscience and still others which seek new discoveries in Computer Science (neuromorphic hardware, machine learning techniques). Their most outstanding characteristics are summarized and the latest advances and future plans are presented. In addition, this review points out the importance of considering not only neurons: Computational models of the brain should also include glial cells, given the proven importance of astrocytes in information processing. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  4. Event monitoring of parallel computations

    Directory of Open Access Journals (Sweden)

    Gruzlikov Alexander M.

    2015-06-01

    Full Text Available The paper considers the monitoring of parallel computations for detection of abnormal events. It is assumed that computations are organized according to an event model, and monitoring is based on specific test sequences

  5. Analog and hybrid computing

    CERN Document Server

    Hyndman, D E

    2013-01-01

    Analog and Hybrid Computing focuses on the operations of analog and hybrid computers. The book first outlines the history of computing devices that influenced the creation of analog and digital computers. The types of problems to be solved on computers, computing systems, and digital computers are discussed. The text looks at the theory and operation of electronic analog computers, including linear and non-linear computing units and use of analog computers as operational amplifiers. The monograph examines the preparation of problems to be deciphered on computers. Flow diagrams, methods of ampl

  6. Massively parallel quantum computer simulator

    NARCIS (Netherlands)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massive parallel Computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as a IBM BlueGene/L, a IBM Regatta p690+, a Hitachi SR11000/J1, a Cray

  7. Parallel Hybrid Vehicle Optimal Storage System

    Science.gov (United States)

    Bloomfield, Aaron P.

    2009-01-01

    A paper reports the results of a Hybrid Diesel Vehicle Project focused on a parallel hybrid configuration suitable for diesel-powered, medium-sized, commercial vehicles commonly used for parcel delivery and shuttle buses, as the missions of these types of vehicles require frequent stops. During these stops, electric hybridization can effectively recover the vehicle's kinetic energy during the deceleration, store it onboard, and then use that energy to assist in the subsequent acceleration.

  8. A Hybrid Parallel Preconditioning Algorithm For CFD

    Science.gov (United States)

    Barth,Timothy J.; Tang, Wei-Pai; Kwak, Dochan (Technical Monitor)

    1995-01-01

    A new hybrid preconditioning algorithm will be presented which combines the favorable attributes of incomplete lower-upper (ILU) factorization with the favorable attributes of the approximate inverse method recently advocated by numerous researchers. The quality of the preconditioner is adjustable and can be increased at the cost of additional computation while at the same time the storage required is roughly constant and approximately equal to the storage required for the original matrix. In addition, the preconditioning algorithm suggests an efficient and natural parallel implementation with reduced communication. Sample calculations will be presented for the numerical solution of multi-dimensional advection-diffusion equations. The matrix solver has also been embedded into a Newton algorithm for solving the nonlinear Euler and Navier-Stokes equations governing compressible flow. The full paper will show numerous examples in CFD to demonstrate the efficiency and robustness of the method.

  9. Parallel algorithms and cluster computing

    CERN Document Server

    Hoffmann, Karl Heinz

    2007-01-01

    This book presents major advances in high performance computing as well as major advances due to high performance computing. It contains a collection of papers in which results achieved in the collaboration of scientists from computer science, mathematics, physics, and mechanical engineering are presented. From the science problems to the mathematical algorithms and on to the effective implementation of these algorithms on massively parallel and cluster computers we present state-of-the-art methods and technology as well as exemplary results in these fields. This book shows that problems which seem superficially distinct become intimately connected on a computational level.

  10. Wakefield calculations on parallel computers

    International Nuclear Information System (INIS)

    Schoessow, P.

    1990-01-01

    The use of parallelism in the solution of wakefield problems is illustrated for two different computer architectures (SIMD and MIMD). Results are given for finite difference codes which have been implemented on a Connection Machine and an Alliant FX/8 and which are used to compute wakefields in dielectric loaded structures. Benchmarks on code performance are presented for both cases. 4 refs., 3 figs., 2 tabs

  11. Parallel R-matrix computation

    International Nuclear Information System (INIS)

    Heggarty, J.W.

    1999-06-01

    For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups, from around the world, to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high performance multicomputers, that have emerged over the last decade which appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine and results in the programming of multicomputers to be widely acknowledged as a difficult, time consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in

  12. Parallel computing: numerics, applications, and trends

    National Research Council Canada - National Science Library

    Trobec, Roman; Vajteršic, Marián; Zinterhof, Peter

    2009-01-01

    ... and/or distributed systems. The contributions to this book are focused on topics most concerned in the trends of today's parallel computing. These range from parallel algorithmics, programming, tools, network computing to future parallel computing. Particular attention is paid to parallel numerics: linear algebra, differential equations, numerica...

  13. Vibration Isolation for Parallel Hydraulic Hybrid Vehicles

    Directory of Open Access Journals (Sweden)

    The M. Nguyen

    2008-01-01

    Full Text Available In recent decades, several types of hybrid vehicles have been developed in order to improve the fuel economy and to reduce the pollution. Hybrid electric vehicles (HEV have shown a significant improvement in fuel efficiency for small and medium-sized passenger vehicles and SUVs. HEV has several limitations when applied to heavy vehicles; one is that larger vehicles demand more power, which requires significantly larger battery capacities. As an alternative solution, hydraulic hybrid technology has been found effective for heavy duty vehicle because of its high power density. The mechanical batteries used in hydraulic hybrid vehicles (HHV can be charged and discharged remarkably faster than chemical batteries. This feature is essential for heavy vehicle hybridization. One of the main problems that should be solved for the successful commercialization of HHV is the excessive noise and vibration involving with the hydraulic systems. This study focuses on using magnetorheological (MR technology to reduce the noise and vibration transmissibility from the hydraulic system to the vehicle body. In order to study the noise and vibration of HHV, a hydraulic hybrid subsystem in parallel design is analyzed. This research shows that the MR elements play an important role in reducing the transmitted noise and vibration to the vehicle body. Additionally, locations and orientations of the isolation system also affect the efficiency of the noise and vibration mitigation. In simulations, a skyhook control algorithm is used to achieve the highest possible effectiveness of the MR isolation system.

  14. The numerical parallel computing of photon transport

    International Nuclear Information System (INIS)

    Huang Qingnan; Liang Xiaoguang; Zhang Lifa

    1998-12-01

    The parallel computing of photon transport is investigated, the parallel algorithm and the parallelization of programs on parallel computers both with shared memory and with distributed memory are discussed. By analyzing the inherent law of the mathematics and physics model of photon transport according to the structure feature of parallel computers, using the strategy of 'to divide and conquer', adjusting the algorithm structure of the program, dissolving the data relationship, finding parallel liable ingredients and creating large grain parallel subtasks, the sequential computing of photon transport into is efficiently transformed into parallel and vector computing. The program was run on various HP parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP) and very good parallel speedup has been gotten

  15. Parallel computation of rotating flows

    DEFF Research Database (Denmark)

    Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær

    1999-01-01

    This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process...... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. Darbe...

  16. Parallel computing in enterprise modeling.

    Energy Technology Data Exchange (ETDEWEB)

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principal makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  17. An Introduction to Parallel Computation R

    Indian Academy of Sciences (India)

    How are they programmed? This article provides an introduction. A parallel computer is a network of processors built for ... and have been used to solve problems much faster than a single ... in parallel computer design is to select an organization which ..... The most ambitious approach to parallel computing is to develop.

  18. Advances in randomized parallel computing

    CERN Document Server

    Rajasekaran, Sanguthevar

    1999-01-01

    The technique of randomization has been employed to solve numerous prob­ lems of computing both sequentially and in parallel. Examples of randomized algorithms that are asymptotically better than their deterministic counterparts in solving various fundamental problems abound. Randomized algorithms have the advantages of simplicity and better performance both in theory and often in practice. This book is a collection of articles written by renowned experts in the area of randomized parallel computing. A brief introduction to randomized algorithms In the aflalysis of algorithms, at least three different measures of performance can be used: the best case, the worst case, and the average case. Often, the average case run time of an algorithm is much smaller than the worst case. 2 For instance, the worst case run time of Hoare's quicksort is O(n ), whereas its average case run time is only O( n log n). The average case analysis is conducted with an assumption on the input space. The assumption made to arrive at t...

  19. Hybrid quantum computation

    International Nuclear Information System (INIS)

    Sehrawat, Arun; Englert, Berthold-Georg; Zemann, Daniel

    2011-01-01

    We present a hybrid model of the unitary-evolution-based quantum computation model and the measurement-based quantum computation model. In the hybrid model, part of a quantum circuit is simulated by unitary evolution and the rest by measurements on star graph states, thereby combining the advantages of the two standard quantum computation models. In the hybrid model, a complicated unitary gate under simulation is decomposed in terms of a sequence of single-qubit operations, the controlled-z gates, and multiqubit rotations around the z axis. Every single-qubit and the controlled-z gate are realized by a respective unitary evolution, and every multiqubit rotation is executed by a single measurement on a required star graph state. The classical information processing in our model requires only an information flow vector and propagation matrices. We provide the implementation of multicontrol gates in the hybrid model. They are very useful for implementing Grover's search algorithm, which is studied as an illustrative example.

  20. Security in hybrid cloud computing

    OpenAIRE

    Koudelka, Ondřej

    2016-01-01

    This bachelor thesis deals with the area of hybrid cloud computing, specifically with its security. The major aim of the thesis is to analyze and compare the chosen hybrid cloud providers. For the minor aim this thesis compares the security challenges of hybrid cloud as opponent to other deployment models. In order to accomplish said aims, this thesis defines the terms cloud computing and hybrid cloud computing in its theoretical part. Furthermore the security challenges for cloud computing a...

  1. Aspects of computation on asynchronous parallel processors

    International Nuclear Information System (INIS)

    Wright, M.

    1989-01-01

    The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues

  2. Parallel Computing Using Web Servers and "Servlets".

    Science.gov (United States)

    Lo, Alfred; Bloor, Chris; Choi, Y. K.

    2000-01-01

    Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…

  3. Broadcasting a message in a parallel computer

    Science.gov (United States)

    Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.

  4. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2011-05-01

    Full Text Available Abstract Background Use of missing genotype imputations and haplotype reconstructions are valuable in genome-wide association studies (GWASs. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan and Han Chinese in Beijing, and Chinese in the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

  5. Template based parallel checkpointing in a massively parallel computer system

    Science.gov (United States)

    Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.

  6. Compiler Technology for Parallel Scientific Computation

    Directory of Open Access Journals (Sweden)

    Can Özturan

    1994-01-01

    Full Text Available There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with equational programming language (EPL. Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.

  7. Parallel algorithms for mapping pipelined and parallel computations

    Science.gov (United States)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.

  8. Massively parallel evolutionary computation on GPGPUs

    CERN Document Server

    Tsutsui, Shigeyoshi

    2013-01-01

    Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u

  9. Collectively loading an application in a parallel computer

    Science.gov (United States)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Miller, Samuel J.; Mundy, Michael B.

    2016-01-05

    Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.

  10. Parallel Computing Strategies for Irregular Algorithms

    Science.gov (United States)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  11. Massively Parallel Computing: A Sandia Perspective

    Energy Technology Data Exchange (ETDEWEB)

    Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

    1999-05-06

    The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant break-throughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.

  12. Domain decomposition methods and parallel computing

    International Nuclear Information System (INIS)

    Meurant, G.

    1991-01-01

    In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from discretization of scientific computing problems described by systems of partial differential equations. We show how to get a discrete finite dimensional system from the continuous problem and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion of floating point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset

  13. Parallel quantum computing in a single ensemble quantum computer

    International Nuclear Information System (INIS)

    Long Guilu; Xiao, L.

    2004-01-01

    We propose a parallel quantum computing mode for ensemble quantum computer. In this mode, some qubits are in pure states while other qubits are in mixed states. It enables a single ensemble quantum computer to perform 'single-instruction-multidata' type of parallel computation. Parallel quantum computing can provide additional speedup in Grover's algorithm and Shor's algorithm. In addition, it also makes a fuller use of qubit resources in an ensemble quantum computer. As a result, some qubits discarded in the preparation of an effective pure state in the Schulman-Varizani and the Cleve-DiVincenzo algorithms can be reutilized

  14. Structured Parallel Programming Patterns for Efficient Computation

    CERN Document Server

    McCool, Michael; Robison, Arch

    2012-01-01

    Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th

  15. Parallel adaptation of general three-dimensional hybrid meshes

    International Nuclear Information System (INIS)

    Kavouklis, Christos; Kallinderis, Yannis

    2010-01-01

    A new parallel dynamic mesh adaptation and load balancing algorithm for general hybrid grids has been developed. The meshes considered in this work are composed of four kinds of elements; tetrahedra, prisms, hexahedra and pyramids, which poses a challenge to parallel mesh adaptation. Additional complexity imposed by the presence of multiple types of elements affects especially data migration, updates of local data structures and interpartition data structures. Efficient partition of hybrid meshes has been accomplished by transforming them to suitable graphs and using serial graph partitioning algorithms. Communication among processors is based on the faces of the interpartition boundary and the termination detection algorithm of Dijkstra is employed to ensure proper flagging of edges for refinement. An inexpensive dynamic load balancing strategy is introduced to redistribute work load among processors after adaptation. In particular, only the initial coarse mesh, with proper weighting, is balanced which yields savings in computation time and relatively simple implementation of mesh quality preservation rules, while facilitating coarsening of refined elements. Special algorithms are employed for (i) data migration and dynamic updates of the local data structures, (ii) determination of the resulting interpartition boundary and (iii) identification of the communication pattern of processors. Several representative applications are included to evaluate the method.

  16. Parallel computation of rotating flows

    DEFF Research Database (Denmark)

    Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær

    1999-01-01

    This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process....... In the first step, the vorticity at the new time level is computed using the velocity at the previous time level. In the second step, the velocity at the new time level is computed using the new vorticity. We discuss here the second part which is by far the most time‐consuming. The numerical problem...

  17. Computer-Aided Parallelizer and Optimizer

    Science.gov (United States)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  18. High performance parallel computers for science

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1989-01-01

    This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction

  19. Parallel computational in nuclear group constant calculation

    International Nuclear Information System (INIS)

    Su'ud, Zaki; Rustandi, Yaddi K.; Kurniadi, Rizal

    2002-01-01

    In this paper parallel computational method in nuclear group constant calculation using collision probability method will be discuss. The main focus is on the calculation of collision matrix which need large amount of computational time. The geometry treated here is concentric cylinder. The calculation of collision probability matrix is carried out using semi analytic method using Beckley Naylor Function. To accelerate computation speed some computer parallel used to solve the problem. We used LINUX based parallelization using PVM software with C or fortran language. While in windows based we used socket programming using DELPHI or C builder. The calculation results shows the important of optimal weight for each processor in case there area many type of processor speed

  20. Locating hardware faults in a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-04-13

    Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.

  1. Impact analysis on a massively parallel computer

    International Nuclear Information System (INIS)

    Zacharia, T.; Aramayo, G.A.

    1994-01-01

    Advanced mathematical techniques and computer simulation play a major role in evaluating and enhancing the design of beverage cans, industrial, and transportation containers for improved performance. Numerical models are used to evaluate the impact requirements of containers used by the Department of Energy (DOE) for transporting radioactive materials. Many of these models are highly compute-intensive. An analysis may require several hours of computational time on current supercomputers despite the simplicity of the models being studied. As computer simulations and materials databases grow in complexity, massively parallel computers have become important tools. Massively parallel computational research at the Oak Ridge National Laboratory (ORNL) and its application to the impact analysis of shipping containers is briefly described in this paper

  2. Internode data communications in a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Miller, Douglas R.; Parker, Jeffrey J.; Ratterman, Joseph D.; Smith, Brian E.

    2013-09-03

    Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.

  3. Link failure detection in a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Megerian, Mark G.; Smith, Brian E.

    2010-11-09

    Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.

  4. Implementations of BLAST for parallel computers.

    Science.gov (United States)

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers-the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.

  5. Parallel visualization on leadership computing resources

    Energy Technology Data Exchange (ETDEWEB)

    Peterka, T; Ross, R B [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (United States); Shen, H-W [Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 (United States); Ma, K-L [Department of Computer Science, University of California at Davis, Davis, CA 95616 (United States); Kendall, W [Department of Electrical Engineering and Computer Science, University of Tennessee at Knoxville, Knoxville, TN 37996 (United States); Yu, H, E-mail: tpeterka@mcs.anl.go [Sandia National Laboratories, California, Livermore, CA 94551 (United States)

    2009-07-01

    Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.

  6. Parallel visualization on leadership computing resources

    International Nuclear Information System (INIS)

    Peterka, T; Ross, R B; Shen, H-W; Ma, K-L; Kendall, W; Yu, H

    2009-01-01

    Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.

  7. The new landscape of parallel computer architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shalf, John [NERSC Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road, Berkeley California, 94720 (United States)

    2007-07-15

    The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  8. The new landscape of parallel computer architecture

    International Nuclear Information System (INIS)

    Shalf, John

    2007-01-01

    The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models

  9. Temporal fringe pattern analysis with parallel computing

    International Nuclear Information System (INIS)

    Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca

    2005-01-01

    Temporal fringe pattern analysis is invaluable in transient phenomena studies but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution periods were reduced by 1.6 times when four virtual processors were used. To allow even lower execution times with an increasing number of processors, the time allocated for data transfer, data read, and waiting should be minimized. Parallel computing is found here to present a feasible approach to reduce execution times in temporal fringe pattern analysis

  10. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  11. Systematic approach for deriving feasible mappings of parallel algorithms to parallel computing platforms

    NARCIS (Netherlands)

    Arkin, Ethem; Tekinerdogan, Bedir; Imre, Kayhan M.

    2017-01-01

    The need for high-performance computing together with the increasing trend from single processor to parallel computer architectures has leveraged the adoption of parallel computing. To benefit from parallel computing power, usually parallel algorithms are defined that can be mapped and executed

  12. Parallel computation of nondeterministic algorithms in VLSI

    Energy Technology Data Exchange (ETDEWEB)

    Hortensius, P D

    1987-01-01

    This work examines parallel VLSI implementations of nondeterministic algorithms. It is demonstrated that conventional pseudorandom number generators are unsuitable for highly parallel applications. Efficient parallel pseudorandom sequence generation can be accomplished using certain classes of elementary one-dimensional cellular automata. The pseudorandom numbers appear in parallel on each clock cycle. Extensive study of the properties of these new pseudorandom number generators is made using standard empirical random number tests, cycle length tests, and implementation considerations. Furthermore, it is shown these particular cellular automata can form the basis of efficient VLSI architectures for computations involved in the Monte Carlo simulation of both the percolation and Ising models from statistical mechanics. Finally, a variation on a Built-In Self-Test technique based upon cellular automata is presented. These Cellular Automata-Logic-Block-Observation (CALBO) circuits improve upon conventional design for testability circuitry.

  13. Frontiers of massively parallel scientific computation

    International Nuclear Information System (INIS)

    Fischer, J.R.

    1987-07-01

    Practical applications using massively parallel computer hardware first appeared during the 1980s. Their development was motivated by the need for computing power orders of magnitude beyond that available today for tasks such as numerical simulation of complex physical and biological processes, generation of interactive visual displays, satellite image analysis, and knowledge based systems. Representative of the first generation of this new class of computers is the Massively Parallel Processor (MPP). A team of scientists was provided the opportunity to test and implement their algorithms on the MPP. The first results are presented. The research spans a broad variety of applications including Earth sciences, physics, signal and image processing, computer science, and graphics. The performance of the MPP was very good. Results obtained using the Connection Machine and the Distributed Array Processor (DAP) are presented

  14. Analysis of parallel computing performance of the code MCNP

    International Nuclear Information System (INIS)

    Wang Lei; Wang Kan; Yu Ganglin

    2006-01-01

    Parallel computing can reduce the running time of the code MCNP effectively. With the MPI message transmitting software, MCNP5 can achieve its parallel computing on PC cluster with Windows operating system. Parallel computing performance of MCNP is influenced by factors such as the type, the complexity level and the parameter configuration of the computing problem. This paper analyzes the parallel computing performance of MCNP regarding with these factors and gives measures to improve the MCNP parallel computing performance. (authors)

  15. Hybrid computing - Generalities and bibliography

    International Nuclear Information System (INIS)

    Neel, Daniele

    1970-01-01

    This note presents the content of a research thesis. It describes the evolution of hybrid computing systems, discusses the benefits and shortcomings of analogue or hybrid systems, discusses the building up of an hybrid system (requires properties), comments different possible uses, addresses the issues of language and programming, discusses analysis methods and scopes of application. An appendix proposes a bibliography on these issues and notably the different scopes of application (simulation, fluid dynamics, biology, chemistry, electronics, energy, errors, space, programming languages, hardware, mechanics, and optimisation of equations or processes, physics) [fr

  16. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    These proceedings contain the articles presented at the named conference. These concern hardware and software for vector and parallel processors, numerical methods and algorithms for the computation on such processors, as well as applications of such methods to different fields of physics and related sciences. See hints under the relevant topics. (HSI)

  17. Contributions to computational stereology and parallel programming

    DEFF Research Database (Denmark)

    Rasmusson, Allan

    rotator, even without the need for isotropic sections. To meet the need for computational power to perform image restoration of virtual tissue sections, parallel programming on GPUs has also been part of the project. This has lead to a significant change in paradigm for a previously developed surgical...

  18. A hybrid parallel framework for the cellular Potts model simulations

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, Yi [Los Alamos National Laboratory; He, Kejing [SOUTH CHINA UNIV; Dong, Shoubin [SOUTH CHINA UNIV

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming POE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the POE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation ({approx}10{sup 8} sites) of complex collective behavior of numerous cells ({approx}10{sup 6}).

  19. Parallel peak pruning for scalable SMP contour tree computation

    Energy Technology Data Exchange (ETDEWEB)

    Carr, Hamish A. [Univ. of Leeds (United Kingdom); Weber, Gunther H. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Davis, CA (United States); Sewell, Christopher M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Ahrens, James P. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-03-09

    As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in architecture of high performance computing systems necessitate analysis algorithms to make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. Here in this paper, we report the first shared SMP algorithm for fully parallel contour tree computation, withfor-mal guarantees of O(lgnlgt) parallel steps and O(n lgn) work, and implementations with up to 10x parallel speed up in OpenMP and up to 50x speed up in NVIDIA Thrust.

  20. Java parallel secure stream for grid computing

    International Nuclear Information System (INIS)

    Chen, J.; Akers, W.; Chen, Y.; Watson, W.

    2001-01-01

    The emergence of high speed wide area networks makes grid computing a reality. However grid applications that need reliable data transfer still have difficulties to achieve optimal TCP performance due to network tuning of TCP window size to improve the bandwidth and to reduce latency on a high speed wide area network. The authors present a pure Java package called JPARSS (Java Parallel Secure Stream) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a gird environment without the necessity of tuning the TCP window size. Several experimental results are provided to show that using parallel stream is more effective than tuning TCP window size. In addition X.509 certificate based single sign-on mechanism and SSL based connection establishment are integrated into this package. Finally a few applications using this package will be discussed

  1. Intranode data communications in a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

    2014-01-07

    Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a computer node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.

  2. Intranode data communications in a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

    2013-07-23

    Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.

  3. Parallel evolutionary computation in bioinformatics applications.

    Science.gov (United States)

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  4. Parallel computers and three-dimensional computational electromagnetics

    International Nuclear Information System (INIS)

    Madsen, N.K.

    1994-01-01

    The authors have continued to enhance their ability to use new massively parallel processing computers to solve time-domain electromagnetic problems. New vectorization techniques have improved the performance of their code DSI3D by factors of 5 to 15, depending on the computer used. New radiation boundary conditions and far-field transformations now allow the computation of radar cross-section values for complex objects. A new parallel-data extraction code has been developed that allows the extraction of data subsets from large problems, which have been run on parallel computers, for subsequent post-processing on workstations with enhanced graphics capabilities. A new charged-particle-pushing version of DSI3D is under development. Finally, DSI3D has become a focal point for several new Cooperative Research and Development Agreement activities with industrial companies such as Lockheed Advanced Development Company, Varian, Hughes Electron Dynamics Division, General Atomic, and Cray

  5. COMPARISON OF PARALLEL AND SERIES HYBRID POWERTRAINS FOR TRANSIT BUS APPLICATION

    Energy Technology Data Exchange (ETDEWEB)

    Gao, Zhiming [ORNL; Daw, C Stuart [ORNL; Smith, David E [ORNL; Jones, Perry T [ORNL; LaClair, Tim J [ORNL; Parks, II, James E [ORNL

    2016-01-01

    The fuel economy and emissions of both conventional and hybrid buses equipped with emissions aftertreatment were evaluated via computational simulation for six representative city bus drive cycles. Both series and parallel configurations for the hybrid case were studied. The simulation results indicate that series hybrid buses have the greatest overall advantage in fuel economy. The series and parallel hybrid buses were predicted to produce similar CO and HC tailpipe emissions but were also predicted to have reduced NOx tailpipe emissions compared to the conventional bus in higher speed cycles. For the New York bus cycle (NYBC), which has the lowest average speed among the cycles evaluated, the series bus tailpipe emissions were somewhat higher than they were for the conventional bus, while the parallel hybrid bus had significantly lower tailpipe emissions. All three bus powertrains were found to require periodic active DPF regeneration to maintain PM control. Plug-in operation of series hybrid buses appears to offer significant fuel economy benefits and is easily employed due to the relatively large battery capacity that is typical of the series hybrid configuration.

  6. On synchronous parallel computations with independent probabilistic choice

    International Nuclear Information System (INIS)

    Reif, J.H.

    1984-01-01

    This paper introduces probabilistic choice to synchronous parallel machine models; in particular parallel RAMs. The power of probabilistic choice in parallel computations is illustrate by parallelizing some known probabilistic sequential algorithms. The authors characterize the computational complexity of time, space, and processor bounded probabilistic parallel RAMs in terms of the computational complexity of probabilistic sequential RAMs. They show that parallelism uniformly speeds up time bounded probabilistic sequential RAM computations by nearly a quadratic factor. They also show that probabilistic choice can be eliminated from parallel computations by introducing nonuniformity

  7. Efficient Parallel Engineering Computing on Linux Workstations

    Science.gov (United States)

    Lou, John Z.

    2010-01-01

    A C software module has been developed that creates lightweight processes (LWPs) dynamically to achieve parallel computing performance in a variety of engineering simulation and analysis applications to support NASA and DoD project tasks. The required interface between the module and the application it supports is simple, minimal and almost completely transparent to the user applications, and it can achieve nearly ideal computing speed-up on multi-CPU engineering workstations of all operating system platforms. The module can be integrated into an existing application (C, C++, Fortran and others) either as part of a compiled module or as a dynamically linked library (DLL).

  8. Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations

    Energy Technology Data Exchange (ETDEWEB)

    Detrixhe, Miles, E-mail: mdetrixhe@engineering.ucsb.edu [Department of Mechanical Engineering (United States); University of California Santa Barbara, Santa Barbara, CA, 93106 (United States); Gibou, Frédéric, E-mail: fgibou@engineering.ucsb.edu [Department of Mechanical Engineering (United States); University of California Santa Barbara, Santa Barbara, CA, 93106 (United States); Department of Computer Science (United States); Department of Mathematics (United States)

    2016-10-01

    The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.

  9. Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations

    International Nuclear Information System (INIS)

    Detrixhe, Miles; Gibou, Frédéric

    2016-01-01

    The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.

  10. Accelerating Climate Simulations Through Hybrid Computing

    Science.gov (United States)

    Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark

    2009-01-01

    Unconventional multi-core processors (e.g., IBM Cell B/E and NYIDIDA GPU) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for connection, we identified two challenges: (1) identical MPI implementation is required in both systems, and; (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors, two IBM QS22 Cell blades, connected with Infiniband), allowing for seamlessly offloading compute-intensive functions to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approx.10% network overhead.

  11. Computational chaos in massively parallel neural networks

    Science.gov (United States)

    Barhen, Jacob; Gulati, Sandeep

    1989-01-01

    A fundamental issue which directly impacts the scalability of current theoretical neural network models to massively parallel embodiments, in both software as well as hardware, is the inherent and unavoidable concurrent asynchronicity of emerging fine-grained computational ensembles and the possible emergence of chaotic manifestations. Previous analyses attributed dynamical instability to the topology of the interconnection matrix, to parasitic components or to propagation delays. However, researchers have observed the existence of emergent computational chaos in a concurrently asynchronous framework, independent of the network topology. Researcher present a methodology enabling the effective asynchronous operation of large-scale neural networks. Necessary and sufficient conditions guaranteeing concurrent asynchronous convergence are established in terms of contracting operators. Lyapunov exponents are computed formally to characterize the underlying nonlinear dynamics. Simulation results are presented to illustrate network convergence to the correct results, even in the presence of large delays.

  12. Electromagnetic Physics Models for Parallel Computing Architectures

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

    2016-01-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well. (paper)

  13. Electromagnetic Physics Models for Parallel Computing Architectures

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  14. (Nearly) portable PIC code for parallel computers

    International Nuclear Information System (INIS)

    Decyk, V.K.

    1993-01-01

    As part of the Numerical Tokamak Project, the author has developed a (nearly) portable, one dimensional version of the GCPIC algorithm for particle-in-cell codes on parallel computers. This algorithm uses a spatial domain decomposition for the fields, and passes particles from one domain to another as the particles move spatially. With only minor changes, the code has been run in parallel on the Intel Delta, the Cray C-90, the IBM ES/9000 and a cluster of workstations. After a line by line translation into cmfortran, the code was also run on the CM-200. Impressive speeds have been achieved, both on the Intel Delta and the Cray C-90, around 30 nanoseconds per particle per time step. In addition, the author was able to isolate the data management modules, so that the physics modules were not changed much from their sequential version, and the data management modules can be used as open-quotes black boxes.close quotes

  15. Computation and parallel implementation for early vision

    Science.gov (United States)

    Gualtieri, J. Anthony

    1990-01-01

    The problem of early vision is to transform one or more retinal illuminance images-pixel arrays-to image representations built out of such primitive visual features such as edges, regions, disparities, and clusters. These transformed representations form the input to later vision stages that perform higher level vision tasks including matching and recognition. Researchers developed algorithms for: (1) edge finding in the scale space formulation; (2) correlation methods for computing matches between pairs of images; and (3) clustering of data by neural networks. These algorithms are formulated for parallel implementation of SIMD machines, such as the Massively Parallel Processor, a 128 x 128 array processor with 1024 bits of local memory per processor. For some cases, researchers can show speedups of three orders of magnitude over serial implementations.

  16. Parallel computing techniques for rotorcraft aerodynamics

    Science.gov (United States)

    Ekici, Kivanc

    The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, main modifications to the implicit operator, Lower-Upper Symmetric Gauss Seidel (LU-SGS) originally used in TURNS, is performed. Second, application of an inexact Newton method, coupled with a Krylov subspace iterative method (Newton-Krylov method) is carried out. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods to the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).

  17. Parallel deposition, sorting, and reordering methods in the Hybrid Ordered Plasma Simulation (HOPS) code

    International Nuclear Information System (INIS)

    Anderson, D.V.; Shumaker, D.E.

    1993-01-01

    From a computational standpoint, particle simulation calculations for plasmas have not adapted well to the transitions from scalar to vector processing nor from serial to parallel environments. They have suffered from inordinate and excessive accessing of computer memory and have been hobbled by relatively inefficient gather-scatter constructs resulting from the use of indirect indexing. Lastly, the many-to-one mapping characteristic of the deposition phase has made it difficult to perform this in parallel. The authors' code sorts and reorders the particles in a spatial order. This allows them to greatly reduce the memory references, to run in directly indexed vector mode, and to employ domain decomposition to achieve parallelization. In this hybrid simulation the electrons are modeled as a fluid and the field equations solved are obtained from the electron momentum equation together with the pre-Maxwell equations (displacement current neglected). Either zero or finite electron mass can be used in the electron model. The resulting field equations are solved with an iteratively explicit procedure which is thus trivial to parallelize. Likewise, the field interpolations and the particle pushing is simple to parallelize. The deposition, sorting, and reordering phases are less simple and it is for these that the authors present detailed algorithms. They have now successfully tested the parallel version of HOPS in serial mode and it is now being readied for parallel execution on the Cray C-90. They will then port HOPS to a massively parallel computer, in the next year

  18. Energy and fuel efficient parallel mild hybrids for urban roads

    International Nuclear Information System (INIS)

    Babu, Ajay; Ashok, S.

    2016-01-01

    Highlights: • Energy and fuel savings depend on battery charge variations and the vehicle speed parameters. • Indian urban conditions provide lot of scope for energy and fuel savings in mild hybrids. • Energy saving strategy has lower payback periods than the fuel saving one in mild hybrids. • Sensitivity to parameter variations is the least for energy saving strategy in a mild hybrid. - Abstract: Fuel economy improvements and battery energy savings can promote the adoption of parallel mild hybrids for urban driving conditions. The aim of this study is to establish these benefits through two operating modes: an energy saving mode and a fuel saving mode. The performances of a typical parallel mild hybrid using these modes were analysed over urban driving cycles, in the US, Europe, and India, with a particular focus on the Indian urban conditions. The energy pack available from the proposed energy-saving operating mode, in addition to the energy already available from the conventional mode, was observed to be the highest for the representative urban driving cycle of the US. The extra energy pack available was found to be approximately 21.9 times that available from the conventional mode. By employing the proposed fuel saving operating mode, the fuel economy improvement achievable in New York City was observed to be approximately 22.69% of the fuel economy with the conventional strategy. The energy saving strategy was found to possess the lowest payback periods and highest immunity to variations in various cost parameters.

  19. Computational fluid dynamics on a massively parallel computer

    Science.gov (United States)

    Jespersen, Dennis C.; Levit, Creon

    1989-01-01

    A finite difference code was implemented for the compressible Navier-Stokes equations on the Connection Machine, a massively parallel computer. The code is based on the ARC2D/ARC3D program and uses the implicit factored algorithm of Beam and Warming. The codes uses odd-even elimination to solve linear systems. Timings and computation rates are given for the code, and a comparison is made with a Cray XMP.

  20. Three-dimensional pseudo-random number generator for implementing in hybrid computer systems

    International Nuclear Information System (INIS)

    Ivanov, M.A.; Vasil'ev, N.P.; Voronin, A.V.; Kravtsov, M.Yu.; Maksutov, A.A.; Spiridonov, A.A.; Khudyakova, V.I.; Chugunkov, I.V.

    2012-01-01

    The algorithm for generating pseudo-random numbers oriented to implementation by using hybrid computer systems is considered. The proposed solution is characterized by a high degree of parallel computing [ru

  1. Climate models on massively parallel computers

    International Nuclear Information System (INIS)

    Vitart, F.; Rouvillois, P.

    1993-01-01

    First results got on massively parallel computers (Multiple Instruction Multiple Data and Simple Instruction Multiple Data) allow to consider building of coupled models with high resolutions. This would make possible simulation of thermoaline circulation and other interaction phenomena between atmosphere and ocean. The increasing of computers powers, and then the improvement of resolution will go us to revise our approximations. Then hydrostatic approximation (in ocean circulation) will not be valid when the grid mesh will be of a dimension lower than a few kilometers: We shall have to find other models. The expert appraisement got in numerical analysis at the Center of Limeil-Valenton (CEL-V) will be used again to imagine global models taking in account atmosphere, ocean, ice floe and biosphere, allowing climate simulation until a regional scale

  2. Massively parallel computation of conservation laws

    Energy Technology Data Exchange (ETDEWEB)

    Garbey, M [Univ. Claude Bernard, Villeurbanne (France); Levine, D [Argonne National Lab., IL (United States)

    1990-01-01

    The authors present a new method for computing solutions of conservation laws based on the use of cellular automata with the method of characteristics. The method exploits the high degree of parallelism available with cellular automata and retains important features of the method of characteristics. It yields high numerical accuracy and extends naturally to adaptive meshes and domain decomposition methods for perturbed conservation laws. They describe the method and its implementation for a Dirichlet problem with a single conservation law for the one-dimensional case. Numerical results for the one-dimensional law with the classical Burgers nonlinearity or the Buckley-Leverett equation show good numerical accuracy outside the neighborhood of the shocks. The error in the area of the shocks is of the order of the mesh size. The algorithm is well suited for execution on both massively parallel computers and vector machines. They present timing results for an Alliant FX/8, Connection Machine Model 2, and CRAY X-MP.

  3. A hybrid algorithm for parallel molecular dynamics simulations

    Science.gov (United States)

    Mangiardi, Chris M.; Meyer, R.

    2017-10-01

    This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.

  4. A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer

    Science.gov (United States)

    Jespersen, Dennis C.; Levit, Creon

    1989-01-01

    The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine ran achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.

  5. External parallel sorting with multiprocessor computers

    International Nuclear Information System (INIS)

    Comanceau, S.I.

    1984-01-01

    This article describes methods of external sorting in which the entire main computer memory is used for the internal sorting of entries, forming out of them sorted segments of the greatest possible size, and outputting them to external memories. The obtained segments are merged into larger segments until all entries form one ordered segment. The described methods are suitable for sequential files stored on magnetic tape. The needs of the sorting algorithm can be met by using the relatively slow peripheral storage devices (e.g., tapes, disks, drums). The efficiency of the external sorting methods is determined by calculating the total sorting time as a function of the number of entries to be sorted and the number of parallel processors participating in the sorting process

  6. Contact-impact algorithms on parallel computers

    International Nuclear Information System (INIS)

    Zhong Zhihua; Nilsson, Larsgunnar

    1994-01-01

    Contact-impact algorithms on parallel computers are discussed within the context of explicit finite element analysis. The algorithms concerned include a contact searching algorithm and an algorithm for contact force calculations. The contact searching algorithm is based on the territory concept of the general HITA algorithm. However, no distinction is made between different contact bodies, or between different contact surfaces. All contact segments from contact boundaries are taken as a single set. Hierarchy territories and contact territories are expanded. A three-dimensional bucket sort algorithm is used to sort contact nodes. The defence node algorithm is used in the calculation of contact forces. Both the contact searching algorithm and the defence node algorithm are implemented on the connection machine CM-200. The performance of the algorithms is examined under different circumstances, and numerical results are presented. ((orig.))

  7. A Case Study of a Hybrid Parallel 3D Surface Rendering Graphics Architecture

    DEFF Research Database (Denmark)

    Holten-Lund, Hans Erik; Madsen, Jan; Pedersen, Steen

    1997-01-01

    This paper presents a case study in the design strategy used inbuilding a graphics computer, for drawing very complex 3Dgeometric surfaces. The goal is to build a PC based computer systemcapable of handling surfaces built from about 2 million triangles, andto be able to render a perspective view...... of these on a computer displayat interactive frame rates, i.e. processing around 50 milliontriangles per second. The paper presents a hardware/softwarearchitecture called HPGA (Hybrid Parallel Graphics Architecture) whichis likely to be able to carry out this task. The case study focuses ontechniques to increase...

  8. Comparison Of Hybrid Sorting Algorithms Implemented On Different Parallel Hardware Platforms

    Directory of Open Access Journals (Sweden)

    Dominik Zurek

    2013-01-01

    Full Text Available Sorting is a common problem in computer science. There are lot of well-known sorting algorithms created for sequential execution on a single processor. Recently, hardware platforms enable to create wide parallel algorithms. We have standard processors consist of multiple cores and hardware accelerators like GPU. The graphic cards with their parallel architecture give new possibility to speed up many algorithms. In this paper we describe results of implementation of a few different sorting algorithms on GPU cards and multicore processors. Then hybrid algorithm will be presented which consists of parts executed on both platforms, standard CPU and GPU.

  9. Distributed Memory Parallel Computing with SEAWAT

    Science.gov (United States)

    Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

    2017-12-01

    Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model ( 10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources

  10. Automatic Parallelization Tool: Classification of Program Code for Parallel Computing

    Directory of Open Access Journals (Sweden)

    Mustafa Basthikodi

    2016-04-01

    Full Text Available Performance growth of single-core processors has come to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks along with Graphical Processing Units empowered to enhance parallelism broadly. Couples of compilers are updated to developing challenges forsynchronization and threading issues. Appropriate program and algorithm classifications will have advantage to a great extent to the group of software engineers to get opportunities for effective parallelization. In present work we investigated current species for classification of algorithms, in that related work on classification is discussed along with the comparison of issues that challenges the classification. The set of algorithms are chosen which matches the structure with different issues and perform given task. We have tested these algorithms utilizing existing automatic species extraction toolsalong with Bones compiler. We have added functionalities to existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user defined types, constants and mathematical functions. With this, we can retain significant data which is not captured by original speciesof algorithms. We executed new theories into the device, empowering automatic characterization of program code.

  11. Broadcasting collective operation contributions throughout a parallel computer

    Science.gov (United States)

    Faraj, Ahmad [Rochester, MN

    2012-02-21

    Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.

  12. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sreepathi, Sarat [ORNL; Kumar, Jitendra [ORNL; Mills, Richard T. [Argonne National Laboratory; Hoffman, Forrest M. [ORNL; Sripathi, Vamsi [Intel Corporation; Hargrove, William Walter [United States Department of Agriculture (USDA), United States Forest Service (USFS)

    2017-09-01

    A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offer unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements has led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.

  13. Computing all hybridization networks for multiple binary phylogenetic input trees.

    Science.gov (United States)

    Albrecht, Benjamin

    2015-07-30

    The computation of phylogenetic trees on the same set of species that are based on different orthologous genes can lead to incongruent trees. One possible explanation for this behavior are interspecific hybridization events recombining genes of different species. An important approach to analyze such events is the computation of hybridization networks. This work presents the first algorithm computing the hybridization number as well as a set of representative hybridization networks for multiple binary phylogenetic input trees on the same set of taxa. To improve its practical runtime, we show how this algorithm can be parallelized. Moreover, we demonstrate the efficiency of the software Hybroscale, containing an implementation of our algorithm, by comparing it to PIRNv2.0, which is so far the best available software computing the exact hybridization number for multiple binary phylogenetic trees on the same set of taxa. The algorithm is part of the software Hybroscale, which was developed specifically for the investigation of hybridization networks including their computation and visualization. Hybroscale is freely available(1) and runs on all three major operating systems. Our simulation study indicates that our approach is on average 100 times faster than PIRNv2.0. Moreover, we show how Hybroscale improves the interpretation of the reported hybridization networks by adding certain features to its graphical representation.

  14. Applications of the parallel computing system using network

    International Nuclear Information System (INIS)

    Ido, Shunji; Hasebe, Hiroki

    1994-01-01

    Parallel programming is applied to multiple processors connected in Ethernet. Data exchanges between tasks located in each processing element are realized by two ways. One is socket which is standard library on recent UNIX operating systems. Another is a network connecting software, named as Parallel Virtual Machine (PVM) which is a free software developed by ORNL, to use many workstations connected to network as a parallel computer. This paper discusses the availability of parallel computing using network and UNIX workstations and comparison between specialized parallel systems (Transputer and iPSC/860) in a Monte Carlo simulation which generally shows high parallelization ratio. (author)

  15. Parallel computing and networking; Heiretsu keisanki to network

    Energy Technology Data Exchange (ETDEWEB)

    Asakawa, E; Tsuru, T [Japan National Oil Corp., Tokyo (Japan); Matsuoka, T [Japan Petroleum Exploration Co. Ltd., Tokyo (Japan)

    1996-05-01

    This paper describes the trend of parallel computers used in geophysical exploration. Around 1993 was the early days when the parallel computers began to be used for geophysical exploration. Classification of these computers those days was mainly MIMD (multiple instruction stream, multiple data stream), SIMD (single instruction stream, multiple data stream) and the like. Parallel computers were publicized in the 1994 meeting of the Geophysical Exploration Society as a `high precision imaging technology`. Concerning the library of parallel computers, there was a shift to PVM (parallel virtual machine) in 1993 and to MPI (message passing interface) in 1995. In addition, the compiler of FORTRAN90 was released with support implemented for data parallel and vector computers. In 1993, networks used were Ethernet, FDDI, CDDI and HIPPI. In 1995, the OC-3 products under ATM began to propagate. However, ATM remains to be an interoffice high speed network because the ATM service has not spread yet for the public network. 1 ref.

  16. From parallel to distributed computing for reactive scattering calculations

    International Nuclear Information System (INIS)

    Lagana, A.; Gervasi, O.; Baraglia, R.

    1994-01-01

    Some reactive scattering codes have been ported on different innovative computer architectures ranging from massively parallel machines to clustered workstations. The porting has required a drastic restructuring of the codes to single out computationally decoupled cpu intensive subsections. The suitability of different theoretical approaches for parallel and distributed computing restructuring is discussed and the efficiency of related algorithms evaluated

  17. Model-driven product line engineering for mapping parallel algorithms to parallel computing platforms

    NARCIS (Netherlands)

    Arkin, Ethem; Tekinerdogan, Bedir

    2016-01-01

    Mapping parallel algorithms to parallel computing platforms requires several activities such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, the mapping of the algorithm to the logical configuration platform and the implementation of the

  18. Hybrid soft computing approaches research and applications

    CERN Document Server

    Dutta, Paramartha; Chakraborty, Susanta

    2016-01-01

    The book provides a platform for dealing with the flaws and failings of the soft computing paradigm through different manifestations. The different chapters highlight the necessity of the hybrid soft computing methodology in general with emphasis on several application perspectives in particular. Typical examples include (a) Study of Economic Load Dispatch by Various Hybrid Optimization Techniques, (b) An Application of Color Magnetic Resonance Brain Image Segmentation by ParaOptiMUSIG activation Function, (c) Hybrid Rough-PSO Approach in Remote Sensing Imagery Analysis,  (d) A Study and Analysis of Hybrid Intelligent Techniques for Breast Cancer Detection using Breast Thermograms, and (e) Hybridization of 2D-3D Images for Human Face Recognition. The elaborate findings of the chapters enhance the exhibition of the hybrid soft computing paradigm in the field of intelligent computing.

  19. Data communications in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call cite statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  20. Parallel Computing for Terrestrial Ecosystem Carbon Modeling

    International Nuclear Information System (INIS)

    Wang, Dali; Post, Wilfred M.; Ricciuto, Daniel M.; Berry, Michael

    2011-01-01

    Terrestrial ecosystems are a primary component of research on global environmental change. Observational and modeling research on terrestrial ecosystems at the global scale, however, has lagged behind their counterparts for oceanic and atmospheric systems, largely because the unique challenges associated with the tremendous diversity and complexity of terrestrial ecosystems. There are 8 major types of terrestrial ecosystem: tropical rain forest, savannas, deserts, temperate grassland, deciduous forest, coniferous forest, tundra, and chaparral. The carbon cycle is an important mechanism in the coupling of terrestrial ecosystems with climate through biological fluxes of CO 2 . The influence of terrestrial ecosystems on atmospheric CO 2 can be modeled via several means at different timescales. Important processes include plant dynamics, change in land use, as well as ecosystem biogeography. Over the past several decades, many terrestrial ecosystem models (see the 'Model developments' section) have been developed to understand the interactions between terrestrial carbon storage and CO 2 concentration in the atmosphere, as well as the consequences of these interactions. Early TECMs generally adapted simple box-flow exchange models, in which photosynthetic CO 2 uptake and respiratory CO 2 release are simulated in an empirical manner with a small number of vegetation and soil carbon pools. Demands on kinds and amount of information required from global TECMs have grown. Recently, along with the rapid development of parallel computing, spatially explicit TECMs with detailed process based representations of carbon dynamics become attractive, because those models can readily incorporate a variety of additional ecosystem processes (such as dispersal, establishment, growth, mortality etc.) and environmental factors (such as landscape position, pest populations, disturbances, resource manipulations, etc.), and provide information to frame policy options for climate change

  1. Fast parallel computation of polynomials using few processors

    DEFF Research Database (Denmark)

    Valiant, Leslie; Skyum, Sven

    1981-01-01

    It is shown that any multivariate polynomial that can be computed sequentially in C steps and has degree d can be computed in parallel in 0((log d) (log C + log d)) steps using only (Cd)0(1) processors....

  2. Prototyping and Simulating Parallel, Distributed Computations with VISA

    National Research Council Canada - National Science Library

    Demeure, Isabelle M; Nutt, Gary J

    1989-01-01

    ...] to support the design, prototyping, and simulation of parallel, distributed computations. In particular, VISA is meant to guide the choice of partitioning and communication strategies for such computations, based on their performance...

  3. Solving the Stokes problem on a massively parallel computer

    DEFF Research Database (Denmark)

    Axelsson, Owe; Barker, Vincent A.; Neytcheva, Maya

    2001-01-01

    boundary value problem for each velocity component, are solved by the conjugate gradient method with a preconditioning based on the algebraic multi‐level iteration (AMLI) technique. The velocity is found from the computed pressure. The method is optimal in the sense that the computational work...... is proportional to the number of unknowns. Further, it is designed to exploit a massively parallel computer with distributed memory architecture. Numerical experiments on a Cray T3E computer illustrate the parallel performance of the method....

  4. A hybrid parallel architecture for electrostatic interactions in the simulation of dissipative particle dynamics

    Science.gov (United States)

    Yang, Sheng-Chun; Lu, Zhong-Yuan; Qian, Hu-Jun; Wang, Yong-Lei; Han, Jie-Ping

    2017-11-01

    In this work, we upgraded the electrostatic interaction method of CU-ENUF (Yang, et al., 2016) which first applied CUNFFT (nonequispaced Fourier transforms based on CUDA) to the reciprocal-space electrostatic computation and made the computation of electrostatic interaction done thoroughly in GPU. The upgraded edition of CU-ENUF runs concurrently in a hybrid parallel way that enables the computation parallelizing on multiple computer nodes firstly, then further on the installed GPU in each computer. By this parallel strategy, the size of simulation system will be never restricted to the throughput of a single CPU or GPU. The most critical technical problem is how to parallelize a CUNFFT in the parallel strategy, which is conquered effectively by deep-seated research of basic principles and some algorithm skills. Furthermore, the upgraded method is capable of computing electrostatic interactions for both the atomistic molecular dynamics (MD) and the dissipative particle dynamics (DPD). Finally, the benchmarks conducted for validation and performance indicate that the upgraded method is able to not only present a good precision when setting suitable parameters, but also give an efficient way to compute electrostatic interactions for huge simulation systems. Program Files doi:http://dx.doi.org/10.17632/zncf24fhpv.1 Licensing provisions: GNU General Public License 3 (GPL) Programming language: C, C++, and CUDA C Supplementary material: The program is designed for effective electrostatic interactions of large-scale simulation systems, which runs on particular computers equipped with NVIDIA GPUs. It has been tested on (a) single computer node with Intel(R) Core(TM) i7-3770@ 3.40 GHz (CPU) and GTX 980 Ti (GPU), and (b) MPI parallel computer nodes with the same configurations. Nature of problem: For molecular dynamics simulation, the electrostatic interaction is the most time-consuming computation because of its long-range feature and slow convergence in simulation space

  5. Parallel genetic algorithms with migration for the hybrid flow shop scheduling problem

    Directory of Open Access Journals (Sweden)

    K. Belkadi

    2006-01-01

    Full Text Available This paper addresses scheduling problems in hybrid flow shop-like systems with a migration parallel genetic algorithm (PGA_MIG. This parallel genetic algorithm model allows genetic diversity by the application of selection and reproduction mechanisms nearer to nature. The space structure of the population is modified by dividing it into disjoined subpopulations. From time to time, individuals are exchanged between the different subpopulations (migration. Influence of parameters and dedicated strategies are studied. These parameters are the number of independent subpopulations, the interconnection topology between subpopulations, the choice/replacement strategy of the migrant individuals, and the migration frequency. A comparison between the sequential and parallel version of genetic algorithm (GA is provided. This comparison relates to the quality of the solution and the execution time of the two versions. The efficiency of the parallel model highly depends on the parameters and especially on the migration frequency. In the same way this parallel model gives a significant improvement of computational time if it is implemented on a parallel architecture which offers an acceptable number of processors (as many processors as subpopulations.

  6. Numerical discrepancy between serial and MPI parallel computations

    Directory of Open Access Journals (Sweden)

    Sang Bong Lee

    2016-09-01

    Full Text Available Numerical simulations of 1D Burgers equation and 2D sloshing problem were carried out to study numerical discrepancy between serial and parallel computations. The numerical domain was decomposed into 2 and 4 subdomains for parallel computations with message passing interface. The numerical solution of Burgers equation disclosed that fully explicit boundary conditions used on subdomains of parallel computation was responsible for the numerical discrepancy of transient solution between serial and parallel computations. Two dimensional sloshing problems in a rectangular domain were solved using OpenFOAM. After a lapse of initial transient time sloshing patterns of water were significantly different in serial and parallel computations although the same numerical conditions were given. Based on the histograms of pressure measured at two points near the wall the statistical characteristics of numerical solution was not affected by the number of subdomains as much as the transient solution was dependent on the number of subdomains.

  7. Parallel computation for solving the tridiagonal linear system of equations

    International Nuclear Information System (INIS)

    Ishiguro, Misako; Harada, Hiroo; Fujii, Minoru; Fujimura, Toichiro; Nakamura, Yasuhiro; Nanba, Katsumi.

    1981-09-01

    Recently, applications of parallel computation for scientific calculations have increased from the need of the high speed calculation of large scale programs. At the JAERI computing center, an array processor FACOM 230-75 APU has installed to study the applicability of parallel computation for nuclear codes. We made some numerical experiments by using the APU on the methods of solution of tridiagonal linear equation which is an important problem in scientific calculations. Referring to the recent papers with parallel methods, we investigate eight ones. These are Gauss elimination method, Parallel Gauss method, Accelerated parallel Gauss method, Jacobi method, Recursive doubling method, Cyclic reduction method, Chebyshev iteration method, and Conjugate gradient method. The computing time and accuracy were compared among the methods on the basis of the numerical experiments. As the result, it is found that the Cyclic reduction method is best both in computing time and accuracy and the Gauss elimination method is the second one. (author)

  8. Checkpointing for a hybrid computing node

    Science.gov (United States)

    Cher, Chen-Yong

    2016-03-08

    According to an aspect, a method for checkpointing in a hybrid computing node includes executing a task in a processing accelerator of the hybrid computing node. A checkpoint is created in a local memory of the processing accelerator. The checkpoint includes state data to restart execution of the task in the processing accelerator upon a restart operation. Execution of the task is resumed in the processing accelerator after creating the checkpoint. The state data of the checkpoint are transferred from the processing accelerator to a main processor of the hybrid computing node while the processing accelerator is executing the task.

  9. The Research of the Parallel Computing Development from the Angle of Cloud Computing

    Science.gov (United States)

    Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

    2017-10-01

    Cloud computing is the development of parallel computing, distributed computing and grid computing. The development of cloud computing makes parallel computing come into people’s lives. Firstly, this paper expounds the concept of cloud computing and introduces two several traditional parallel programming model. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and Map Reduce respectively. Finally, it takes MPI, OpenMP models compared to Map Reduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.

  10. Stochastic Optimal Control of Parallel Hybrid Electric Vehicles

    Directory of Open Access Journals (Sweden)

    Feiyan Qin

    2017-02-01

    Full Text Available Energy management strategies (EMSs in hybrid electric vehicles (HEVs are highly related to the fuel economy and emission performances. However, EMS constitutes a challenging problem due to the complex structure of a HEV and the unknown or partially known driving cycles. To meet this problem, this paper adopts a stochastic dynamic programming (SDP method for the EMS of a specially designed vehicle, a pre-transmission single-shaft torque-coupling parallel HEV. In this parallel HEV, the auto clutch output is connected to the transmission input through an electric motor, which benefits an efficient motor assist operation. In this EMS, demanded torque of driver is modeled as a one-state Markov process to represent the uncertainty of future driving situations. The obtained EMS has been evaluated with ADVISOR2002 over two standard government drive cycles and a self-defined one, and compared with a dynamic programming (DP one and a rule-based one. Simulation results have shown the real-time performance of the proposed approach, and potential vehicle performance improvement relative to the rule-based one.

  11. Data communications in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-29

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.

  12. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

    Science.gov (United States)

    Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

    2014-01-16

    To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high

  13. From experiment to design -- Fault characterization and detection in parallel computer systems using computational accelerators

    Science.gov (United States)

    Yim, Keun Soo

    This dissertation summarizes experimental validation and co-design studies conducted to optimize the fault detection capabilities and overheads in hybrid computer systems (e.g., using CPUs and Graphics Processing Units, or GPUs), and consequently to improve the scalability of parallel computer systems using computational accelerators. The experimental validation studies were conducted to help us understand the failure characteristics of CPU-GPU hybrid computer systems under various types of hardware faults. The main characterization targets were faults that are difficult to detect and/or recover from, e.g., faults that cause long latency failures (Ch. 3), faults in dynamically allocated resources (Ch. 4), faults in GPUs (Ch. 5), faults in MPI programs (Ch. 6), and microarchitecture-level faults with specific timing features (Ch. 7). The co-design studies were based on the characterization results. One of the co-designed systems has a set of source-to-source translators that customize and strategically place error detectors in the source code of target GPU programs (Ch. 5). Another co-designed system uses an extension card to learn the normal behavioral and semantic execution patterns of message-passing processes executing on CPUs, and to detect abnormal behaviors of those parallel processes (Ch. 6). The third co-designed system is a co-processor that has a set of new instructions in order to support software-implemented fault detection techniques (Ch. 7). The work described in this dissertation gains more importance because heterogeneous processors have become an essential component of state-of-the-art supercomputers. GPUs were used in three of the five fastest supercomputers that were operating in 2011. Our work included comprehensive fault characterization studies in CPU-GPU hybrid computers. In CPUs, we monitored the target systems for a long period of time after injecting faults (a temporally comprehensive experiment), and injected faults into various types of

  14. The ongoing investigation of high performance parallel computing in HEP

    CERN Document Server

    Peach, Kenneth J; Böck, R K; Dobinson, Robert W; Hansroul, M; Norton, Alan Robert; Willers, Ian Malcolm; Baud, J P; Carminati, F; Gagliardi, F; McIntosh, E; Metcalf, M; Robertson, L; CERN. Geneva. Detector Research and Development Committee

    1993-01-01

    Past and current exploitation of parallel computing in High Energy Physics is summarized and a list of R & D projects in this area is presented. The applicability of new parallel hardware and software to physics problems is investigated, in the light of the requirements for computing power of LHC experiments and the current trends in the computer industry. Four main themes are discussed (possibilities for a finer grain of parallelism; fine-grain communication mechanism; usable parallel programming environment; different programming models and architectures, using standard commercial products). Parallel computing technology is potentially of interest for offline and vital for real time applications in LHC. A substantial investment in applications development and evaluation of state of the art hardware and software products is needed. A solid development environment is required at an early stage, before mainline LHC program development begins.

  15. Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

    Science.gov (United States)

    Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

    2017-07-01

    Calculation of the matrix-vector multiplication in the real-world problems often involves large matrix with arbitrary size. Therefore, parallelization is needed to speed up the calculation process that usually takes a long time. Graph partitioning techniques that have been discussed in the previous studies cannot be used to complete the parallelized calculation of matrix-vector multiplication with arbitrary size. This is due to the assumption of graph partitioning techniques that can only solve the square and symmetric matrix. Hypergraph partitioning techniques will overcome the shortcomings of the graph partitioning technique. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).

  16. Heterogeneous Hardware Parallelism Review of the IN2P3 2016 Computing School

    Science.gov (United States)

    Lafage, Vincent

    2017-11-01

    Parallel and hybrid Monte Carlo computation. The Monte Carlo method is the main workhorse for computation of particle physics observables. This paper provides an overview of various HPC technologies that can be used today: multicore (OpenMP, HPX), manycore (OpenCL). The rewrite of a twenty years old Fortran 77 Monte Carlo will illustrate the various programming paradigms in use beyond language implementation. The problem of parallel random number generator will be addressed. We will give a short report of the one week school dedicated to these recent approaches, that took place in École Polytechnique in May 2016.

  17. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)

  18. Partitions in languages and parallel computations

    Energy Technology Data Exchange (ETDEWEB)

    Burgin, M S; Burgina, E S

    1982-05-01

    Partitions of entries (linguistic structures) are studied that are intended for parallel data processing. The representations of formal languages with the aid of such structures is examined, and the relationships are considered between partitions of entries and abstract families of languages and automata. 18 references.

  19. Heuristic framework for parallel sorting computations | Nwanze ...

    African Journals Online (AJOL)

    Parallel sorting techniques have become of practical interest with the advent of new multiprocessor architectures. The decreasing cost of these processors will probably in the future, make the solutions that are derived thereof to be more appealing. Efficient algorithms for sorting scheme that are encountered in a number of ...

  20. Algorithms for parallel and vector computations

    Science.gov (United States)

    Ortega, James M.

    1995-01-01

    This is a final report on work performed under NASA grant NAG-1-1112-FOP during the period March, 1990 through February 1995. Four major topics are covered: (1) solution of nonlinear poisson-type equations; (2) parallel reduced system conjugate gradient method; (3) orderings for conjugate gradient preconditioners, and (4) SOR as a preconditioner.

  1. Parallel computing, failure recovery, and extreme values

    DEFF Research Database (Denmark)

    Andersen, Lars Nørvang; Asmussen, Søren

    A task of random size T is split into M subtasks of lengths T1, . . . , TM, each of which is sent to one out of M parallel processors. Each processor may fail at a random time before completing its allocated task, and then has to restart it from the beginning. If X1, . . . ,XM are the total task ...

  2. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems.

    Science.gov (United States)

    Barreiros, Willian; Teodoro, George; Kurc, Tahsin; Kong, Jun; Melo, Alba C M A; Saltz, Joel

    2017-09-01

    We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. A SA can be very compute demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse lead to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies.

  3. Parallel computing by Monte Carlo codes MVP/GMVP

    International Nuclear Information System (INIS)

    Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa

    2001-01-01

    General-purpose Monte Carlo codes MVP/GMVP are well-vectorized and thus enable us to perform high-speed Monte Carlo calculations. In order to achieve more speedups, we parallelized the codes on the different types of parallel computing platforms or by using a standard parallelization library MPI. The platforms used for benchmark calculations are a distributed-memory vector-parallel computer Fujitsu VPP500, a distributed-memory massively parallel computer Intel paragon and a distributed-memory scalar-parallel computer Hitachi SR2201, IBM SP2. As mentioned generally, linear speedup could be obtained for large-scale problems but parallelization efficiency decreased as the batch size per a processing element(PE) was smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% by the PWR full-core calculation with more than 10 million histories and it took about 1.5 hours by massively parallel computing. (author)

  4. Parallel algorithms and architecture for computation of manipulator forward dynamics

    Science.gov (United States)

    Fijany, Amir; Bejczy, Antal K.

    1989-01-01

    Parallel computation of manipulator forward dynamics is investigated. Considering three classes of algorithms for the solution of the problem, that is, the O(n), the O(n exp 2), and the O(n exp 3) algorithms, parallelism in the problem is analyzed. It is shown that the problem belongs to the class of NC and that the time and processors bounds are of O(log2/2n) and O(n exp 4), respectively. However, the fastest stable parallel algorithms achieve the computation time of O(n) and can be derived by parallelization of the O(n exp 3) serial algorithms. Parallel computation of the O(n exp 3) algorithms requires the development of parallel algorithms for a set of fundamentally different problems, that is, the Newton-Euler formulation, the computation of the inertia matrix, decomposition of the symmetric, positive definite matrix, and the solution of triangular systems. Parallel algorithms for this set of problems are developed which can be efficiently implemented on a unique architecture, a triangular array of n(n+2)/2 processors with a simple nearest-neighbor interconnection. This architecture is particularly suitable for VLSI and WSI implementations. The developed parallel algorithm, compared to the best serial O(n) algorithm, achieves an asymptotic speedup of more than two orders-of-magnitude in the computation the forward dynamics.

  5. Study on MPI/OpenMP hybrid parallelism for Monte Carlo neutron transport code

    International Nuclear Information System (INIS)

    Liang Jingang; Xu Qi; Wang Kan; Liu Shiwen

    2013-01-01

    Parallel programming with mixed mode of messages-passing and shared-memory has several advantages when used in Monte Carlo neutron transport code, such as fitting hardware of distributed-shared clusters, economizing memory demand of Monte Carlo transport, improving parallel performance, and so on. MPI/OpenMP hybrid parallelism was implemented based on a one dimension Monte Carlo neutron transport code. Some critical factors affecting the parallel performance were analyzed and solutions were proposed for several problems such as contention access, lock contention and false sharing. After optimization the code was tested finally. It is shown that the hybrid parallel code can reach good performance just as pure MPI parallel program, while it saves a lot of memory usage at the same time. Therefore hybrid parallel is efficient for achieving large-scale parallel of Monte Carlo neutron transport. (authors)

  6. Highly parallel machines and future of scientific computing

    International Nuclear Information System (INIS)

    Singh, G.S.

    1992-01-01

    Computing requirement of large scale scientific computing has always been ahead of what state of the art hardware could supply in the form of supercomputers of the day. And for any single processor system the limit to increase in the computing power was realized a few years back itself. Now with the advent of parallel computing systems the availability of machines with the required computing power seems a reality. In this paper the author tries to visualize the future large scale scientific computing in the penultimate decade of the present century. The author summarized trends in parallel computers and emphasize the need for a better programming environment and software tools for optimal performance. The author concludes this paper with critique on parallel architectures, software tools and algorithms. (author). 10 refs., 2 tabs

  7. Parallel computing in genomic research: advances and applications

    Directory of Open Access Journals (Sweden)

    Ocaña K

    2015-11-01

    Full Text Available Kary Ocaña,1 Daniel de Oliveira2 1National Laboratory of Scientific Computing, Petrópolis, Rio de Janeiro, 2Institute of Computing, Fluminense Federal University, Niterói, Brazil Abstract: Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC environments can be applied for reducing the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing unit requires the expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists for processing their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. Keywords: high-performance computing, genomic research, cloud computing, grid computing, cluster computing, parallel computing

  8. Basic design of parallel computational program for probabilistic structural analysis

    International Nuclear Information System (INIS)

    Kaji, Yoshiyuki; Arai, Taketoshi; Gu, Wenwei; Nakamura, Hitoshi

    1999-06-01

    In our laboratory, for 'development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory' (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)

  9. Basic design of parallel computational program for probabilistic structural analysis

    Energy Technology Data Exchange (ETDEWEB)

    Kaji, Yoshiyuki; Arai, Taketoshi [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Gu, Wenwei; Nakamura, Hitoshi

    1999-06-01

    In our laboratory, for `development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory` (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)

  10. Generalised Computability and Applications to Hybrid Systems

    DEFF Research Database (Denmark)

    Korovina, Margarita V.; Kudinov, Oleg V.

    2001-01-01

    We investigate the concept of generalised computability of operators and functionals defined on the set of continuous functions, firstly introduced in [9]. By working in the reals, with equality and without equality, we study properties of generalised computable operators and functionals. Also we...... propose an interesting application to formalisation of hybrid systems. We obtain some class of hybrid systems, which trajectories are computable in the sense of computable analysis. This research was supported in part by the RFBR (grants N 99-01-00485, N 00-01- 00810) and by the Siberian Branch of RAS (a...... grant for young researchers, 2000)...

  11. Parallel computing in plasma physics: Nonlinear instabilities

    International Nuclear Information System (INIS)

    Pohn, E.; Kamelander, G.; Shoucri, M.

    2000-01-01

    A Vlasov-Poisson-system is used for studying the time evolution of the charge-separation at a spatial one- as well as a two-dimensional plasma-edge. Ions are advanced in time using the Vlasov-equation. The whole three-dimensional velocity-space is considered leading to very time-consuming four-resp. five-dimensional fully kinetic simulations. In the 1D simulations electrons are assumed to behave adiabatic, i.e. they are Boltzmann-distributed, leading to a nonlinear Poisson-equation. In the 2D simulations a gyro-kinetic approximation is used for the electrons. The plasma is assumed to be initially neutral. The simulations are performed at an equidistant grid. A constant time-step is used for advancing the density-distribution function in time. The time-evolution of the distribution function is performed using a splitting scheme. Each dimension (x, y, υ x , υ y , υ z ) of the phase-space is advanced in time separately. The value of the distribution function for the next time is calculated from the value of an - in general - interstitial point at the present time (fractional shift). One-dimensional cubic-spline interpolation is used for calculating the interstitial function values. After the fractional shifts are performed for each dimension of the phase-space, a whole time-step for advancing the distribution function is finished. Afterwards the charge density is calculated, the Poisson-equation is solved and the electric field is calculated before the next time-step is performed. The fractional shift method sketched above was parallelized for p processors as follows. Considering first the shifts in y-direction, a proper parallelization strategy is to split the grid into p disjoint υ z -slices, which are sub-grids, each containing a different 1/p-th part of the υ z range but the whole range of all other dimensions. Each processor is responsible for performing the y-shifts on a different slice, which can be done in parallel without any communication between

  12. Research in Parallel Algorithms and Software for Computational Aerosciences

    Science.gov (United States)

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  13. Performance of Air Pollution Models on Massively Parallel Computers

    DEFF Research Database (Denmark)

    Brown, John; Hansen, Per Christian; Wasniewski, Jerzy

    1996-01-01

    To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight about the performance of the three computers when used to solve large-scale scientific problems...

  14. Universal blind quantum computation for hybrid system

    Science.gov (United States)

    Huang, He-Liang; Bao, Wan-Su; Li, Tan; Li, Feng-Guang; Fu, Xiang-Qun; Zhang, Shuo; Zhang, Hai-Long; Wang, Xiang

    2017-08-01

    As progress on the development of building quantum computer continues to advance, first-generation practical quantum computers will be available for ordinary users in the cloud style similar to IBM's Quantum Experience nowadays. Clients can remotely access the quantum servers using some simple devices. In such a situation, it is of prime importance to keep the security of the client's information. Blind quantum computation protocols enable a client with limited quantum technology to delegate her quantum computation to a quantum server without leaking any privacy. To date, blind quantum computation has been considered only for an individual quantum system. However, practical universal quantum computer is likely to be a hybrid system. Here, we take the first step to construct a framework of blind quantum computation for the hybrid system, which provides a more feasible way for scalable blind quantum computation.

  15. Parallel computing solution of Boltzmann neutron transport equation

    International Nuclear Information System (INIS)

    Ansah-Narh, T.

    2010-01-01

    The focus of the research was on developing parallel computing algorithm for solving Eigen-values of the Boltzmam Neutron Transport Equation (BNTE) in a slab geometry using multi-grid approach. In response to the problem of slow execution of serial computing when solving large problems, such as BNTE, the study was focused on the design of parallel computing systems which was an evolution of serial computing that used multiple processing elements simultaneously to solve complex physical and mathematical problems. Finite element method (FEM) was used for the spatial discretization scheme, while angular discretization was accomplished by expanding the angular dependence in terms of Legendre polynomials. The eigenvalues representing the multiplication factors in the BNTE were determined by the power method. MATLAB Compiler Version 4.1 (R2009a) was used to compile the MATLAB codes of BNTE. The implemented parallel algorithms were enabled with matlabpool, a Parallel Computing Toolbox function. The option UseParallel was set to 'always' and the default value of the option was 'never'. When those conditions held, the solvers computed estimated gradients in parallel. The parallel computing system was used to handle all the bottlenecks in the matrix generated from the finite element scheme and each domain of the power method generated. The parallel algorithm was implemented on a Symmetric Multi Processor (SMP) cluster machine, which had Intel 32 bit quad-core x 86 processors. Convergence rates and timings for the algorithm on the SMP cluster machine were obtained. Numerical experiments indicated the designed parallel algorithm could reach perfect speedup and had good stability and scalability. (au)

  16. Fluid dynamics parallel computer development at NASA Langley Research Center

    Science.gov (United States)

    Townsend, James C.; Zang, Thomas A.; Dwoyer, Douglas L.

    1987-01-01

    To accomplish more detailed simulations of highly complex flows, such as the transition to turbulence, fluid dynamics research requires computers much more powerful than any available today. Only parallel processing on multiple-processor computers offers hope for achieving the required effective speeds. Looking ahead to the use of these machines, the fluid dynamicist faces three issues: algorithm development for near-term parallel computers, architecture development for future computer power increases, and assessment of possible advantages of special purpose designs. Two projects at NASA Langley address these issues. Software development and algorithm exploration is being done on the FLEX/32 Parallel Processing Research Computer. New architecture features are being explored in the special purpose hardware design of the Navier-Stokes Computer. These projects are complementary and are producing promising results.

  17. Climate Ocean Modeling on Parallel Computers

    Science.gov (United States)

    Wang, P.; Cheng, B. N.; Chao, Y.

    1998-01-01

    Ocean modeling plays an important role in both understanding the current climatic conditions and predicting future climate change. However, modeling the ocean circulation at various spatial and temporal scales is a very challenging computational task.

  18. History Matching in Parallel Computational Environments

    Energy Technology Data Exchange (ETDEWEB)

    Steven Bryant; Sanjay Srinivasan; Alvaro Barrera; Sharad Yadav

    2004-08-31

    In the probabilistic approach for history matching, the information from the dynamic data is merged with the prior geologic information in order to generate permeability models consistent with the observed dynamic data as well as the prior geology. The relationship between dynamic response data and reservoir attributes may vary in different regions of the reservoir due to spatial variations in reservoir attributes, fluid properties, well configuration, flow constrains on wells etc. This implies probabilistic approach should then update different regions of the reservoir in different ways. This necessitates delineation of multiple reservoir domains in order to increase the accuracy of the approach. The research focuses on a probabilistic approach to integrate dynamic data that ensures consistency between reservoir models developed from one stage to the next. The algorithm relies on efficient parameterization of the dynamic data integration problem and permits rapid assessment of the updated reservoir model at each stage. The report also outlines various domain decomposition schemes from the perspective of increasing the accuracy of probabilistic approach of history matching. Research progress in three important areas of the project are discussed: {lg_bullet}Validation and testing the probabilistic approach to incorporating production data in reservoir models. {lg_bullet}Development of a robust scheme for identifying reservoir regions that will result in a more robust parameterization of the history matching process. {lg_bullet}Testing commercial simulators for parallel capability and development of a parallel algorithm for history matching.

  19. Parallel computing for event reconstruction in high-energy physics

    International Nuclear Information System (INIS)

    Wolbers, S.

    1993-01-01

    Parallel computing has been recognized as a solution to large computing problems. In High Energy Physics offline event reconstruction of detector data is a very large computing problem that has been solved with parallel computing techniques. A review of the parallel programming package CPS (Cooperative Processes Software) developed and used at Fermilab for offline reconstruction of Terabytes of data requiring the delivery of hundreds of Vax-Years per experiment is given. The Fermilab UNIX farms, consisting of 180 Silicon Graphics workstations and 144 IBM RS6000 workstations, are used to provide the computing power for the experiments. Fermilab has had a long history of providing production parallel computing starting with the ACP (Advanced Computer Project) Farms in 1986. The Fermilab UNIX Farms have been in production for over 2 years with 24 hour/day service to experimental user groups. Additional tools for management, control and monitoring these large systems will be described. Possible future directions for parallel computing in High Energy Physics will be given

  20. Comparative Analysis of Torque and Acceleration of Pre- and Post-Transmission Parallel Hybrid Drivetrains

    Directory of Open Access Journals (Sweden)

    Zulkifli Saiful A.

    2016-01-01

    Full Text Available Parallel hybrid electric vehicles (HEV can be classified according to the location of the electric motor with respect to the transmission unit for the internal combustion engine (ICE: they can be pre-transmission or posttransmission parallel hybrid. A split-axle parallel HEV – in which the ICE and electric motor provide propulsion power to different axles – is a sub-type of the post-transmission hybrid, since addition of torque and power from the two power sources occurs after the vehicle’s transmission. The term ‘through-the-road’ (TTR hybrid is also used for the split-parallel HEV, since power coupling between the ICE and electric motor is not through some mechanical device but through the vehicle itself, its wheels and the road on which it moves. The present work presents torquespeed relationship of the split-parallel hybrid and analyses simulation results of torque profiles and acceleration performance of pre-transmission and post-transmission hybrid configurations, using three different sizes of electric motor. Different operating regions of the pre-trans and post-trans motors are observed, leading to different speed and torque profiles. Although ICE average efficiency in the post-trans hybrid is slightly lower than in the pre-trans hybrid, the post-trans hybrid vehicle has better fuel economy and acceleration performance than the pre-trans hybrid vehicle.

  1. Hybrid parallel strategy for the simulation of fast transient accidental situations at reactor scale

    International Nuclear Information System (INIS)

    Faucher, V.; Galon, P.; Beccantini, A.; Crouzet, F.; Debaud, F.; Gautier, T.

    2015-01-01

    Highlights: • Reference accidental situations for current and future reactors are considered. • They require the modeling of complex fluid–structure systems at full reactor scale. • EPX software computes the non-linear transient solution with explicit time stepping. • Focus on the parallel hybrid solver specific to the proposed coupled equations. - Abstract: This contribution is dedicated to the latest methodological developments implemented in the fast transient dynamics software EUROPLEXUS (EPX) to simulate the mechanical response of fully coupled fluid–structure systems to accidental situations to be considered at reactor scale, among which the Loss of Coolant Accident, the Core Disruptive Accident and the Hydrogen Explosion. Time integration is explicit and the search for reference solutions within the safety framework prevents any simplification and approximations in the coupled algorithm: for instance, all kinematic constraints are dealt with using Lagrange Multipliers, yielding a complex flow chart when non-permanent constraints such as unilateral contact or immersed fluid–structure boundaries are considered. The parallel acceleration of the solution process is then achieved through a hybrid approach, based on a weighted domain decomposition for distributed memory computing and the use of the KAAPI library for self-balanced shared memory processing inside subdomains

  2. Parallelism, fractal geometry and other aspects of computational mathematics

    International Nuclear Information System (INIS)

    Churchhouse, R.F.

    1991-01-01

    In some fields such as meteorology, theoretical physics, quantum chemistry and hydrodynamics there are problems which involve so much computation that computers of the power of a thousand times a Cray 2 could be fully utilised if they were available. Since it is unlikely that uniprocessors of such power will be available, such large scale problems could be solved by using systems of computers running in parallel. This approach, of course, requires to find appropriate algorithms for the solution of such problems which can efficiently make use of a large number of computers working in parallel. 11 refs, 10 figs, 1 tab

  3. Parallel Computing:. Some Activities in High Energy Physics

    Science.gov (United States)

    Willers, Ian

    This paper examines some activities in High Energy Physics that utilise parallel computing. The topic includes all computing from the proposed SIMD front end detectors, the farming applications, high-powered RISC processors and the large machines in the computer centers. We start by looking at the motivation behind using parallelism for general purpose computing. The developments around farming are then described from its simplest form to the more complex system in Fermilab. Finally, there is a list of some developments that are happening close to the experiments.

  4. Parallel ray tracing for one-dimensional discrete ordinate computations

    International Nuclear Information System (INIS)

    Jarvis, R.D.; Nelson, P.

    1996-01-01

    The ray-tracing sweep in discrete-ordinates, spatially discrete numerical approximation methods applied to the linear, steady-state, plane-parallel, mono-energetic, azimuthally symmetric, neutral-particle transport equation can be reduced to a parallel prefix computation. In so doing, the often severe penalty in convergence rate of the source iteration, suffered by most current parallel algorithms using spatial domain decomposition, can be avoided while attaining parallelism in the spatial domain to whatever extent desired. In addition, the reduction implies parallel algorithm complexity limits for the ray-tracing sweep. The reduction applies to all closed, linear, one-cell functional (CLOF) spatial approximation methods, which encompasses most in current popular use. Scalability test results of an implementation of the algorithm on a 64-node nCube-2S hypercube-connected, message-passing, multi-computer are described. (author)

  5. Parallel computer calculation of quantum spin lattices

    International Nuclear Information System (INIS)

    Lamarcq, J.

    1998-01-01

    Numerical simulation allows the theorists to convince themselves about the validity of the models they use. Particularly by simulating the spin lattices one can judge about the validity of a conjecture. Simulating a system defined by a large number of degrees of freedom requires highly sophisticated machines. This study deals with modelling the magnetic interactions between the ions of a crystal. Many exact results have been found for spin 1/2 systems but not for systems of other spins for which many simulation have been carried out. The interest for simulations has been renewed by the Haldane's conjecture stipulating the existence of a energy gap between the ground state and the first excited states of a spin 1 lattice. The existence of this gap has been experimentally demonstrated. This report contains the following four chapters: 1. Spin systems; 2. Calculation of eigenvalues; 3. Programming; 4. Parallel calculation

  6. 3D magnetospheric parallel hybrid multi-grid method applied to planet–plasma interactions

    Energy Technology Data Exchange (ETDEWEB)

    Leclercq, L., E-mail: ludivine.leclercq@latmos.ipsl.fr [LATMOS/IPSL, UVSQ Université Paris-Saclay, UPMC Univ. Paris 06, CNRS, Guyancourt (France); Modolo, R., E-mail: ronan.modolo@latmos.ipsl.fr [LATMOS/IPSL, UVSQ Université Paris-Saclay, UPMC Univ. Paris 06, CNRS, Guyancourt (France); Leblanc, F. [LATMOS/IPSL, UPMC Univ. Paris 06 Sorbonne Universités, UVSQ, CNRS, Paris (France); Hess, S. [ONERA, Toulouse (France); Mancini, M. [LUTH, Observatoire Paris-Meudon (France)

    2016-03-15

    We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet–plasma interactions. This model is based on the hybrid formalism: ions are kinetically treated whereas electrons are considered as a inertia-less fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid subsequently entering a refined region are split into particles whose volume corresponds to the volume of the refined cells. The number of refined particles created from a coarse particle depends on the grid refinement rate. In order to conserve velocity distribution functions and to avoid calculations of average velocities, particles are not coalesced. Moreover, to ensure the constancy of particles' shape function sizes, the hybrid method is adapted to allow refined particles to move within a coarse region. Another innovation of this approach is the method developed to compute grid moments at interfaces between two refinement levels. Indeed, the hybrid method is adapted to accurately account for the special grid structure at the interfaces, avoiding any overlapping grid considerations. Some fundamental test runs were performed to validate our approach (e.g. quiet plasma flow, Alfven wave propagation). Lastly, we also show a planetary application of the model, simulating the interaction between Jupiter's moon Ganymede and the Jovian plasma.

  7. Parallel structures in human and computer memory

    Science.gov (United States)

    Kanerva, Pentti

    1986-08-01

    If we think of our experiences as being recorded continuously on film, then human memory can be compared to a film library that is indexed by the contents of the film strips stored in it. Moreover, approximate retrieval cues suffice to retrieve information stored in this library: We recognize a familiar person in a fuzzy photograph or a familiar tune played on a strange instrument. This paper is about how to construct a computer memory that would allow a computer to recognize patterns and to recall sequences the way humans do. Such a memory is remarkably similar in structure to a conventional computer memory and also to the neural circuits in the cortex of the cerebellum of the human brain. The paper concludes that the frame problem of artificial intelligence could be solved by the use of such a memory if we were able to encode information about the world properly.

  8. Layout design and energetic analysis of a complex diesel parallel hybrid electric vehicle

    International Nuclear Information System (INIS)

    Finesso, Roberto; Spessa, Ezio; Venditti, Mattia

    2014-01-01

    Highlights: • Layout design, energetic and cost analysis of complex parallel hybrid vehicles. • Development of global and real-time optimizers for control strategy identification. • Rule-based control strategies to minimize fuel consumption and NO x . • Energy share across each working mode for battery and thermal engine. - Abstract: The present paper is focused on the design, optimization and analysis of a complex parallel hybrid electric vehicle, equipped with two electric machines on both the front and rear axles, and on the evaluation of its potential to reduce fuel consumption and NO x emissions over several driving missions. The vehicle has been compared with two conventional parallel hybrid vehicles, equipped with a single electric machine on the front axle or on the rear axle, as well as with a conventional vehicle. All the vehicles have been equipped with compression ignition engines. The optimal layout of each vehicle was identified on the basis of the minimization of the overall powertrain costs during the whole vehicle life. These costs include the initial investment due to the production of the components as well as the operating costs related to fuel consumption and to battery depletion. Identification of the optimal powertrain control strategy, in terms of the management of the power flows of the engine and electric machines, and of gear selection, is necessary in order to be able to fully exploit the potential of the hybrid architecture. To this end, two global optimizers, one of a deterministic nature and another of a stochastic type, and two real-time optimizers have been developed, applied and compared. A new mathematical technique has been developed and applied to the vehicle simulation model in order to decrease the computational time of the optimizers. First, the vehicle model equations were written in order to allow a coarse time grid to be used, then, the control variables (i.e., power flow and gear number) were discretized, and the

  9. Misleading Performance Claims in Parallel Computations

    Energy Technology Data Exchange (ETDEWEB)

    Bailey, David H.

    2009-05-29

    In a previous humorous note entitled 'Twelve Ways to Fool the Masses,' I outlined twelve common ways in which performance figures for technical computer systems can be distorted. In this paper and accompanying conference talk, I give a reprise of these twelve 'methods' and give some actual examples that have appeared in peer-reviewed literature in years past. I then propose guidelines for reporting performance, the adoption of which would raise the level of professionalism and reduce the level of confusion, not only in the world of device simulation but also in the larger arena of technical computing.

  10. Global seismic tomography and modern parallel computers

    Directory of Open Access Journals (Sweden)

    A. Piersanti

    2006-06-01

    Full Text Available A fast technological progress is providing seismic tomographers with computers of rapidly increasing speed and RAM, that are not always properly taken advantage of. Large computers with both shared-memory and distributedmemory architectures have made it possible to approach the tomographic inverse problem more accurately. For example, resolution can be quantified from the resolution matrix rather than checkerboard tests; the covariance matrix can be calculated to evaluate the propagation of errors from data to model parameters; the L-curve method can be applied to determine a range of acceptable regularization schemes. We show how these exercises can be implemented efficiently on different hardware architectures.

  11. Integrated computer network high-speed parallel interface

    International Nuclear Information System (INIS)

    Frank, R.B.

    1979-03-01

    As the number and variety of computers within Los Alamos Scientific Laboratory's Central Computer Facility grows, the need for a standard, high-speed intercomputer interface has become more apparent. This report details the development of a High-Speed Parallel Interface from conceptual through implementation stages to meet current and future needs for large-scle network computing within the Integrated Computer Network. 4 figures

  12. Parallel algorithms and archtectures for computational structural mechanics

    Science.gov (United States)

    Patrick, Merrell; Ma, Shing; Mahajan, Umesh

    1989-01-01

    The determination of the fundamental (lowest) natural vibration frequencies and associated mode shapes is a key step used to uncover and correct potential failures or problem areas in most complex structures. However, the computation time taken by finite element codes to evaluate these natural frequencies is significant, often the most computationally intensive part of structural analysis calculations. There is continuing need to reduce this computation time. This study addresses this need by developing methods for parallel computation.

  13. Heterotic computing: exploiting hybrid computational devices.

    Science.gov (United States)

    Kendon, Viv; Sebald, Angelika; Stepney, Susan

    2015-07-28

    Current computational theory deals almost exclusively with single models: classical, neural, analogue, quantum, etc. In practice, researchers use ad hoc combinations, realizing only recently that they can be fundamentally more powerful than the individual parts. A Theo Murphy meeting brought together theorists and practitioners of various types of computing, to engage in combining the individual strengths to produce powerful new heterotic devices. 'Heterotic computing' is defined as a combination of two or more computational systems such that they provide an advantage over either substrate used separately. This post-meeting collection of articles provides a wide-ranging survey of the state of the art in diverse computational paradigms, together with reflections on their future combination into powerful and practical applications. © 2015 The Author(s) Published by the Royal Society. All rights reserved.

  14. Accelerating Climate and Weather Simulations through Hybrid Computing

    Science.gov (United States)

    Zhou, Shujia; Cruz, Carlos; Duffy, Daniel; Tucker, Robert; Purcell, Mark

    2011-01-01

    Unconventional multi- and many-core processors (e.g. IBM (R) Cell B.E.(TM) and NVIDIA (R) GPU) have emerged as effective accelerators in trial climate and weather simulations. Yet these climate and weather models typically run on parallel computers with conventional processors (e.g. Intel, AMD, and IBM) using Message Passing Interface. To address challenges involved in efficiently and easily connecting accelerators to parallel computers, we investigated using IBM's Dynamic Application Virtualization (TM) (IBM DAV) software in a prototype hybrid computing system with representative climate and weather model components. The hybrid system comprises two Intel blades and two IBM QS22 Cell B.E. blades, connected with both InfiniBand(R) (IB) and 1-Gigabit Ethernet. The system significantly accelerates a solar radiation model component by offloading compute-intensive calculations to the Cell blades. Systematic tests show that IBM DAV can seamlessly offload compute-intensive calculations from Intel blades to Cell B.E. blades in a scalable, load-balanced manner. However, noticeable communication overhead was observed, mainly due to IP over the IB protocol. Full utilization of IB Sockets Direct Protocol and the lower latency production version of IBM DAV will reduce this overhead.

  15. Finite element electromagnetic field computation on the Sequent Symmetry 81 parallel computer

    International Nuclear Information System (INIS)

    Ratnajeevan, S.; Hoole, H.

    1990-01-01

    Finite element field analysis algorithms lend themselves to parallelization and this fact is exploited in this paper to implement a finite element analysis program for electromagnetic field computation on the Sequent Symmetry 81 parallel computer with three processors. In terms of waiting time, the maximum gains are to be made in matrix solution and therefore this paper concentrates on the gains in parallelizing the solution part of finite element analysis. An outline of how parallelization could be exploited in most finite element operations is given in this paper although the actual implemention of parallelism on the Sequent Symmetry 81 parallel computer was in sparsity computation, matrix assembly and the matrix solution areas. In all cases, the algorithms were modified suit the parallel programming application rather than allowing the compiler to parallelize on existing algorithms

  16. Hybrid shared/distributed parallelism for 3D characteristics transport solvers

    International Nuclear Information System (INIS)

    Dahmani, M.; Roy, R.

    2005-01-01

    In this paper, we will present a new hybrid parallel model for solving large-scale 3-dimensional neutron transport problems used in nuclear reactor simulations. Large heterogeneous reactor problems, like the ones that occurs when simulating Candu cores, have remained computationally intensive and impractical for routine applications on single-node or even vector computers. Based on the characteristics method, this new model is designed to solve the transport equation after distributing the calculation load on a network of shared memory multi-processors. The tracks are either generated on the fly at each characteristics sweep or stored in sequential files. The load balancing is taken into account by estimating the calculation load of tracks and by distributing batches of uniform load on each node of the network. Moreover, the communication overhead can be predicted after benchmarking the latency and bandwidth using appropriate network test suite. These models are useful for predicting the performance of the parallel applications and to analyze the scalability of the parallel systems. (authors)

  17. A microeconomic scheduler for parallel computers

    Science.gov (United States)

    Stoica, Ion; Abdel-Wahab, Hussein; Pothen, Alex

    1995-01-01

    We describe a scheduler based on the microeconomic paradigm for scheduling on-line a set of parallel jobs in a multiprocessor system. In addition to the classical objectives of increasing the system throughput and reducing the response time, we consider fairness in allocating system resources among the users, and providing the user with control over the relative performances of his jobs. We associate with every user a savings account in which he receives money at a constant rate. When a user wants to run a job, he creates an expense account for that job to which he transfers money from his savings account. The job uses the funds in its expense account to obtain the system resources it needs for execution. The share of the system resources allocated to the user is directly related to the rate at which the user receives money; the rate at which the user transfers money into a job expense account controls the job's performance. We prove that starvation is not possible in our model. Simulation results show that our scheduler improves both system and user performances in comparison with two different variable partitioning policies. It is also shown to be effective in guaranteeing fairness and providing control over the performance of jobs.

  18. Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

    Science.gov (United States)

    Sun, Xian-He

    1997-01-01

    Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm

  19. History Matching in Parallel Computational Environments

    Energy Technology Data Exchange (ETDEWEB)

    Steven Bryant; Sanjay Srinivasan; Alvaro Barrera; Sharad Yadav

    2005-10-01

    A novel methodology for delineating multiple reservoir domains for the purpose of history matching in a distributed computing environment has been proposed. A fully probabilistic approach to perturb permeability within the delineated zones is implemented. The combination of robust schemes for identifying reservoir zones and distributed computing significantly increase the accuracy and efficiency of the probabilistic approach. The information pertaining to the permeability variations in the reservoir that is contained in dynamic data is calibrated in terms of a deformation parameter rD. This information is merged with the prior geologic information in order to generate permeability models consistent with the observed dynamic data as well as the prior geology. The relationship between dynamic response data and reservoir attributes may vary in different regions of the reservoir due to spatial variations in reservoir attributes, well configuration, flow constrains etc. The probabilistic approach then has to account for multiple r{sub D} values in different regions of the reservoir. In order to delineate reservoir domains that can be characterized with different rD parameters, principal component analysis (PCA) of the Hessian matrix has been done. The Hessian matrix summarizes the sensitivity of the objective function at a given step of the history matching to model parameters. It also measures the interaction of the parameters in affecting the objective function. The basic premise of PC analysis is to isolate the most sensitive and least correlated regions. The eigenvectors obtained during the PCA are suitably scaled and appropriate grid block volume cut-offs are defined such that the resultant domains are neither too large (which increases interactions between domains) nor too small (implying ineffective history matching). The delineation of domains requires calculation of Hessian, which could be computationally costly and as well as restricts the current approach to

  20. A compositional reservoir simulator on distributed memory parallel computers

    International Nuclear Information System (INIS)

    Rame, M.; Delshad, M.

    1995-01-01

    This paper presents the application of distributed memory parallel computes to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straight forward. Results of the distributed memory computing performance of Parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for same problems on a vector supercomputer is also presented

  1. Models of parallel computation :a survey and classification

    Institute of Scientific and Technical Information of China (English)

    ZHANG Yunquan; CHEN Guoliang; SUN Guangzhong; MIAO Qiankun

    2007-01-01

    In this paper,the state-of-the-art parallel computational model research is reviewed.We will introduce various models that were developed during the past decades.According to their targeting architecture features,especially memory organization,we classify these parallel computational models into three generations.These models and their characteristics are discussed based on three generations classification.We believe that with the ever increasing speed gap between the CPU and memory systems,incorporating non-uniform memory hierarchy into computational models will become unavoidable.With the emergence of multi-core CPUs,the parallelism hierarchy of current computing platforms becomes more and more complicated.Describing this complicated parallelism hierarchy in future computational models becomes more and more important.A semi-automatic toolkit that can extract model parameters and their values on real computers can reduce the model analysis complexity,thus allowing more complicated models with more parameters to be adopted.Hierarchical memory and hierarchical parallelism will be two very important features that should be considered in future model design and research.

  2. Identifying failure in a tree network of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Pinnow, Kurt W.; Wallenfelt, Brian P.

    2010-08-24

    Methods, parallel computers, and products are provided for identifying failure in a tree network of a parallel computer. The parallel computer includes one or more processing sets including an I/O node and a plurality of compute nodes. For each processing set embodiments include selecting a set of test compute nodes, the test compute nodes being a subset of the compute nodes of the processing set; measuring the performance of the I/O node of the processing set; measuring the performance of the selected set of test compute nodes; calculating a current test value in dependence upon the measured performance of the I/O node of the processing set, the measured performance of the set of test compute nodes, and a predetermined value for I/O node performance; and comparing the current test value with a predetermined tree performance threshold. If the current test value is below the predetermined tree performance threshold, embodiments include selecting another set of test compute nodes. If the current test value is not below the predetermined tree performance threshold, embodiments include selecting from the test compute nodes one or more potential problem nodes and testing individually potential problem nodes and links to potential problem nodes.

  3. Fast Parallel Computation of Polynomials Using Few Processors

    DEFF Research Database (Denmark)

    Valiant, Leslie G.; Skyum, Sven; Berkowitz, S.

    1983-01-01

    It is shown that any multivariate polynomial of degree $d$ that can be computed sequentially in $C$ steps can be computed in parallel in $O((\\log d)(\\log C + \\log d))$ steps using only $(Cd)^{O(1)} $ processors....

  4. Tutorial: Parallel Computing of Simulation Models for Risk Analysis.

    Science.gov (United States)

    Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D

    2016-10-01

    Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix. © 2016 Society for Risk Analysis.

  5. Dynamic grid refinement for partial differential equations on parallel computers

    International Nuclear Information System (INIS)

    Mccormick, S.; Quinlan, D.

    1989-01-01

    The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems. 6 refs

  6. RAMA: A file system for massively parallel computers

    Science.gov (United States)

    Miller, Ethan L.; Katz, Randy H.

    1993-01-01

    This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated in lo the file system; in fact, RAMA runs most efficiently when tertiary storage is used.

  7. Parallel Processing and Applied Mathematics. 10th International Conference, PPAM 2013. Revised Selected Papers

    DEFF Research Database (Denmark)

    The following topics are dealt with: parallel scientific computing; numerical algorithms; parallel nonnumerical algorithms; cloud computing; evolutionary computing; metaheuristics; applied mathematics; GPU computing; multicore systems; hybrid architectures; hierarchical parallelism; HPC systems......; power monitoring; energy monitoring; and distributed computing....

  8. Parallel grid generation algorithm for distributed memory computers

    Science.gov (United States)

    Moitra, Stuti; Moitra, Anutosh

    1994-01-01

    A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levELs of parallelism on multiple instruction multiple data machines are indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations bAsed on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.

  9. CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation

    International Nuclear Information System (INIS)

    Dunigan, T.H.

    1988-01-01

    1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated

  10. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.

  11. IPython: components for interactive and parallel computing across disciplines. (Invited)

    Science.gov (United States)

    Perez, F.; Bussonnier, M.; Frederic, J. D.; Froehle, B. M.; Granger, B. E.; Ivanov, P.; Kluyver, T.; Patterson, E.; Ragan-Kelley, B.; Sailer, Z.

    2013-12-01

    Scientific computing is an inherently exploratory activity that requires constantly cycling between code, data and results, each time adjusting the computations as new insights and questions arise. To support such a workflow, good interactive environments are critical. The IPython project (http://ipython.org) provides a rich architecture for interactive computing with: 1. Terminal-based and graphical interactive consoles. 2. A web-based Notebook system with support for code, text, mathematical expressions, inline plots and other rich media. 3. Easy to use, high performance tools for parallel computing. Despite its roots in Python, the IPython architecture is designed in a language-agnostic way to facilitate interactive computing in any language. This allows users to mix Python with Julia, R, Octave, Ruby, Perl, Bash and more, as well as to develop native clients in other languages that reuse the IPython clients. In this talk, I will show how IPython supports all stages in the lifecycle of a scientific idea: 1. Individual exploration. 2. Collaborative development. 3. Production runs with parallel resources. 4. Publication. 5. Education. In particular, the IPython Notebook provides an environment for "literate computing" with a tight integration of narrative and computation (including parallel computing). These Notebooks are stored in a JSON-based document format that provides an "executable paper": notebooks can be version controlled, exported to HTML or PDF for publication, and used for teaching.

  12. Small file aggregation in a parallel computing system

    Science.gov (United States)

    Faibish, Sorin; Bent, John M.; Tzelnic, Percy; Grider, Gary; Zhang, Jingwang

    2014-09-02

    Techniques are provided for small file aggregation in a parallel computing system. An exemplary method for storing a plurality of files generated by a plurality of processes in a parallel computing system comprises aggregating the plurality of files into a single aggregated file; and generating metadata for the single aggregated file. The metadata comprises an offset and a length of each of the plurality of files in the single aggregated file. The metadata can be used to unpack one or more of the files from the single aggregated file.

  13. Computational acceleration for MR image reconstruction in partially parallel imaging.

    Science.gov (United States)

    Ye, Xiaojing; Chen, Yunmei; Huang, Feng

    2011-05-01

    In this paper, we present a fast numerical algorithm for solving total variation and l(1) (TVL1) based image reconstruction with application in partially parallel magnetic resonance imaging. Our algorithm uses variable splitting method to reduce computational cost. Moreover, the Barzilai-Borwein step size selection method is adopted in our algorithm for much faster convergence. Experimental results on clinical partially parallel imaging data demonstrate that the proposed algorithm requires much fewer iterations and/or less computational cost than recently developed operator splitting and Bregman operator splitting methods, which can deal with a general sensing matrix in reconstruction framework, to get similar or even better quality of reconstructed images.

  14. Functional efficiency comparison between split- and parallel-hybrid using advanced energy flow analysis methods

    Energy Technology Data Exchange (ETDEWEB)

    Guttenberg, Philipp; Lin, Mengyan [Romax Technology, Nottingham (United Kingdom)

    2009-07-01

    The following paper presents a comparative efficiency analysis of the Toyota Prius versus the Honda Insight using advanced Energy Flow Analysis methods. The sample study shows that even very different hybrid concepts like a split- and a parallel-hybrid can be compared in a high level of detail and demonstrates the benefit showing exemplary results. (orig.)

  15. Algorithms for computational fluid dynamics n parallel processors

    International Nuclear Information System (INIS)

    Van de Velde, E.F.

    1986-01-01

    A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries

  16. Parallel computing in cluster of GPU applied to a problem of nuclear engineering

    International Nuclear Information System (INIS)

    Moraes, Sergio Ricardo S.; Heimlich, Adino; Resende, Pedro

    2013-01-01

    Cluster computing has been widely used as a low cost alternative for parallel processing in scientific applications. With the use of Message-Passing Interface (MPI) protocol development became even more accessible and widespread in the scientific community. A more recent trend is the use of Graphic Processing Unit (GPU), which is a powerful co-processor able to perform hundreds of instructions in parallel, reaching a capacity of hundreds of times the processing of a CPU. However, a standard PC does not allow, in general, more than two GPUs. Hence, it is proposed in this work development and evaluation of a hybrid low cost parallel approach to the solution to a nuclear engineering typical problem. The idea is to use clusters parallelism technology (MPI) together with GPU programming techniques (CUDA - Compute Unified Device Architecture) to simulate neutron transport through a slab using Monte Carlo method. By using a cluster comprised by four quad-core computers with 2 GPU each, it has been developed programs using MPI and CUDA technologies. Experiments, applying different configurations, from 1 to 8 GPUs has been performed and results were compared with the sequential (non-parallel) version. A speed up of about 2.000 times has been observed when comparing the 8-GPU with the sequential version. Results here presented are discussed and analyzed with the objective of outlining gains and possible limitations of the proposed approach. (author)

  17. Hybrid and Parallel Domain-Decomposition Methods Development to Enable Monte Carlo for Reactor Analyses

    International Nuclear Information System (INIS)

    Wagner, John C.; Mosher, Scott W.; Evans, Thomas M.; Peplow, Douglas E.; Turner, John A.

    2010-01-01

    This paper describes code and methods development at the Oak Ridge National Laboratory focused on enabling high-fidelity, large-scale reactor analyses with Monte Carlo (MC). Current state-of-the-art tools and methods used to perform real commercial reactor analyses have several undesirable features, the most significant of which is the non-rigorous spatial decomposition scheme. Monte Carlo methods, which allow detailed and accurate modeling of the full geometry and are considered the gold standard for radiation transport solutions, are playing an ever-increasing role in correcting and/or verifying the deterministic, multi-level spatial decomposition methodology in current practice. However, the prohibitive computational requirements associated with obtaining fully converged, system-wide solutions restrict the role of MC to benchmarking deterministic results at a limited number of state-points for a limited number of relevant quantities. The goal of this research is to change this paradigm by enabling direct use of MC for full-core reactor analyses. The most significant of the many technical challenges that must be overcome are the slow, non-uniform convergence of system-wide MC estimates and the memory requirements associated with detailed solutions throughout a reactor (problems involving hundreds of millions of different material and tally regions due to fuel irradiation, temperature distributions, and the needs associated with multi-physics code coupling). To address these challenges, our research has focused on the development and implementation of (1) a novel hybrid deterministic/MC method for determining high-precision fluxes throughout the problem space in k-eigenvalue problems and (2) an efficient MC domain-decomposition (DD) algorithm that partitions the problem phase space onto multiple processors for massively parallel systems, with statistical uncertainty estimation. The hybrid method development is based on an extension of the FW-CADIS method, which

  18. Hybrid and parallel domain-decomposition methods development to enable Monte Carlo for reactor analyses

    International Nuclear Information System (INIS)

    Wagner, J.C.; Mosher, S.W.; Evans, T.M.; Peplow, D.E.; Turner, J.A.

    2010-01-01

    This paper describes code and methods development at the Oak Ridge National Laboratory focused on enabling high-fidelity, large-scale reactor analyses with Monte Carlo (MC). Current state-of-the-art tools and methods used to perform 'real' commercial reactor analyses have several undesirable features, the most significant of which is the non-rigorous spatial decomposition scheme. Monte Carlo methods, which allow detailed and accurate modeling of the full geometry and are considered the 'gold standard' for radiation transport solutions, are playing an ever-increasing role in correcting and/or verifying the deterministic, multi-level spatial decomposition methodology in current practice. However, the prohibitive computational requirements associated with obtaining fully converged, system-wide solutions restrict the role of MC to benchmarking deterministic results at a limited number of state-points for a limited number of relevant quantities. The goal of this research is to change this paradigm by enabling direct use of MC for full-core reactor analyses. The most significant of the many technical challenges that must be overcome are the slow, non-uniform convergence of system-wide MC estimates and the memory requirements associated with detailed solutions throughout a reactor (problems involving hundreds of millions of different material and tally regions due to fuel irradiation, temperature distributions, and the needs associated with multi-physics code coupling). To address these challenges, our research has focused on the development and implementation of (1) a novel hybrid deterministic/MC method for determining high-precision fluxes throughout the problem space in k-eigenvalue problems and (2) an efficient MC domain-decomposition (DD) algorithm that partitions the problem phase space onto multiple processors for massively parallel systems, with statistical uncertainty estimation. The hybrid method development is based on an extension of the FW-CADIS method

  19. Parallel computing for homogeneous diffusion and transport equations in neutronics

    International Nuclear Information System (INIS)

    Pinchedez, K.

    1999-06-01

    Parallel computing meets the ever-increasing requirements for neutronic computer code speed and accuracy. In this work, two different approaches have been considered. We first parallelized the sequential algorithm used by the neutronics code CRONOS developed at the French Atomic Energy Commission. The algorithm computes the dominant eigenvalue associated with PN simplified transport equations by a mixed finite element method. Several parallel algorithms have been developed on distributed memory machines. The performances of the parallel algorithms have been studied experimentally by implementation on a T3D Cray and theoretically by complexity models. A comparison of various parallel algorithms has confirmed the chosen implementations. We next applied a domain sub-division technique to the two-group diffusion Eigen problem. In the modal synthesis-based method, the global spectrum is determined from the partial spectra associated with sub-domains. Then the Eigen problem is expanded on a family composed, on the one hand, from eigenfunctions associated with the sub-domains and, on the other hand, from functions corresponding to the contribution from the interface between the sub-domains. For a 2-D homogeneous core, this modal method has been validated and its accuracy has been measured. (author)

  20. Parallelism in computations in quantum and statistical mechanics

    International Nuclear Information System (INIS)

    Clementi, E.; Corongiu, G.; Detrich, J.H.

    1985-01-01

    Often very fundamental biochemical and biophysical problems defy simulations because of limitations in today's computers. We present and discuss a distributed system composed of two IBM 4341 s and/or an IBM 4381 as front-end processors and ten FPS-164 attached array processors. This parallel system - called LCAP - has presently a peak performance of about 110 Mflops; extensions to higher performance are discussed. Presently, the system applications use a modified version of VM/SP as the operating system: description of the modifications is given. Three applications programs have been migrated from sequential to parallel: a molecular quantum mechanical, a Metropolis-Monte Carlo and a molecular dynamics program. Descriptions of the parallel codes are briefly outlined. Use of these parallel codes has already opened up new capabilities for our research. The very positive performance comparisons with today's supercomputers allow us to conclude that parallel computers and programming, of the type we have considered, represent a pragmatic answer to many computationally intensive problems. (orig.)

  1. A Novel Parallel Algorithm for Edit Distance Computation

    Directory of Open Access Journals (Sweden)

    Muhammad Murtaza Yousaf

    2018-01-01

    Full Text Available The edit distance between two sequences is the minimum number of weighted transformation-operations that are required to transform one string into the other. The weighted transformation-operations are insert, remove, and substitute. Dynamic programming solution to find edit distance exists but it becomes computationally intensive when the lengths of strings become very large. This work presents a novel parallel algorithm to solve edit distance problem of string matching. The algorithm is based on resolving dependencies in the dynamic programming solution of the problem and it is able to compute each row of edit distance table in parallel. In this way, it becomes possible to compute the complete table in min(m,n iterations for strings of size m and n whereas state-of-the-art parallel algorithm solves the problem in max(m,n iterations. The proposed algorithm also increases the amount of parallelism in each of its iteration. The algorithm is also capable of exploiting spatial locality while its implementation. Additionally, the algorithm works in a load balanced way that further improves its performance. The algorithm is implemented for multicore systems having shared memory. Implementation of the algorithm in OpenMP shows linear speedup and better execution time as compared to state-of-the-art parallel approach. Efficiency of the algorithm is also proven better in comparison to its competitor.

  2. Analysis of multigrid methods on massively parallel computers: Architectural implications

    Science.gov (United States)

    Matheson, Lesley R.; Tarjan, Robert E.

    1993-01-01

    We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10(exp 6) and 10(exp 9), respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages, (up to 1000 words) or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.

  3. Position Analysis of a Hybrid Serial-Parallel Manipulator in Immersion Lithography

    Directory of Open Access Journals (Sweden)

    Jie-jie Shao

    2015-01-01

    Full Text Available This paper proposes a novel hybrid serial-parallel mechanism with 6 degrees of freedom. The new mechanism combines two different parallel modules in a serial form. 3-P̲(PH parallel module is architecture of 3 degrees of freedom based on higher joints and specializes in describing two planes’ relative pose. 3-P̲SP parallel module is typical architecture which has been widely investigated in recent researches. In this paper, the direct-inverse position problems of the 3-P̲SP parallel module in the couple mixed-type mode are analyzed in detail, and the solutions are obtained in an analytical form. Furthermore, the solutions for the direct and inverse position problems of the novel hybrid serial-parallel mechanism are also derived and obtained in the analytical form. The proposed hybrid serial-parallel mechanism is applied to regulate the immersion hood’s pose in an immersion lithography system. Through measuring and regulating the pose of the immersion hood with respect to the wafer surface simultaneously, the immersion hood can track the wafer surface’s pose in real-time and the gap status is stabilized. This is another exploration to hybrid serial-parallel mechanism’s application.

  4. A Hybrid Genetic Algorithm to Minimize Total Tardiness for Unrelated Parallel Machine Scheduling with Precedence Constraints

    Directory of Open Access Journals (Sweden)

    Chunfeng Liu

    2013-01-01

    Full Text Available The paper presents a novel hybrid genetic algorithm (HGA for a deterministic scheduling problem where multiple jobs with arbitrary precedence constraints are processed on multiple unrelated parallel machines. The objective is to minimize total tardiness, since delays of the jobs may lead to punishment cost or cancellation of orders by the clients in many situations. A priority rule-based heuristic algorithm, which schedules a prior job on a prior machine according to the priority rule at each iteration, is suggested and embedded to the HGA for initial feasible schedules that can be improved in further stages. Computational experiments are conducted to show that the proposed HGA performs well with respect to accuracy and efficiency of solution for small-sized problems and gets better results than the conventional genetic algorithm within the same runtime for large-sized problems.

  5. Hybrid parallel strategy for the simulation of fast transient accidental situations at reactor scale

    International Nuclear Information System (INIS)

    Faucher, V.; Galon, P.; Beccantini, A.; Crouzet, F.; Debaud, F.; Gautier, T.

    2013-01-01

    This contribution is dedicated to the latest methodological developments implemented in the fast transient dynamics software EUROPLEXUS (EPX) to simulate the mechanical response of fully coupled fluid-structure systems to accidental situations to be considered at reactor scale, among which the Loss of Coolant Accident, the Core Disruptive Accident and the Hydrogen Explosion. Time integration is explicit and the search for reference solutions within the safety framework prevents any simplification and approximations in the coupled algorithm: for instance, all kinematic constraints are dealt with using Lagrange Multipliers, yielding a complex flow chart when non-permanent constraints such as unilateral contact or immersed fluid-structure boundaries are considered. The parallel acceleration of the solution process is then achieved through a hybrid approach, based on a weighted domain decomposition for distributed memory computing and the use of the KAAPI library for self-balanced shared memory processing inside sub-domains. (authors)

  6. General-purpose parallel simulator for quantum computing

    International Nuclear Information System (INIS)

    Niwa, Jumpei; Matsumoto, Keiji; Imai, Hiroshi

    2002-01-01

    With current technologies, it seems to be very difficult to implement quantum computers with many qubits. It is therefore of importance to simulate quantum algorithms and circuits on the existing computers. However, for a large-size problem, the simulation often requires more computational power than is available from sequential processing. Therefore, simulation methods for parallel processors are required. We have developed a general-purpose simulator for quantum algorithms/circuits on the parallel computer (Sun Enterprise4500). It can simulate algorithms/circuits with up to 30 qubits. In order to test efficiency of our proposed methods, we have simulated Shor's factorization algorithm and Grover's database search, and we have analyzed robustness of the corresponding quantum circuits in the presence of both decoherence and operational errors. The corresponding results, statistics, and analyses are presented in this paper

  7. Parallel algorithms for computation of the manipulator inertia matrix

    Science.gov (United States)

    Amin-Javaheri, Masoud; Orin, David E.

    1989-01-01

    The development of an O(log2N) parallel algorithm for the manipulator inertia matrix is presented. It is based on the most efficient serial algorithm which uses the composite rigid body method. Recursive doubling is used to reformulate the linear recurrence equations which are required to compute the diagonal elements of the matrix. It results in O(log2N) levels of computation. Computation of the off-diagonal elements involves N linear recurrences of varying-size and a new method, which avoids redundant computation of position and orientation transforms for the manipulator, is developed. The O(log2N) algorithm is presented in both equation and graphic forms which clearly show the parallelism inherent in the algorithm.

  8. Fencing data transfers in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-06-02

    Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.

  9. Hybrid computer modelling in plasma physics

    International Nuclear Information System (INIS)

    Hromadka, J; Ibehej, T; Hrach, R

    2016-01-01

    Our contribution is devoted to development of hybrid modelling techniques. We investigate sheath structures in the vicinity of solids immersed in low temperature argon plasma of different pressures by means of particle and fluid computer models. We discuss the differences in results obtained by these methods and try to propose a way to improve the results of fluid models in the low pressure area. There is a possibility to employ Chapman-Enskog method to find appropriate closure relations of fluid equations in a case when particle distribution function is not Maxwellian. We try to follow this way to enhance fluid model and to use it in hybrid plasma model further. (paper)

  10. Element-topology-independent preconditioners for parallel finite element computations

    Science.gov (United States)

    Park, K. C.; Alexander, Scott

    1992-01-01

    A family of preconditioners for the solution of finite element equations are presented, which are element-topology independent and thus can be applicable to element order-free parallel computations. A key feature of the present preconditioners is the repeated use of element connectivity matrices and their left and right inverses. The properties and performance of the present preconditioners are demonstrated via beam and two-dimensional finite element matrices for implicit time integration computations.

  11. Image processing with massively parallel computer Quadrics Q1

    International Nuclear Information System (INIS)

    Della Rocca, A.B.; La Porta, L.; Ferriani, S.

    1995-05-01

    Aimed to evaluate the image processing capabilities of the massively parallel computer Quadrics Q1, a convolution algorithm that has been implemented is described in this report. At first the discrete convolution mathematical definition is recalled together with the main Q1 h/w and s/w features. Then the different codification forms of the algorythm are described and the Q1 performances are compared with those obtained by different computers. Finally, the conclusions report on main results and suggestions

  12. WEKA-G: Parallel data mining on computational grids

    Directory of Open Access Journals (Sweden)

    PIMENTA, A.

    2009-12-01

    Full Text Available Data mining is a technology that can extract useful information from large amounts of data. However, mining a database often requires a high computational power. To resolve this problem, this paper presents a tool (Weka-G, which runs in parallel algorithms used in the mining process data. As the environment for doing so, we use a computational grid by adding several features within a WAN.

  13. Hardware packet pacing using a DMA in a parallel computer

    Science.gov (United States)

    Chen, Dong; Heidelberger, Phillip; Vranas, Pavlos

    2013-08-13

    Method and system for hardware packet pacing using a direct memory access controller in a parallel computer which, in one aspect, keeps track of a total number of bytes put on the network as a result of a remote get operation, using a hardware token counter.

  14. On the efficient parallel computation of Legendre transforms

    NARCIS (Netherlands)

    Inda, M.A.; Bisseling, R.H.; Maslen, D.K.

    2001-01-01

    In this article, we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the

  15. On the efficient parallel computation of Legendre transforms

    NARCIS (Netherlands)

    Inda, M.A.; Bisseling, R.H.; Maslen, D.K.

    1999-01-01

    In this article we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the

  16. Emerging Nanophotonic Applications Explored with Advanced Scientific Parallel Computing

    Science.gov (United States)

    Meng, Xiang

    The domain of nanoscale optical science and technology is a combination of the classical world of electromagnetics and the quantum mechanical regime of atoms and molecules. Recent advancements in fabrication technology allows the optical structures to be scaled down to nanoscale size or even to the atomic level, which are far smaller than the wavelength they are designed for. These nanostructures can have unique, controllable, and tunable optical properties and their interactions with quantum materials can have important near-field and far-field optical response. Undoubtedly, these optical properties can have many important applications, ranging from the efficient and tunable light sources, detectors, filters, modulators, high-speed all-optical switches; to the next-generation classical and quantum computation, and biophotonic medical sensors. This emerging research of nanoscience, known as nanophotonics, is a highly interdisciplinary field requiring expertise in materials science, physics, electrical engineering, and scientific computing, modeling and simulation. It has also become an important research field for investigating the science and engineering of light-matter interactions that take place on wavelength and subwavelength scales where the nature of the nanostructured matter controls the interactions. In addition, the fast advancements in the computing capabilities, such as parallel computing, also become as a critical element for investigating advanced nanophotonic devices. This role has taken on even greater urgency with the scale-down of device dimensions, and the design for these devices require extensive memory and extremely long core hours. Thus distributed computing platforms associated with parallel computing are required for faster designs processes. Scientific parallel computing constructs mathematical models and quantitative analysis techniques, and uses the computing machines to analyze and solve otherwise intractable scientific challenges. In

  17. Configuration Synthesis of Novel Series-Parallel Hybrid Transmission Systems with Eight-Bar Mechanisms

    Directory of Open Access Journals (Sweden)

    Ngoc-Tan Hoang

    2017-07-01

    Full Text Available This paper presents a design approach for the configuration synthesis of series-parallel hybrid transmissions with eight-bar mechanisms. The final design consists of 54 mechanisms with eight members and twelve joints including a simple planetary gear train (PGT and a double planet PGT. Then, by using the techniques of power and clutch arrangements, new series-parallel hybrid transmissions are synthesized. The power arrangement process generates 97 clutchless hybrid systems. The clutch arrangement process generates 100 corresponding series-parallel transmissions. To demonstrate the feasibility of the synthesized configurations, a new hybrid transmission is selected as an example to analyze the working principle with operation modes and power flow paths.

  18. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    International Nuclear Information System (INIS)

    Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.

    2009-01-01

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f m pi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel

  19. Domain decomposition parallel computing for transient two-phase flow of nuclear reactors

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Jae Ryong; Yoon, Han Young [KAERI, Daejeon (Korea, Republic of); Choi, Hyoung Gwon [Seoul National University, Seoul (Korea, Republic of)

    2016-05-15

    KAERI (Korea Atomic Energy Research Institute) has been developing a multi-dimensional two-phase flow code named CUPID for multi-physics and multi-scale thermal hydraulics analysis of Light water reactors (LWRs). The CUPID code has been validated against a set of conceptual problems and experimental data. In this work, the CUPID code has been parallelized based on the domain decomposition method with Message passing interface (MPI) library. For domain decomposition, the CUPID code provides both manual and automatic methods with METIS library. For the effective memory management, the Compressed sparse row (CSR) format is adopted, which is one of the methods to represent the sparse asymmetric matrix. CSR format saves only non-zero value and its position (row and column). By performing the verification for the fundamental problem set, the parallelization of the CUPID has been successfully confirmed. Since the scalability of a parallel simulation is generally known to be better for fine mesh system, three different scales of mesh system are considered: 40000 meshes for coarse mesh system, 320000 meshes for mid-size mesh system, and 2560000 meshes for fine mesh system. In the given geometry, both single- and two-phase calculations were conducted. In addition, two types of preconditioners for a matrix solver were compared: Diagonal and incomplete LU preconditioner. In terms of enhancement of the parallel performance, the OpenMP and MPI hybrid parallel computing for a pressure solver was examined. It is revealed that the scalability of hybrid calculation was enhanced for the multi-core parallel computation.

  20. A parallelization study of the general purpose Monte Carlo code MCNP4 on a distributed memory highly parallel computer

    International Nuclear Information System (INIS)

    Yamazaki, Takao; Fujisaki, Masahide; Okuda, Motoi; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka

    1993-01-01

    The general purpose Monte Carlo code MCNP4 has been implemented on the Fujitsu AP1000 distributed memory highly parallel computer. Parallelization techniques developed and studied are reported. A shielding analysis function of the MCNP4 code is parallelized in this study. A technique to map a history to each processor dynamically and to map control process to a certain processor was applied. The efficiency of parallelized code is up to 80% for a typical practical problem with 512 processors. These results demonstrate the advantages of a highly parallel computer to the conventional computers in the field of shielding analysis by Monte Carlo method. (orig.)

  1. Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

    Science.gov (United States)

    Bhandarkar, S M; Chirravuri, S; Arnold, J

    1996-01-01

    Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.

  2. Parallel computing in genomic research: advances and applications.

    Science.gov (United States)

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

    Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied for reducing the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing unit requires the expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists for processing their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.

  3. Aggregating job exit statuses of a plurality of compute nodes executing a parallel application

    Science.gov (United States)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Mundy, Michael B.

    2015-07-21

    Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.

  4. Computing NLTE Opacities -- Node Level Parallel Calculation

    Energy Technology Data Exchange (ETDEWEB)

    Holladay, Daniel [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-09-11

    Presentation. The goal: to produce a robust library capable of computing reasonably accurate opacities inline with the assumption of LTE relaxed (non-LTE). Near term: demonstrate acceleration of non-LTE opacity computation. Far term (if funded): connect to application codes with in-line capability and compute opacities. Study science problems. Use efficient algorithms that expose many levels of parallelism and utilize good memory access patterns for use on advanced architectures. Portability to multiple types of hardware including multicore processors, manycore processors such as KNL, GPUs, etc. Easily coupled to radiation hydrodynamics and thermal radiative transfer codes.

  5. High spatial resolution CT image reconstruction using parallel computing

    International Nuclear Information System (INIS)

    Yin Yin; Liu Li; Sun Gongxing

    2003-01-01

    Using the PC cluster system with 16 dual CPU nodes, we accelerate the FBP and OR-OSEM reconstruction of high spatial resolution image (2048 x 2048). Based on the number of projections, we rewrite the reconstruction algorithms into parallel format and dispatch the tasks to each CPU. By parallel computing, the speedup factor is roughly equal to the number of CPUs, which can be up to about 25 times when 25 CPUs used. This technique is very suitable for real-time high spatial resolution CT image reconstruction. (authors)

  6. Adaptation and hybridization in computational intelligence

    CERN Document Server

    Jr, Iztok

    2015-01-01

      This carefully edited book takes a walk through recent advances in adaptation and hybridization in the Computational Intelligence (CI) domain. It consists of ten chapters that are divided into three parts. The first part illustrates background information and provides some theoretical foundation tackling the CI domain, the second part deals with the adaptation in CI algorithms, while the third part focuses on the hybridization in CI. This book can serve as an ideal reference for researchers and students of computer science, electrical and civil engineering, economy, and natural sciences that are confronted with solving the optimization, modeling and simulation problems. It covers the recent advances in CI that encompass Nature-inspired algorithms, like Artificial Neural networks, Evolutionary Algorithms and Swarm Intelligence –based algorithms.  

  7. A discrete ordinate response matrix method for massively parallel computers

    International Nuclear Information System (INIS)

    Hanebutte, U.R.; Lewis, E.E.

    1991-01-01

    A discrete ordinate response matrix method is formulated for the solution of neutron transport problems on massively parallel computers. The response matrix formulation eliminates iteration on the scattering source. The nodal matrices which result from the diamond-differenced equations are utilized in a factored form which minimizes memory requirements and significantly reduces the required number of algorithm utilizes massive parallelism by assigning each spatial node to a processor. The algorithm is accelerated effectively by a synthetic method in which the low-order diffusion equations are also solved by massively parallel red/black iterations. The method has been implemented on a 16k Connection Machine-2, and S 8 and S 16 solutions have been obtained for fixed-source benchmark problems in X--Y geometry

  8. Hybrid parallel execution model for logic-based specification languages

    CERN Document Server

    Tsai, Jeffrey J P

    2001-01-01

    Parallel processing is a very important technique for improving the performance of various software development and maintenance activities. The purpose of this book is to introduce important techniques for parallel executation of high-level specifications of software systems. These techniques are very useful for the construction, analysis, and transformation of reliable large-scale and complex software systems. Contents: Current Approaches; Overview of the New Approach; FRORL Requirements Specification Language and Its Decomposition; Rewriting and Data Dependency, Control Flow Analysis of a Lo

  9. Distributed parallel computing in stochastic modeling of groundwater systems.

    Science.gov (United States)

    Dong, Yanhui; Li, Guomin; Xu, Haizhen

    2013-03-01

    Stochastic modeling is a rapidly evolving, popular approach to the study of the uncertainty and heterogeneity of groundwater systems. However, the use of Monte Carlo-type simulations to solve practical groundwater problems often encounters computational bottlenecks that hinder the acquisition of meaningful results. To improve the computational efficiency, a system that combines stochastic model generation with MODFLOW-related programs and distributed parallel processing is investigated. The distributed computing framework, called the Java Parallel Processing Framework, is integrated into the system to allow the batch processing of stochastic models in distributed and parallel systems. As an example, the system is applied to the stochastic delineation of well capture zones in the Pinggu Basin in Beijing. Through the use of 50 processing threads on a cluster with 10 multicore nodes, the execution times of 500 realizations are reduced to 3% compared with those of a serial execution. Through this application, the system demonstrates its potential in solving difficult computational problems in practical stochastic modeling. © 2012, The Author(s). Groundwater © 2012, National Ground Water Association.

  10. Implementation of PHENIX trigger algorithms on massively parallel computers

    International Nuclear Information System (INIS)

    Petridis, A.N.; Wohn, F.K.

    1995-01-01

    The event selection requirements of contemporary high energy and nuclear physics experiments are met by the introduction of on-line trigger algorithms which identify potentially interesting events and reduce the data acquisition rate to levels that are manageable by the electronics. Such algorithms being parallel in nature can be simulated off-line using massively parallel computers. The PHENIX experiment intends to investigate the possible existence of a new phase of matter called the quark gluon plasma which has been theorized to have existed in very early stages of the evolution of the universe by studying collisions of heavy nuclei at ultra-relativistic energies. Such interactions can also reveal important information regarding the structure of the nucleus and mandate a thorough investigation of the simpler proton-nucleus collisions at the same energies. The complexity of PHENIX events and the need to analyze and also simulate them at rates similar to the data collection ones imposes enormous computation demands. This work is a first effort to implement PHENIX trigger algorithms on parallel computers and to study the feasibility of using such machines to run the complex programs necessary for the simulation of the PHENIX detector response. Fine and coarse grain approaches have been studied and evaluated. Depending on the application the performance of a massively parallel computer can be much better or much worse than that of a serial workstation. A comparison between single instruction and multiple instruction computers is also made and possible applications of the single instruction machines to high energy and nuclear physics experiments are outlined. copyright 1995 American Institute of Physics

  11. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    Science.gov (United States)

    Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2017-01-01

    In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.

  12. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    Science.gov (United States)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  13. Time complexity analysis for distributed memory computers: implementation of parallel conjugate gradient method

    NARCIS (Netherlands)

    Hoekstra, A.G.; Sloot, P.M.A.; Haan, M.J.; Hertzberger, L.O.; van Leeuwen, J.

    1991-01-01

    New developments in Computer Science, both hardware and software, offer researchers, such as physicists, unprecedented possibilities to solve their computational intensive problems.However, full exploitation of e.g. new massively parallel computers, parallel languages or runtime environments

  14. A Parallel Computational Model for Multichannel Phase Unwrapping Problem

    Science.gov (United States)

    Imperatore, Pasquale; Pepe, Antonio; Lanari, Riccardo

    2015-05-01

    In this paper, a parallel model for the solution of the computationally intensive multichannel phase unwrapping (MCh-PhU) problem is proposed. Firstly, the Extended Minimum Cost Flow (EMCF) algorithm for solving MCh-PhU problem is revised within the rigorous mathematical framework of the discrete calculus ; thus permitting to capture its topological structure in terms of meaningful discrete differential operators. Secondly, emphasis is placed on those methodological and practical aspects, which lead to a parallel reformulation of the EMCF algorithm. Thus, a novel dual-level parallel computational model, in which the parallelism is hierarchically implemented at two different (i.e., process and thread) levels, is presented. The validity of our approach has been demonstrated through a series of experiments that have revealed a significant speedup. Therefore, the attained high-performance prototype is suitable for the solution of large-scale phase unwrapping problems in reasonable time frames, with a significant impact on the systematic exploitation of the existing, and rapidly growing, large archives of SAR data.

  15. Executing a gather operation on a parallel computer

    Science.gov (United States)

    Archer, Charles J [Rochester, MN; Ratterman, Joseph D [Rochester, MN

    2012-03-20

    Methods, apparatus, and computer program products are disclosed for executing a gather operation on a parallel computer according to embodiments of the present invention. Embodiments include configuring, by the logical root, a result buffer or the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node. Embodiments also include repeatedly for each position in the result buffer: determining, by each compute node of an operational group, whether the current position in the result buffer corresponds with the rank of the compute node, if the current position in the result buffer corresponds with the rank of the compute node, contributing, by that compute node, the compute node's contribution data, if the current position in the result buffer does not correspond with the rank of the compute node, contributing, by that compute node, a value of zero for the contribution data, and storing, by the logical root in the current position in the result buffer, results of a bitwise OR operation of all the contribution data by all compute nodes of the operational group for the current position, the results received through the global combining network.

  16. Scalable Parallelization of Skyline Computation for Multi-core Processors

    DEFF Research Database (Denmark)

    Chester, Sean; Sidlauskas, Darius; Assent, Ira

    2015-01-01

    The skyline is an important query operator for multi-criteria decision making. It reduces a dataset to only those points that offer optimal trade-offs of dimensions. In general, it is very expensive to compute. Recently, multi-core CPU algorithms have been proposed to accelerate the computation...... of the skyline. However, they do not sufficiently minimize dominance tests and so are not competitive with state-of-the-art sequential algorithms. In this paper, we introduce a novel multi-core skyline algorithm, Hybrid, which processes points in blocks. It maintains a shared, global skyline among all threads...

  17. Fast Evaluation of Segmentation Quality with Parallel Computing

    Directory of Open Access Journals (Sweden)

    Henry Cruz

    2017-01-01

    Full Text Available In digital image processing and computer vision, a fairly frequent task is the performance comparison of different algorithms on enormous image databases. This task is usually time-consuming and tedious, such that any kind of tool to simplify this work is welcome. To achieve an efficient and more practical handling of a normally tedious evaluation, we implemented the automatic detection system, with the help of MATLAB®’s Parallel Computing Toolbox™. The key parts of the system have been parallelized to achieve simultaneous execution and analysis of segmentation algorithms on the one hand and the evaluation of detection accuracy for the nonforested regions, such as a study case, on the other hand. As a positive side effect, CPU usage was reduced and processing time was significantly decreased by 68.54% compared to sequential processing (i.e., executing the system with each algorithm one by one.

  18. Final Report: Center for Programming Models for Scalable Parallel Computing

    Energy Technology Data Exchange (ETDEWEB)

    Mellor-Crummey, John [William Marsh Rice University

    2011-09-13

    As part of the Center for Programming Models for Scalable Parallel Computing, Rice University collaborated with project partners in the design, development and deployment of language, compiler, and runtime support for parallel programming models to support application development for the “leadership-class” computer systems at DOE national laboratories. Work over the course of this project has focused on the design, implementation, and evaluation of a second-generation version of Coarray Fortran. Research and development efforts of the project have focused on the CAF 2.0 language, compiler, runtime system, and supporting infrastructure. This has involved working with the teams that provide infrastructure for CAF that we rely on, implementing new language and runtime features, producing an open source compiler that enabled us to evaluate our ideas, and evaluating our design and implementation through the use of benchmarks. The report details the research, development, findings, and conclusions from this work.

  19. Performing a local reduction operation on a parallel computer

    Science.gov (United States)

    Blocksome, Michael A.; Faraj, Daniel A.

    2012-12-11

    A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.

  20. Establishing a group of endpoints in a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.; Xue, Hanhong

    2016-02-02

    A parallel computer executes a number of tasks, each task includes a number of endpoints and the endpoints are configured to support collective operations. In such a parallel computer, establishing a group of endpoints receiving a user specification of a set of endpoints included in a global collection of endpoints, where the user specification defines the set in accordance with a predefined virtual representation of the endpoints, the predefined virtual representation is a data structure setting forth an organization of tasks and endpoints included in the global collection of endpoints and the user specification defines the set of endpoints without a user specification of a particular endpoint; and defining a group of endpoints in dependence upon the predefined virtual representation of the endpoints and the user specification.

  1. Mechatronic Model Based Computed Torque Control of a Parallel Manipulator

    Directory of Open Access Journals (Sweden)

    Zhiyong Yang

    2008-11-01

    Full Text Available With high speed and accuracy the parallel manipulators have wide application in the industry, but there still exist many difficulties in the actual control process because of the time-varying and coupling. Unfortunately, the present-day commercial controlles cannot provide satisfying performance for its single axis linear control only. Therefore, aimed at a novel 2-DOF (Degree of Freedom parallel manipulator called Diamond 600, a motor-mechanism coupling dynamic model based control scheme employing the computed torque control algorithm are presented in this paper. First, the integrated dynamic coupling model is deduced, according to equivalent torques between the mechanical structure and the PM (Permanent Magnetism servomotor. Second, computed torque controller is described in detail for the above proposed model. At last, a series of numerical simulations and experiments are carried out to test the effectiveness of the system, and the results verify the favourable tracking ability and robustness.

  2. Mechatronic Model Based Computed Torque Control of a Parallel Manipulator

    Directory of Open Access Journals (Sweden)

    Zhiyong Yang

    2008-03-01

    Full Text Available With high speed and accuracy the parallel manipulators have wide application in the industry, but there still exist many difficulties in the actual control process because of the time-varying and coupling. Unfortunately, the present-day commercial controlles cannot provide satisfying performance for its single axis linear control only. Therefore, aimed at a novel 2-DOF (Degree of Freedom parallel manipulator called Diamond 600, a motor-mechanism coupling dynamic model based control scheme employing the computed torque control algorithm are presented in this paper. First, the integrated dynamic coupling model is deduced, according to equivalent torques between the mechanical structure and the PM (Permanent Magnetism servomotor. Second, computed torque controller is described in detail for the above proposed model. At last, a series of numerical simulations and experiments are carried out to test the effectiveness of the system, and the results verify the favourable tracking ability and robustness.

  3. A scalable PC-based parallel computer for lattice QCD

    International Nuclear Information System (INIS)

    Fodor, Z.; Katz, S.D.; Pappa, G.

    2003-01-01

    A PC-based parallel computer for medium/large scale lattice QCD simulations is suggested. The Eoetvoes Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes. Gigabit Ethernet cards are used for nearest neighbor communication in a two-dimensional mesh. The sustained performance for dynamical staggered (wilson) quarks on large lattices is around 70(110) GFlops. The exceptional price/performance ratio is below $1/Mflop

  4. A scalable PC-based parallel computer for lattice QCD

    International Nuclear Information System (INIS)

    Fodor, Z.; Papp, G.

    2002-09-01

    A PC-based parallel computer for medium/large scale lattice QCD simulations is suggested. The Eoetvoes Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7 GHz nodes. Gigabit Ethernet cards are used for nearest neighbor communication in a two-dimensional mesh. The sustained performance for dynamical staggered(wilson) quarks on large lattices is around 70(110) GFlops. The exceptional price/performance ratio is below $1/Mflop. (orig.)

  5. Domain Decomposition: A Bridge between Nature and Parallel Computers

    Science.gov (United States)

    1992-09-01

    B., "Domain Decomposition Algorithms for Indefinite Elliptic Problems," S"IAM Journal of S; cientific and Statistical (’omputing, Vol. 13, 1992, pp...AD-A256 575 NASA Contractor Report 189709 ICASE Report No. 92-44 ICASE DOMAIN DECOMPOSITION: A BRIDGE BETWEEN NATURE AND PARALLEL COMPUTERS DTIC dE...effectively implemented on dis- tributed memory multiprocessors. In 1990 (as reported in Ref. 38 using the tile algo- rithm), a 103,201-unknown 2D elliptic

  6. Noise simulation in cone beam CT imaging with parallel computing

    International Nuclear Information System (INIS)

    Tu, S.-J.; Shaw, Chris C; Chen, Lingyun

    2006-01-01

    We developed a computer noise simulation model for cone beam computed tomography imaging using a general purpose PC cluster. This model uses a mono-energetic x-ray approximation and allows us to investigate three primary performance components, specifically quantum noise, detector blurring and additive system noise. A parallel random number generator based on the Weyl sequence was implemented in the noise simulation and a visualization technique was accordingly developed to validate the quality of the parallel random number generator. In our computer simulation model, three-dimensional (3D) phantoms were mathematically modelled and used to create 450 analytical projections, which were then sampled into digital image data. Quantum noise was simulated and added to the analytical projection image data, which were then filtered to incorporate flat panel detector blurring. Additive system noise was generated and added to form the final projection images. The Feldkamp algorithm was implemented and used to reconstruct the 3D images of the phantoms. A 24 dual-Xeon PC cluster was used to compute the projections and reconstructed images in parallel with each CPU processing 10 projection views for a total of 450 views. Based on this computer simulation system, simulated cone beam CT images were generated for various phantoms and technique settings. Noise power spectra for the flat panel x-ray detector and reconstructed images were then computed to characterize the noise properties. As an example among the potential applications of our noise simulation model, we showed that images of low contrast objects can be produced and used for image quality evaluation

  7. Implementing a Commercial-Strength Parallel Hybrid Movie Recommendation Engine

    DEFF Research Database (Denmark)

    Amolochitis, Emmanouil; Christou, Ioannis T.; Tan, Zheng-Hua

    2014-01-01

    AMORE is a hybrid recommendation system that provides movie recommendations for a major triple-play services provider in Greece. Combined with our own implementations of several user-, item-, and content-based recommendation algorithms, AMORE significantly outperforms other state......-of-the-art implementations both in solution quality and response time. AMORE currently serves daily recommendation requests for all active subscribers of the provider's video-on-demand services and has contributed to an increase of rental profits and customer retention....

  8. Neural network control of a parallel hybrid-electric propulsion system for a small unmanned aerial vehicle

    Science.gov (United States)

    Harmon, Frederick G.

    2005-11-01

    Parallel hybrid-electric propulsion systems would be beneficial for small unmanned aerial vehicles (UAVs) used for military, homeland security, and disaster-monitoring missions. The benefits, due to the hybrid and electric-only modes, include increased time-on-station and greater range as compared to electric-powered UAVs and stealth modes not available with gasoline-powered UAVs. This dissertation contributes to the research fields of small unmanned aerial vehicles, hybrid-electric propulsion system control, and intelligent control. A conceptual design of a small UAV with a parallel hybrid-electric propulsion system is provided. The UAV is intended for intelligence, surveillance, and reconnaissance (ISR) missions. A conceptual design reveals the trade-offs that must be considered to take advantage of the hybrid-electric propulsion system. The resulting hybrid-electric propulsion system is a two-point design that includes an engine primarily sized for cruise speed and an electric motor and battery pack that are primarily sized for a slower endurance speed. The electric motor provides additional power for take-off, climbing, and acceleration and also serves as a generator during charge-sustaining operation or regeneration. The intelligent control of the hybrid-electric propulsion system is based on an instantaneous optimization algorithm that generates a hyper-plane from the nonlinear efficiency maps for the internal combustion engine, electric motor, and lithium-ion battery pack. The hyper-plane incorporates charge-depletion and charge-sustaining strategies. The optimization algorithm is flexible and allows the operator/user to assign relative importance between the use of gasoline, electricity, and recharging depending on the intended mission. A MATLAB/Simulink model was developed to test the control algorithms. The Cerebellar Model Arithmetic Computer (CMAC) associative memory neural network is applied to the control of the UAVs parallel hybrid

  9. Local rollback for fault-tolerance in parallel computing systems

    Science.gov (United States)

    Blumrich, Matthias A [Yorktown Heights, NY; Chen, Dong [Yorktown Heights, NY; Gara, Alan [Yorktown Heights, NY; Giampapa, Mark E [Yorktown Heights, NY; Heidelberger, Philip [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Steinmacher-Burow, Burkhard [Boeblingen, DE; Sugavanam, Krishnan [Yorktown Heights, NY

    2012-01-24

    A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.

  10. An Alternative Algorithm for Computing Watersheds on Shared Memory Parallel Computers

    NARCIS (Netherlands)

    Meijster, A.; Roerdink, J.B.T.M.

    1995-01-01

    In this paper a parallel implementation of a watershed algorithm is proposed. The algorithm can easily be implemented on shared memory parallel computers. The watershed transform is generally considered to be inherently sequential since the discrete watershed of an image is defined using recursion.

  11. Continuous path control of a 5-DOF parallel-serial hybrid robot

    International Nuclear Information System (INIS)

    Uchiyama, Takuma; Terada, Hidetsugu; Mitsuya, Hironori

    2010-01-01

    Using the 5-degree of freedom parallel-serial hybrid robot, to realize the de-burring, new forward and inverse kinematic calculation methods based on the 'off-line teaching' method are proposed. This hybrid robot consists of a parallel stage section and a serial stage section. Considering this point, each section is calculated individually. And the continuous path control algorithm of this hybrid robot is proposed. To verify the usefulness, a prototype robot is tested which is controlled based on the proposed methods. This verification includes a positioning test and a pose test. The positioning test evaluates the continuous path of the tool center point. The pose test evaluates the pose on the tool center point. As the result, it is confirmed that this hybrid robot moves correctly using the proposed methods

  12. Optimal control applied to the control strategy of a parallel hybrid vehicle; Commande optimale appliquee a la strategie de commande d'un vehicule hybride parallele

    Energy Technology Data Exchange (ETDEWEB)

    Delprat, S.; Guerra, T.M. [Universite de Valenciennes et du Hainaut-Cambresis, LAMIH UMR CNRS 8530, 59 - Valenciennes (France); Rimaux, J. [PSA Peugeot Citroen, DRIA/SARA/EEES, 78 - Velizy Villacoublay (France); Paganelli, G. [Center for Automotive Research, Ohio (United States)

    2002-07-01

    Control strategies are algorithms that calculate the power repartition between the engine and the motor of an hybrid vehicle in order to minimize the fuel consumption and/or emissions. Some algorithms are devoted to real time application whereas others are designed for global optimization in stimulation. The last ones provide solutions which can be used to evaluate the performances of a given hybrid vehicle or a given real time control strategy. The control strategy problem is firstly written into the form of an optimization under constraints problem. A solution based on optimal control is proposed. Results are given for the European Normalized Cycle and a parallel single shaft hybrid vehicle built at the LAMIH (France). (authors)

  13. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-05-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs

  14. Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Blocksome, Michael A.; Mamidala, Amith R.

    2013-09-03

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  15. Data communications for a collective operation in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Faraj, Daniel A

    2013-07-16

    Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and bit masks; receiving in an origin endpoint of the PAMI a collective instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint; constructing a bit mask for the received collective instruction; selecting, from among the associated algorithms and bit masks, a data communications algorithm in dependence upon the constructed bit mask; and executing the collective instruction, transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.

  16. Data communications in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Davis, Kristan D; Faraj, Daniel A

    2013-07-09

    Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.

  17. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-01-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. (orig.)

  18. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    Science.gov (United States)

    Nash, Thomas

    1989-12-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.

  19. Parallel Hybrid Gas-Electric Geared Turbofan Engine Conceptual Design and Benefits Analysis

    Science.gov (United States)

    Lents, Charles; Hardin, Larry; Rheaume, Jonathan; Kohlman, Lee

    2016-01-01

    The conceptual design of a parallel gas-electric hybrid propulsion system for a conventional single aisle twin engine tube and wing vehicle has been developed. The study baseline vehicle and engine technology are discussed, followed by results of the hybrid propulsion system sizing and performance analysis. The weights analysis for the electric energy storage & conversion system and thermal management system is described. Finally, the potential system benefits are assessed.

  20. Performing an allreduce operation on a plurality of compute nodes of a parallel computer

    Science.gov (United States)

    Faraj, Ahmad [Rochester, MN

    2012-04-17

    Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.

  1. A hybrid, massively parallel implementation of a genetic algorithm for optimization of the impact performance of a metal/polymer composite plate

    KAUST Repository

    Narayanan, Kiran; Mora Cordova, Angel; Allsopp, Nicholas; El Sayed, Tamer S.

    2012-01-01

    A hybrid parallelization method composed of a coarse-grained genetic algorithm (GA) and fine-grained objective function evaluations is implemented on a heterogeneous computational resource consisting of 16 IBM Blue Gene/P racks, a single x86 cluster

  2. Centaure: an heterogeneous parallel architecture for computer vision

    International Nuclear Information System (INIS)

    Peythieux, Marc

    1997-01-01

    This dissertation deals with the architecture of parallel computers dedicated to computer vision. In the first chapter, the problem to be solved is presented, as well as the architecture of the Sympati and Symphonie computers, on which this work is based. The second chapter is about the state of the art of computers and integrated processors that can execute computer vision and image processing codes. The third chapter contains a description of the architecture of Centaure. It has an heterogeneous structure: it is composed of a multiprocessor system based on Analog Devices ADSP21060 Sharc digital signal processor, and of a set of Symphonie computers working in a multi-SIMD fashion. Centaure also has a modular structure. Its basic node is composed of one Symphonie computer, tightly coupled to a Sharc thanks to a dual ported memory. The nodes of Centaure are linked together by the Sharc communication links. The last chapter deals with a performance validation of Centaure. The execution times on Symphonie and on Centaure of a benchmark which is typical of industrial vision, are presented and compared. In the first place, these results show that the basic node of Centaure allows a faster execution than Symphonie, and that increasing the size of the tested computer leads to a better speed-up with Centaure than with Symphonie. In the second place, these results validate the choice of running the low level structure of Centaure in a multi- SIMD fashion. (author) [fr

  3. A new parallel-type hybrid electric-vehicle

    International Nuclear Information System (INIS)

    David Huang, K.; Tzeng, S.-C.

    2004-01-01

    This new system promises an internal-combustion engine that always maintains optimal operating conditions. The system comprises two parts: (1) an internal-combustion power-distribution device and (2) an integrated design involving the engine and electronic motor. The internal-combustion power-distribution device provides an engine capable of constantly operating in an optimal fashion, minimizing emissions and maximizing thermal-efficiency. The electric motor can generate extra power. Notably, the integrated torque design comprises three helical gears. This design can release the power of the engine or electric motor separately, or can integrate these two different powers into a hybridized power system

  4. Development of real-time visualization system for Computational Fluid Dynamics on parallel computers

    International Nuclear Information System (INIS)

    Muramatsu, Kazuhiro; Otani, Takayuki; Matsumoto, Hideki; Takei, Toshifumi; Doi, Shun

    1998-03-01

    A real-time visualization system for computational fluid dynamics in a network connecting between a parallel computing server and the client terminal was developed. Using the system, a user can visualize the results of a CFD (Computational Fluid Dynamics) simulation on the parallel computer as a client terminal during the actual computation on a server. Using GUI (Graphical User Interface) on the client terminal, to user is also able to change parameters of the analysis and visualization during the real-time of the calculation. The system carries out both of CFD simulation and generation of a pixel image data on the parallel computer, and compresses the data. Therefore, the amount of data from the parallel computer to the client is so small in comparison with no compression that the user can enjoy the swift image appearance comfortably. Parallelization of image data generation is based on Owner Computation Rule. GUI on the client is built on Java applet. A real-time visualization is thus possible on the client PC only if Web browser is implemented on it. (author)

  5. Semi-coarsening multigrid methods for parallel computing

    Energy Technology Data Exchange (ETDEWEB)

    Jones, J.E.

    1996-12-31

    Standard multigrid methods are not well suited for problems with anisotropic coefficients which can occur, for example, on grids that are stretched to resolve a boundary layer. There are several different modifications of the standard multigrid algorithm that yield efficient methods for anisotropic problems. In the paper, we investigate the parallel performance of these multigrid algorithms. Multigrid algorithms which work well for anisotropic problems are based on line relaxation and/or semi-coarsening. In semi-coarsening multigrid algorithms a grid is coarsened in only one of the coordinate directions unlike standard or full-coarsening multigrid algorithms where a grid is coarsened in each of the coordinate directions. When both semi-coarsening and line relaxation are used, the resulting multigrid algorithm is robust and automatic in that it requires no knowledge of the nature of the anisotropy. This is the basic multigrid algorithm whose parallel performance we investigate in the paper. The algorithm is currently being implemented on an IBM SP2 and its performance is being analyzed. In addition to looking at the parallel performance of the basic semi-coarsening algorithm, we present algorithmic modifications with potentially better parallel efficiency. One modification reduces the amount of computational work done in relaxation at the expense of using multiple coarse grids. This modification is also being implemented with the aim of comparing its performance to that of the basic semi-coarsening algorithm.

  6. Dynamic stability calculations for power grids employing a parallel computer

    Energy Technology Data Exchange (ETDEWEB)

    Schmidt, K

    1982-06-01

    The aim of dynamic contingency calculations in power systems is to estimate the effects of assumed disturbances, such as loss of generation. Due to the large dimensions of the problem these simulations require considerable computing time and costs, to the effect that they are at present only used in a planning state but not for routine checks in power control stations. In view of the homogeneity of the problem, where a multitude of equal generator models, having different parameters, are to be integrated simultaneously, the use of a parallel computer looks very attractive. The results of this study employing a prototype parallel computer (SMS 201) are presented. It consists of up to 128 equal microcomputers bus-connected to a control computer. Each of the modules is programmed to simulate a node of the power grid. Generators with their associated control are represented by models of 13 states each. Passive nodes are complemented by 'phantom'-generators, so that the whole power grid is homogenous, thus removing the need for load-flow-iterations. Programming of microcomputers is essentially performed in FORTRAN.

  7. Fast electrostatic force calculation on parallel computer clusters

    International Nuclear Information System (INIS)

    Kia, Amirali; Kim, Daejoong; Darve, Eric

    2008-01-01

    The fast multipole method (FMM) and smooth particle mesh Ewald (SPME) are well known fast algorithms to evaluate long range electrostatic interactions in molecular dynamics and other fields. FMM is a multi-scale method which reduces the computation cost by approximating the potential due to a group of particles at a large distance using few multipole functions. This algorithm scales like O(N) for N particles. SPME algorithm is an O(NlnN) method which is based on an interpolation of the Fourier space part of the Ewald sum and evaluating the resulting convolutions using fast Fourier transform (FFT). Those algorithms suffer from relatively poor efficiency on large parallel machines especially for mid-size problems around hundreds of thousands of atoms. A variation of the FMM, called PWA, based on plane wave expansions is presented in this paper. A new parallelization strategy for PWA, which takes advantage of the specific form of this expansion, is described. Its parallel efficiency is compared with SPME through detail time measurements on two different computer clusters

  8. Performance evaluation for compressible flow calculations on five parallel computers of different architectures

    International Nuclear Information System (INIS)

    Kimura, Toshiya.

    1997-03-01

    A two-dimensional explicit Euler solver has been implemented for five MIMD parallel computers of different machine architectures in Center for Promotion of Computational Science and Engineering of Japan Atomic Energy Research Institute. These parallel computers are Fujitsu VPP300, NEC SX-4, CRAY T94, IBM SP2, and Hitachi SR2201. The code was parallelized by several parallelization methods, and a typical compressible flow problem has been calculated for different grid sizes changing the number of processors. Their effective performances for parallel calculations, such as calculation speed, speed-up ratio and parallel efficiency, have been investigated and evaluated. The communication time among processors has been also measured and evaluated. As a result, the differences on the performance and the characteristics between vector-parallel and scalar-parallel computers can be pointed, and it will present the basic data for efficient use of parallel computers and for large scale CFD simulations on parallel computers. (author)

  9. An Introduction to Parallel Cluster Computing Using PVM for Computer Modeling and Simulation of Engineering Problems

    International Nuclear Information System (INIS)

    Spencer, VN

    2001-01-01

    An investigation has been conducted regarding the ability of clustered personal computers to improve the performance of executing software simulations for solving engineering problems. The power and utility of personal computers continues to grow exponentially through advances in computing capabilities such as newer microprocessors, advances in microchip technologies, electronic packaging, and cost effective gigabyte-size hard drive capacity. Many engineering problems require significant computing power. Therefore, the computation has to be done by high-performance computer systems that cost millions of dollars and need gigabytes of memory to complete the task. Alternately, it is feasible to provide adequate computing in the form of clustered personal computers. This method cuts the cost and size by linking (clustering) personal computers together across a network. Clusters also have the advantage that they can be used as stand-alone computers when they are not operating as a parallel computer. Parallel computing software to exploit clusters is available for computer operating systems like Unix, Windows NT, or Linux. This project concentrates on the use of Windows NT, and the Parallel Virtual Machine (PVM) system to solve an engineering dynamics problem in Fortran

  10. Overview of Parallel Platforms for Common High Performance Computing

    Directory of Open Access Journals (Sweden)

    T. Fryza

    2012-04-01

    Full Text Available The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, the methods exploiting the multicores central processing units such as message passing interface and OpenMP are taken into account. The properties of the programming methods are experimentally proved in the application of a fast Fourier transform and a discrete cosine transform and they are compared with the possibilities of MATLAB's built-in functions and Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementation phase was compared with CPU based computing methods and with possibilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and new, fast routines' implementation is proposed as well.

  11. Parallel computation of automatic differentiation applied to magnetic field calculations

    International Nuclear Information System (INIS)

    Hinkins, R.L.; Lawrence Berkeley Lab., CA

    1994-09-01

    The author presents a parallelization of an accelerator physics application to simulate magnetic field in three dimensions. The problem involves the evaluation of high order derivatives with respect to two variables of a multivariate function. Automatic differentiation software had been used with some success, but the computation time was prohibitive. The implementation runs on several platforms, including a network of workstations using PVM, a MasPar using MPFortran, and a CM-5 using CMFortran. A careful examination of the code led to several optimizations that improved its serial performance by a factor of 8.7. The parallelization produced further improvements, especially on the MasPar with a speedup factor of 620. As a result a problem that took six days on a SPARC 10/41 now runs in minutes on the MasPar, making it feasible for physicists at Lawrence Berkeley Laboratory to simulate larger magnets

  12. A finite element solution method for quadrics parallel computer

    International Nuclear Information System (INIS)

    Zucchini, A.

    1996-08-01

    A distributed preconditioned conjugate gradient method for finite element analysis has been developed and implemented on a parallel SIMD Quadrics computer. The main characteristic of the method is that it does not require any actual assembling of all element equations in a global system. The physical domain of the problem is partitioned in cells of n p finite elements and each cell element is assigned to a different node of an n p -processors machine. Element stiffness matrices are stored in the data memory of the assigned processing node and the solution process is completely executed in parallel at element level. Inter-element and therefore inter-processor communications are required once per iteration to perform local sums of vector quantities between neighbouring elements. A prototype implementation has been tested on an 8-nodes Quadrics machine in a simple 2D benchmark problem

  13. Parallel computation of transverse wakes in linear colliders

    International Nuclear Information System (INIS)

    Zhan, Xiaowei; Ko, Kwok.

    1996-11-01

    SLAC has proposed the detuned structure (DS) as one possible design to control the emittance growth of long bunch trains due to transverse wakefields in the Next Linear Collider (NLC). The DS consists of 206 cells with tapering from cell to cell of the order of few microns to provide Gaussian detuning of the dipole modes. The decoherence of these modes leads to two orders of magnitude reduction in wakefield experienced by the trailing bunch. To model such a large heterogeneous structure realistically is impractical with finite-difference codes using structured grids. The authors have calculated the wakefield in the DS on a parallel computer with a finite-element code using an unstructured grid. The parallel implementation issues are presented along with simulation results that include contributions from higher dipole bands and wall dissipation

  14. A scalable implementation of RI-SCF on parallel computers

    International Nuclear Information System (INIS)

    Fruechtl, H.A.; Kendall, R.A.; Harrison, R.J.

    1996-01-01

    In order to avoid the integral bottleneck of conventional SCF calculations, the Resolution of the Identity (RI) method is used to obtain an approximate solution to the Hartree-Fock equations. In this approximation only three-center integrals are needed to build the Fock matrix. It has been implemented as part of the NWChem package of portable and scalable ab initio programs for parallel computers. Utilizing the V-approximation, both the Coulomb and exchange contribution to the Fock matrix can be calculated from a transformed set of three-center integrals which have to be precalculated and stored. A distributed in-core method as well as a disk based implementation have been programmed. Details of the implementation as well as the parallel programming tools used are described. We also give results and timings from benchmark calculations

  15. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  16. High performance parallel computers for science: New developments at the Fermilab advanced computer program

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1988-08-01

    Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction. 10 refs., 7 figs

  17. Simulation of partially coherent light propagation using parallel computing devices

    Science.gov (United States)

    Magalhães, Tiago C.; Rebordão, José M.

    2017-08-01

    Light acquires or loses coherence and coherence is one of the few optical observables. Spectra can be derived from coherence functions and understanding any interferometric experiment is also relying upon coherence functions. Beyond the two limiting cases (full coherence or incoherence) the coherence of light is always partial and it changes with propagation. We have implemented a code to compute the propagation of partially coherent light from the source plane to the observation plane using parallel computing devices (PCDs). In this paper, we restrict the propagation in free space only. To this end, we used the Open Computing Language (OpenCL) and the open-source toolkit PyOpenCL, which gives access to OpenCL parallel computation through Python. To test our code, we chose two coherence source models: an incoherent source and a Gaussian Schell-model source. In the former case, we divided into two different source shapes: circular and rectangular. The results were compared to the theoretical values. Our implemented code allows one to choose between the PyOpenCL implementation and a standard one, i.e using the CPU only. To test the computation time for each implementation (PyOpenCL and standard), we used several computer systems with different CPUs and GPUs. We used powers of two for the dimensions of the cross-spectral density matrix (e.g. 324, 644) and a significant speed increase is observed in the PyOpenCL implementation when compared to the standard one. This can be an important tool for studying new source models.

  18. Cluster implementation for parallel computation within MATLAB software environment

    International Nuclear Information System (INIS)

    Santana, Antonio O. de; Dantas, Carlos C.; Charamba, Luiz G. da R.; Souza Neto, Wilson F. de; Melo, Silvio B. Melo; Lima, Emerson A. de O.

    2013-01-01

    A cluster for parallel computation with MATLAB software the COCGT - Cluster for Optimizing Computing in Gamma ray Transmission methods, is implemented. The implementation correspond to creation of a local net of computers, facilities and configurations of software, as well as the accomplishment of cluster tests for determine and optimizing of performance in the data processing. The COCGT implementation was required by data computation from gamma transmission measurements applied to fluid dynamic and tomography reconstruction in a FCC-Fluid Catalytic Cracking cold pilot unity, and simulation data as well. As an initial test the determination of SVD - Singular Values Decomposition - of random matrix with dimension (n , n), n=1000, using the Girco's law modified, revealed that COCGT was faster in comparison to the literature [1] cluster, which is similar and operates at the same conditions. Solution of a system of linear equations provided a new test for the COCGT performance by processing a square matrix with n=10000, computing time was 27 s and for square matrix with n=12000, computation time was 45 s. For determination of the cluster behavior in relation to 'parfor' (parallel for-loop) and 'spmd' (single program multiple data), two codes were used containing those two commands and the same problem: determination of SVD of a square matrix with n= 1000. The execution of codes by means of COCGT proved: 1) for the code with 'parfor', the performance improved with the labs number from 1 to 8 labs; 2) for the code 'spmd', just 1 lab (core) was enough to process and give results in less than 1 s. In similar situation, with the difference that now the SVD will be determined from square matrix with n1500, for code with 'parfor', and n=7000, for code with 'spmd'. That results take to conclusions: 1) for the code with 'parfor', the behavior was the same already described above; 2) for code with 'spmd', the same besides having produced a larger performance, it supports a

  19. Parallel computation of multigroup reactivity coefficient using iterative method

    Science.gov (United States)

    Susmikanti, Mike; Dewayatna, Winter

    2013-09-01

    One of the research activities to support the commercial radioisotope production program is a safety research target irradiation FPM (Fission Product Molybdenum). FPM targets form a tube made of stainless steel in which the nuclear degrees of superimposed high-enriched uranium. FPM irradiation tube is intended to obtain fission. The fission material widely used in the form of kits in the world of nuclear medicine. Irradiation FPM tube reactor core would interfere with performance. One of the disorders comes from changes in flux or reactivity. It is necessary to study a method for calculating safety terrace ongoing configuration changes during the life of the reactor, making the code faster became an absolute necessity. Neutron safety margin for the research reactor can be reused without modification to the calculation of the reactivity of the reactor, so that is an advantage of using perturbation method. The criticality and flux in multigroup diffusion model was calculate at various irradiation positions in some uranium content. This model has a complex computation. Several parallel algorithms with iterative method have been developed for the sparse and big matrix solution. The Black-Red Gauss Seidel Iteration and the power iteration parallel method can be used to solve multigroup diffusion equation system and calculated the criticality and reactivity coeficient. This research was developed code for reactivity calculation which used one of safety analysis with parallel processing. It can be done more quickly and efficiently by utilizing the parallel processing in the multicore computer. This code was applied for the safety limits calculation of irradiated targets FPM with increment Uranium.

  20. An efficient heuristic versus a robust hybrid meta-heuristic for general framework of serial-parallel redundancy problem

    International Nuclear Information System (INIS)

    Sadjadi, Seyed Jafar; Soltani, R.

    2009-01-01

    We present a heuristic approach to solve a general framework of serial-parallel redundancy problem where the reliability of the system is maximized subject to some general linear constraints. The complexity of the redundancy problem is generally considered to be NP-Hard and the optimal solution is not normally available. Therefore, to evaluate the performance of the proposed method, a hybrid genetic algorithm is also implemented whose parameters are calibrated via Taguchi's robust design method. Then, various test problems are solved and the computational results indicate that the proposed heuristic approach could provide us some promising reliabilities, which are fairly close to optimal solutions in a reasonable amount of time.

  1. The control of a parallel hybrid-electric propulsion system for a small unmanned aerial vehicle using a CMAC neural network.

    Science.gov (United States)

    Harmon, Frederick G; Frank, Andrew A; Joshi, Sanjay S

    2005-01-01

    A Simulink model, a propulsion energy optimization algorithm, and a CMAC controller were developed for a small parallel hybrid-electric unmanned aerial vehicle (UAV). The hybrid-electric UAV is intended for military, homeland security, and disaster-monitoring missions involving intelligence, surveillance, and reconnaissance (ISR). The Simulink model is a forward-facing simulation program used to test different control strategies. The flexible energy optimization algorithm for the propulsion system allows relative importance to be assigned between the use of gasoline, electricity, and recharging. A cerebellar model arithmetic computer (CMAC) neural network approximates the energy optimization results and is used to control the parallel hybrid-electric propulsion system. The hybrid-electric UAV with the CMAC controller uses 67.3% less energy than a two-stroke gasoline-powered UAV during a 1-h ISR mission and 37.8% less energy during a longer 3-h ISR mission.

  2. SPINET: A Parallel Computing Approach to Spine Simulations

    Directory of Open Access Journals (Sweden)

    Peter G. Kropf

    1996-01-01

    Full Text Available Research in scientitic programming enables us to realize more and more complex applications, and on the other hand, application-driven demands on computing methods and power are continuously growing. Therefore, interdisciplinary approaches become more widely used. The interdisciplinary SPINET project presented in this article applies modern scientific computing tools to biomechanical simulations: parallel computing and symbolic and modern functional programming. The target application is the human spine. Simulations of the spine help us to investigate and better understand the mechanisms of back pain and spinal injury. Two approaches have been used: the first uses the finite element method for high-performance simulations of static biomechanical models, and the second generates a simulation developmenttool for experimenting with different dynamic models. A finite element program for static analysis has been parallelized for the MUSIC machine. To solve the sparse system of linear equations, a conjugate gradient solver (iterative method and a frontal solver (direct method have been implemented. The preprocessor required for the frontal solver is written in the modern functional programming language SML, the solver itself in C, thus exploiting the characteristic advantages of both functional and imperative programming. The speedup analysis of both solvers show very satisfactory results for this irregular problem. A mixed symbolic-numeric environment for rigid body system simulations is presented. It automatically generates C code from a problem specification expressed by the Lagrange formalism using Maple.

  3. Large-scale parallel genome assembler over cloud computing environment.

    Science.gov (United States)

    Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

    2017-06-01

    The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.

  4. CX: A Scalable, Robust Network for Parallel Computing

    Directory of Open Access Journals (Sweden)

    Peter Cappello

    2002-01-01

    Full Text Available CX, a network-based computational exchange, is presented. The system's design integrates variations of ideas from other researchers, such as work stealing, non-blocking tasks, eager scheduling, and space-based coordination. The object-oriented API is simple, compact, and cleanly separates application logic from the logic that supports interprocess communication and fault tolerance. Computations, of course, run to completion in the presence of computational hosts that join and leave the ongoing computation. Such hosts, or producers, use task caching and prefetching to overlap computation with interprocessor communication. To break a potential task server bottleneck, a network of task servers is presented. Even though task servers are envisioned as reliable, the self-organizing, scalable network of n- servers, described as a sibling-connected height-balanced fat tree, tolerates a sequence of n-1 server failures. Tasks are distributed throughout the server network via a simple "diffusion" process. CX is intended as a test bed for research on automated silent auctions, reputation services, authentication services, and bonding services. CX also provides a test bed for algorithm research into network-based parallel computation.

  5. A Hybrid FPGA/Coarse Parallel Processing Architecture for Multi-modal Visual Feature Descriptors

    DEFF Research Database (Denmark)

    Jensen, Lars Baunegaard With; Kjær-Nielsen, Anders; Alonso, Javier Díaz

    2008-01-01

    This paper describes the hybrid architecture developed for speeding up the processing of so-called multi-modal visual primitives which are sparse image descriptors extracted along contours. In the system, the first stages of visual processing are implemented on FPGAs due to their highly parallel...

  6. Micrometer and nanometer-scale parallel patterning of ceramic and organic-inorganic hybrid materials

    NARCIS (Netherlands)

    ten Elshof, Johan E.; Khan, Sajid; Göbel, Ole

    2010-01-01

    This review gives an overview of the progress made in recent years in the development of low-cost parallel patterning techniques for ceramic materials, silica, and organic–inorganic silsesquioxane-based hybrids from wet-chemical solutions and suspensions on the micrometer and nanometer-scale. The

  7. Hybrid cloud and cluster computing paradigms for life science applications.

    Science.gov (United States)

    Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey

    2010-12-21

    Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.

  8. Routing performance analysis and optimization within a massively parallel computer

    Science.gov (United States)

    Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen

    2013-04-16

    An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.

  9. Particle orbit tracking on a parallel computer: Hypertrack

    International Nuclear Information System (INIS)

    Cole, B.; Bourianoff, G.; Pilat, F.; Talman, R.

    1991-05-01

    A program has been written which performs particle orbit tracking on the Intel iPSC/860 distributed memory parallel computer. The tracking is performed using a thin element approach. A brief description of the structure and performance of the code is presented, along with applications of the code to the analysis of accelerator lattices for the SSC. The concept of ''ensemble tracking'', i.e. the tracking of ensemble averages of noninteracting particles, such as the emittance, is presented. Preliminary results of such studies will be presented. 2 refs., 6 figs

  10. Work-Efficient Parallel Skyline Computation for the GPU

    DEFF Research Database (Denmark)

    Bøgh, Kenneth Sejdenfaden; Chester, Sean; Assent, Ira

    2015-01-01

    offers the potential for parallelizing skyline computation across thousands of cores. However, attempts to port skyline algorithms to the GPU have prioritized throughput and failed to outperform sequential algorithms. In this paper, we introduce a new skyline algorithm, designed for the GPU, that uses...... a global, static partitioning scheme. With the partitioning, we can permit controlled branching to exploit transitive relationships and avoid most point-to-point comparisons. The result is a non-traditional GPU algorithm, SkyAlign, that prioritizes work-effciency and respectable throughput, rather than...

  11. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    Science.gov (United States)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared with the performance using benchmark software and the metric was FLoting-point Operations Per Seconds (FLOPS) which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore system? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPs and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the

  12. Representing and computing regular languages on massively parallel networks

    Energy Technology Data Exchange (ETDEWEB)

    Miller, M.I.; O' Sullivan, J.A. (Electronic Systems and Research Lab., of Electrical Engineering, Washington Univ., St. Louis, MO (US)); Boysam, B. (Dept. of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Inst., Troy, NY (US)); Smith, K.R. (Dept. of Electrical Engineering, Southern Illinois Univ., Edwardsville, IL (US))

    1991-01-01

    This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first established the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum entropy probabilistic view leads to Gibb's representations with potentials which have their number of minima growing at precisely the exponential rate that the language of deterministically constrained sequences grow. These representations are coupled to stochastic diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs' probability law. The coupling to stochastic search methods yields the all-important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determines the necessary connection structures of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively-parallel processor consisting of 1024 mesh-connected bit-serial processing elements for performing automated segmentation of electron-micrograph images.

  13. Parallel Computer System for 3D Visualization Stereo on GPU

    Science.gov (United States)

    Al-Oraiqat, Anas M.; Zori, Sergii A.

    2018-03-01

    This paper proposes the organization of a parallel computer system based on Graphic Processors Unit (GPU) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing rays intersections with scene objects. The system allows significant increase in the productivity for the 3D stereo synthesis of photorealistic quality. The generalized procedure of 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions by GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of choosing the size and configuration of the computational Compute Unified Device Archi-tecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as optimized configuration of the computing CUDA network.

  14. Parallel computing and molecular dynamics of biological membranes

    International Nuclear Information System (INIS)

    La Penna, G.; Letardi, S.; Minicozzi, V.; Morante, S.; Rossi, G.C.; Salina, G.

    1998-01-01

    In this talk I discuss the general question of the portability of molecular dynamics codes for diffusive systems on parallel computers of the APE family. The intrinsic single precision of the today available platforms does not seem to affect the numerical accuracy of the simulations, while the absence of integer addressing from CPU to individual nodes puts strong constraints on possible programming strategies. Liquids can be satisfactorily simulated using the ''systolic'' method. For more complex systems, like the biological ones at which we are ultimately interested in, the ''domain decomposition'' approach is best suited to beat the quadratic growth of the inter-molecular computational time with the number of atoms of the system. The promising perspectives of using this strategy for extensive simulations of lipid bilayers are briefly reviewed. (orig.)

  15. Parallel computation for distributed parameter system-from vector processors to Adena computer

    Energy Technology Data Exchange (ETDEWEB)

    Nogi, T

    1983-04-01

    Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.

  16. Concurrent computation of attribute filters on shared memory parallel machines

    NARCIS (Netherlands)

    Wilkinson, Michael H.F.; Gao, Hui; Hesselink, Wim H.; Jonker, Jan-Eppo; Meijster, Arnold

    2008-01-01

    Morphological attribute filters have not previously been parallelized mainly because they are both global and nonseparable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings, and thickenings,

  17. Parallel Computation of Unsteady Flows on a Network of Workstations

    Science.gov (United States)

    1997-01-01

    Parallel computation of unsteady flows requires significant computational resources. The utilization of a network of workstations seems an efficient solution to the problem where large problems can be treated at a reasonable cost. This approach requires the solution of several problems: 1) the partitioning and distribution of the problem over a network of workstation, 2) efficient communication tools, 3) managing the system efficiently for a given problem. Of course, there is the question of the efficiency of any given numerical algorithm to such a computing system. NPARC code was chosen as a sample for the application. For the explicit version of the NPARC code both two- and three-dimensional problems were studied. Again both steady and unsteady problems were investigated. The issues studied as a part of the research program were: 1) how to distribute the data between the workstations, 2) how to compute and how to communicate at each node efficiently, 3) how to balance the load distribution. In the following, a summary of these activities is presented. Details of the work have been presented and published as referenced.

  18. Simple, parallel, high-performance virtual machines for extreme computations

    International Nuclear Information System (INIS)

    Chokoufe Nejad, Bijan; Ohl, Thorsten; Reuter, Jurgen

    2014-11-01

    We introduce a high-performance virtual machine (VM) written in a numerically fast language like Fortran or C to evaluate very large expressions. We discuss the general concept of how to perform computations in terms of a VM and present specifically a VM that is able to compute tree-level cross sections for any number of external legs, given the corresponding byte code from the optimal matrix element generator, O'Mega. Furthermore, this approach allows to formulate the parallel computation of a single phase space point in a simple and obvious way. We analyze hereby the scaling behaviour with multiple threads as well as the benefits and drawbacks that are introduced with this method. Our implementation of a VM can run faster than the corresponding native, compiled code for certain processes and compilers, especially for very high multiplicities, and has in general runtimes in the same order of magnitude. By avoiding the tedious compile and link steps, which may fail for source code files of gigabyte sizes, new processes or complex higher order corrections that are currently out of reach could be evaluated with a VM given enough computing power.

  19. Genetic algorithm with small population size for search feasible control parameters for parallel hybrid electric vehicles

    Directory of Open Access Journals (Sweden)

    Yu-Huei Cheng

    2017-11-01

    Full Text Available The control strategy is a major unit in hybrid electric vehicles (HEVs. In order to provide suitable control parameters for reducing fuel consumptions and engine emissions while maintaining vehicle performance requirements, the genetic algorithm (GA with small population size is applied to search for feasible control parameters in parallel HEVs. The electric assist control strategy (EACS is used as the fundamental control strategy of parallel HEVs. The dynamic performance requirements stipulated in the Partnership for a New Generation of Vehicles (PNGV is considered to maintain the vehicle performance. The known ADvanced VehIcle SimulatOR (ADVISOR is used to simulate a specific parallel HEV with urban dynamometer driving schedule (UDDS. Five population sets with size 5, 10, 15, 20, and 25 are used in the GA. The experimental results show that the GA with population size of 25 is the best for selecting feasible control parameters in parallel HEVs.

  20. Cloud identification using genetic algorithms and massively parallel computation

    Science.gov (United States)

    Buckles, Bill P.; Petry, Frederick E.

    1996-01-01

    As a Guest Computational Investigator under the NASA administered component of the High Performance Computing and Communication Program, we implemented a massively parallel genetic algorithm on the MasPar SIMD computer. Experiments were conducted using Earth Science data in the domains of meteorology and oceanography. Results obtained in these domains are competitive with, and in most cases better than, similar problems solved using other methods. In the meteorological domain, we chose to identify clouds using AVHRR spectral data. Four cloud speciations were used although most researchers settle for three. Results were remarkedly consistent across all tests (91% accuracy). Refinements of this method may lead to more timely and complete information for Global Circulation Models (GCMS) that are prevalent in weather forecasting and global environment studies. In the oceanographic domain, we chose to identify ocean currents from a spectrometer having similar characteristics to AVHRR. Here the results were mixed (60% to 80% accuracy). Given that one is willing to run the experiment several times (say 10), then it is acceptable to claim the higher accuracy rating. This problem has never been successfully automated. Therefore, these results are encouraging even though less impressive than the cloud experiment. Successful conclusion of an automated ocean current detection system would impact coastal fishing, naval tactics, and the study of micro-climates. Finally we contributed to the basic knowledge of GA (genetic algorithm) behavior in parallel environments. We developed better knowledge of the use of subpopulations in the context of shared breeding pools and the migration of individuals. Rigorous experiments were conducted based on quantifiable performance criteria. While much of the work confirmed current wisdom, for the first time we were able to submit conclusive evidence. The software developed under this grant was placed in the public domain. An extensive user

  1. Parallel processing using an optical delay-based reservoir computer

    Science.gov (United States)

    Van der Sande, Guy; Nguimdo, Romain Modeste; Verschaffelt, Guy

    2016-04-01

    Delay systems subject to delayed optical feedback have recently shown great potential in solving computationally hard tasks. By implementing a neuro-inspired computational scheme relying on the transient response to optical data injection, high processing speeds have been demonstrated. However, reservoir computing systems based on delay dynamics discussed in the literature are designed by coupling many different stand-alone components which lead to bulky, lack of long-term stability, non-monolithic systems. Here we numerically investigate the possibility of implementing reservoir computing schemes based on semiconductor ring lasers. Semiconductor ring lasers are semiconductor lasers where the laser cavity consists of a ring-shaped waveguide. SRLs are highly integrable and scalable, making them ideal candidates for key components in photonic integrated circuits. SRLs can generate light in two counterpropagating directions between which bistability has been demonstrated. We demonstrate that two independent machine learning tasks , even with different nature of inputs with different input data signals can be simultaneously computed using a single photonic nonlinear node relying on the parallelism offered by photonics. We illustrate the performance on simultaneous chaotic time series prediction and a classification of the Nonlinear Channel Equalization. We take advantage of different directional modes to process individual tasks. Each directional mode processes one individual task to mitigate possible crosstalk between the tasks. Our results indicate that prediction/classification with errors comparable to the state-of-the-art performance can be obtained even with noise despite the two tasks being computed simultaneously. We also find that a good performance is obtained for both tasks for a broad range of the parameters. The results are discussed in detail in [Nguimdo et al., IEEE Trans. Neural Netw. Learn. Syst. 26, pp. 3301-3307, 2015

  2. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    Science.gov (United States)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  3. Flatness based feedforward control of a parallel hybrid drivetrain; Flachheitsbasierter Vorsteuerungsentwurf fuer den Antriebsstrang eines Parallelhybriden

    Energy Technology Data Exchange (ETDEWEB)

    Gasper, Rainer; Hesseler, Frank; Abel, Dirk [RWTH Aachen Univ. (Germany). Inst. fuer Regelungstechnik

    2010-10-15

    The advantages of Hybrid Electrical Vehicles (HEV) are fuel consumption reduction and minimization of exhaust emissions. Moreover, the drivability of a HEV is very important for the consumer acceptance. The gear shifts and the start of the internal combustion engine are very important for the drivability of a HEV. Because this two tasks are automated, oscillations in the vehicle would be uncomfortable for the driver. In the paper at hand, feedforward controllers for the drivetrain control of a parallel hybrid with an automated manual transmission and a dry clutch are presented. (orig.)

  4. Hierarchical Control of Parallel AC-DC Converter Interfaces for Hybrid Microgrids

    DEFF Research Database (Denmark)

    Lu, Xiaonan; Guerrero, Josep M.; Sun, Kai

    2014-01-01

    In this paper, a hierarchical control system for parallel power electronics interfaces between ac bus and dc bus in a hybrid microgrid is presented. Both standalone and grid-connected operation modes in the dc side of the microgrid are analyzed. Concretely, a three-level hierarchical control system...... equal or proportional dc load current sharing. The common secondary control level is designed to eliminate the dc bus voltage deviation produced by the droop control, with dc bus voltage in the hybrid microgrid boosted to an acceptable range. After guaranteeing the performance of the dc side standalone...

  5. Pacing a data transfer operation between compute nodes on a parallel computer

    Science.gov (United States)

    Blocksome, Michael A [Rochester, MN

    2011-09-13

    Methods, systems, and products are disclosed for pacing a data transfer between compute nodes on a parallel computer that include: transferring, by an origin compute node, a chunk of an application message to a target compute node; sending, by the origin compute node, a pacing request to a target direct memory access (`DMA`) engine on the target compute node using a remote get DMA operation; determining, by the origin compute node, whether a pacing response to the pacing request has been received from the target DMA engine; and transferring, by the origin compute node, a next chunk of the application message if the pacing response to the pacing request has been received from the target DMA engine.

  6. Nordic Summer School on Parallel Computing in Optimization

    CERN Document Server

    Pardalos, Panos; Storøy, Sverre

    1997-01-01

    During the last three decades, breakthroughs in computer technology have made a tremendous impact on optimization. In particular, parallel computing has made it possible to solve larger and computationally more difficult prob­ lems. This volume contains mainly lecture notes from a Nordic Summer School held at the Linkoping Institute of Technology, Sweden in August 1995. In order to make the book more complete, a few authors were invited to contribute chapters that were not part of the course on this first occasion. The purpose of this Nordic course in advanced studies was three-fold. One goal was to introduce the students to the new achievements in a new and very active field, bring them close to world leading researchers, and strengthen their competence in an area with internationally explosive rate of growth. A second goal was to strengthen the bonds between students from different Nordic countries, and to encourage collaboration and joint research ventures over the borders. In this respect, the course bui...

  7. Development of three-dimensional neoclassical transport simulation code with high performance Fortran on a vector-parallel computer

    International Nuclear Information System (INIS)

    Satake, Shinsuke; Okamoto, Masao; Nakajima, Noriyoshi; Takamaru, Hisanori

    2005-11-01

    A neoclassical transport simulation code (FORTEC-3D) applicable to three-dimensional configurations has been developed using High Performance Fortran (HPF). Adoption of computing techniques for parallelization and a hybrid simulation model to the δf Monte-Carlo method transport simulation, including non-local transport effects in three-dimensional configurations, makes it possible to simulate the dynamism of global, non-local transport phenomena with a self-consistent radial electric field within a reasonable computation time. In this paper, development of the transport code using HPF is reported. Optimization techniques in order to achieve both high vectorization and parallelization efficiency, adoption of a parallel random number generator, and also benchmark results, are shown. (author)

  8. Efficient relaxed-Jacobi smoothers for multigrid on parallel computers

    Science.gov (United States)

    Yang, Xiang; Mittal, Rajat

    2017-03-01

    In this Technical Note, we present a family of Jacobi-based multigrid smoothers suitable for the solution of discretized elliptic equations. These smoothers are based on the idea of scheduled-relaxation Jacobi proposed recently by Yang & Mittal (2014) [18] and employ two or three successive relaxed Jacobi iterations with relaxation factors derived so as to maximize the smoothing property of these iterations. The performance of these new smoothers measured in terms of convergence acceleration and computational workload, is assessed for multi-domain implementations typical of parallelized solvers, and compared to the lexicographic point Gauss-Seidel smoother. The tests include the geometric multigrid method on structured grids as well as the algebraic grid method on unstructured grids. The tests demonstrate that unlike Gauss-Seidel, the convergence of these Jacobi-based smoothers is unaffected by domain decomposition, and furthermore, they outperform the lexicographic Gauss-Seidel by factors that increase with domain partition count.

  9. Optimized collectives using a DMA on a parallel computer

    Science.gov (United States)

    Chen, Dong [Croton On Hudson, NY; Gabor, Dozsa [Ardsley, NY; Giampapa, Mark E [Irvington, NY; Heidelberger,; Phillip, [Cortlandt Manor, NY

    2011-02-08

    Optimizing collective operations using direct memory access controller on a parallel computer, in one aspect, may comprise establishing a byte counter associated with a direct memory access controller for each submessage in a message. The byte counter includes at least a base address of memory and a byte count associated with a submessage. A byte counter associated with a submessage is monitored to determine whether at least a block of data of the submessage has been received. The block of data has a predetermined size, for example, a number of bytes. The block is processed when the block has been fully received, for example, when the byte count indicates all bytes of the block have been received. The monitoring and processing may continue for all blocks in all submessages in the message.

  10. Literature Review on the Hybrid Flow Shop Scheduling Problem with Unrelated Parallel Machines

    Directory of Open Access Journals (Sweden)

    Eliana Marcela Peña Tibaduiza

    2017-01-01

    Full Text Available Context: The flow shop hybrid problem with unrelated parallel machines has been less studied in the academia compared to the flow shop hybrid with identical processors. For this reason, there are few reports about the kind of application of this problem in industries. Method: A literature review of the state of the art on flow-shop scheduling problem was conducted by collecting and analyzing academic papers on several scientific databases. For this aim, a search query was constructed using keywords defining the problem and checking the inclusion of unrelated parallel machines in such definition; as a result, 50 papers were finally selected for this study. Results: A classification of the problem according to the characteristics of the production system was performed, also solution methods, constraints and objective functions commonly used are presented. Conclusions: An increasing trend is observed in studies of flow shop with multiple stages, but few are based on industry case-studies.

  11. Accelerating Astronomy & Astrophysics in the New Era of Parallel Computing: GPUs, Phi and Cloud Computing

    Science.gov (United States)

    Ford, Eric B.; Dindar, Saleh; Peters, Jorg

    2015-08-01

    The realism of astrophysical simulations and statistical analyses of astronomical data are set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures, requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network network characteristics.I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield more than order of magnitude speed-ups and which problems are unlikely to parallelize sufficiently efficiently to be worth the development time and/or costs.I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer

  12. Solution of finite element problems using hybrid parallelization with MPI and OpenMP Solution of finite element problems using hybrid parallelization with MPI and OpenMP

    Directory of Open Access Journals (Sweden)

    José Miguel Vargas-Félix

    2012-11-01

    Full Text Available The Finite Element Method (FEM is used to solve problems like solid deformation and heat diffusion in domains with complex geometries. This kind of geometries requires discretization with millions of elements; this is equivalent to solve systems of equations with sparse matrices and tens or hundreds of millions of variables. The aim is to use computer clusters to solve these systems. The solution method used is Schur substructuration. Using it is possible to divide a large system of equations into many small ones to solve them more efficiently. This method allows parallelization. MPI (Message Passing Interface is used to distribute the systems of equations to solve each one in a computer of a cluster. Each system of equations is solved using a solver implemented to use OpenMP as a local parallelization method.The Finite Element Method (FEM is used to solve problems like solid deformation and heat diffusion in domains with complex geometries. This kind of geometries requires discretization with millions of elements; this is equivalent to solve systems of equations with sparse matrices and tens or hundreds of millions of variables. The aim is to use computer clusters to solve these systems. The solution method used is Schur substructuration. Using it is possible to divide a large system of equations into many small ones to solve them more efficiently. This method allows parallelization. MPI (Message Passing Interface is used to distribute the systems of equations to solve each one in a computer of a cluster. Each system of equations is solved using a solver implemented to use OpenMP as a local parallelization method.

  13. Experiences Using Hybrid MPI/OpenMP in the Real World: Parallelization of a 3D CFD Solver for Multi-Core Node Clusters

    Directory of Open Access Journals (Sweden)

    Gabriele Jost

    2010-01-01

    Full Text Available Today most systems in high-performance computing (HPC feature a hierarchical hardware design: shared-memory nodes with several multi-core CPUs are connected via a network infrastructure. When parallelizing an application for these architectures it seems natural to employ a hierarchical programming model such as combining MPI and OpenMP. Nevertheless, there is the general lore that pure MPI outperforms the hybrid MPI/OpenMP approach. In this paper, we describe the hybrid MPI/OpenMP parallelization of IR3D (Incompressible Realistic 3-D code, a full-scale real-world application, which simulates the environmental effects on the evolution of vortices trailing behind control surfaces of underwater vehicles. We discuss performance, scalability and limitations of the pure MPI version of the code on a variety of hardware platforms and show how the hybrid approach can help to overcome certain limitations.

  14. Torque Split Strategy for Parallel Hybrid Electric Vehicles with an Integrated Starter Generator

    OpenAIRE

    Fu, Zhumu; Gao, Aiyun; Wang, Xiaohong; Song, Xiaona

    2014-01-01

    This paper presents a torque split strategy for parallel hybrid electric vehicles with an integrated starter generator (ISG-PHEV) by using fuzzy logic control. By combining the efficiency map and the optimum torque curve of the internal combustion engine (ICE) with the state of charge (SOC) of the batteries, the torque split strategy is designed, which manages the ICE within its peak efficiency region. Taking the quantified ICE torque, the quantified SOC of the batteries, and the quantified I...

  15. HTTR plant dynamic simulation using a hybrid computer

    International Nuclear Information System (INIS)

    Shimazaki, Junya; Suzuki, Katsuo; Nabeshima, Kunihiko; Watanabe, Koichi; Shinohara, Yoshikuni; Nakagawa, Shigeaki.

    1990-01-01

    A plant dynamic simulation of High-Temperature Engineering Test Reactor has been made using a new-type hybrid computer. This report describes a dynamic simulation model of HTTR, a hybrid simulation method for SIMSTAR and some results obtained from dynamics analysis of HTTR simulation. It concludes that the hybrid plant simulation is useful for on-line simulation on account of its capability of computation at high speed, compared with that of all digital computer simulation. With sufficient accuracy, 40 times faster computation than real time was reached only by changing an analog time scale for HTTR simulation. (author)

  16. An Accurate liver segmentation method using parallel computing algorithm

    International Nuclear Information System (INIS)

    Elbasher, Eiman Mohammed Khalied

    2014-12-01

    Computed Tomography (CT or CAT scan) is a noninvasive diagnostic imaging procedure that uses a combination of X-rays and computer technology to produce horizontal, or axial, images (often called slices) of the body. A CT scan shows detailed images of any part of the body, including the bones muscles, fat and organs CT scans are more detailed than standard x-rays. CT scans may be done with or without "contrast Contrast refers to a substance taken by mouth and/ or injected into an intravenous (IV) line that causes the particular organ or tissue under study to be seen more clearly. CT scan of the liver and biliary tract are used in the diagnosis of many diseases in the abdomen structures, particularly when another type of examination, such as X-rays, physical examination, and ultra sound is not conclusive. Unfortunately, the presence of noise and artifact in the edges and fine details in the CT images limit the contrast resolution and make diagnostic procedure more difficult. This experimental study was conducted at the College of Medical Radiological Science, Sudan University of Science and Technology and Fidel Specialist Hospital. The sample of study was included 50 patients. The main objective of this research was to study an accurate liver segmentation method using a parallel computing algorithm, and to segment liver and adjacent organs using image processing technique. The main technique of segmentation used in this study was watershed transform. The scope of image processing and analysis applied to medical application is to improve the quality of the acquired image and extract quantitative information from medical image data in an efficient and accurate way. The results of this technique agreed wit the results of Jarritt et al, (2010), Kratchwil et al, (2010), Jover et al, (2011), Yomamoto et al, (1996), Cai et al (1999), Saudha and Jayashree (2010) who used different segmentation filtering based on the methods of enhancing the computed tomography images. Anther

  17. Modelling of data uncertainties on hybrid computers

    Energy Technology Data Exchange (ETDEWEB)

    Schneider, Anke (ed.)

    2016-06-15

    The codes d{sup 3}f and r{sup 3}t are well established for modelling density-driven flow and nuclide transport in the far field of repositories for hazardous material in deep geological formations. They are applicable in porous media as well as in fractured rock or mudstone, for modelling salt- and heat transport as well as a free groundwater surface. Development of the basic framework of d{sup 3}f and r{sup 3}t had begun more than 20 years ago. Since that time significant advancements took place in the requirements for safety assessment as well as for computer hardware development. The period of safety assessment for a repository of high-level radioactive waste was extended to 1 million years, and the complexity of the models is steadily growing. Concurrently, the demands on accuracy increase. Additionally, model and parameter uncertainties become more and more important for an increased understanding of prediction reliability. All this leads to a growing demand for computational power that requires a considerable software speed-up. An effective way to achieve this is the use of modern, hybrid computer architectures which requires basically the set-up of new data structures and a corresponding code revision but offers a potential speed-up by several orders of magnitude. The original codes d{sup 3}f and r{sup 3}t were applications of the software platform UG /BAS 94/ whose development had begun in the early nineteennineties. However, UG had recently been advanced to the C++ based, substantially revised version UG4 /VOG 13/. To benefit also in the future from state-of-the-art numerical algorithms and to use hybrid computer architectures, the codes d{sup 3}f and r{sup 3}t were transferred to this new code platform. Making use of the fact that coupling between different sets of equations is natively supported in UG4, d{sup 3}f and r{sup 3}t were combined to one conjoint code d{sup 3}f++. A direct estimation of uncertainties for complex groundwater flow models with the

  18. Parallel Solver for H(div) Problems Using Hybridization and AMG

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Chak S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2016-01-15

    In this paper, a scalable parallel solver is proposed for H(div) problems discretized by arbitrary order finite elements on general unstructured meshes. The solver is based on hybridization and algebraic multigrid (AMG). Unlike some previously studied H(div) solvers, the hybridization solver does not require discrete curl and gradient operators as additional input from the user. Instead, only some element information is needed in the construction of the solver. The hybridization results in a H1-equivalent symmetric positive definite system, which is then rescaled and solved by AMG solvers designed for H1 problems. Weak and strong scaling of the method are examined through several numerical tests. Our numerical results show that the proposed solver provides a promising alternative to ADS, a state-of-the-art solver [12], for H(div) problems. In fact, it outperforms ADS for higher order elements.

  19. Mechatronic Design of a New Humanoid Robot with Hybrid Parallel Actuation

    Directory of Open Access Journals (Sweden)

    Vítor Santos

    2012-10-01

    Full Text Available Humanoid robotics is unquestionably a challenging and long-term field of research. Of the numerous and most urgent challenges to tackle, autonomous and efficient locomotion may possibly be the most underdeveloped at present in the research community. Therefore, to pursue studies in relation to autonomy with efficient locomotion, the authors have been developing a new teen-sized humanoid platform with hybrid characteristics. The hybrid nature is clear in the mixed actuation based on common electrical motors and passive actuators attached in parallel to the motors. This paper presents the mechatronic design of the humanoid platform, focusing mainly on the mechanical structure, the design and simulation of the hybrid joints, and the different subsystems implemented. Trying to keep the appropriate human proportions and main degrees of freedom, the developed platform utilizes a distributed control architecture and a rich set of sensing capabilities, both ripe for future development and research.

  20. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    Science.gov (United States)

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.

  1. Optimal energy management for a series-parallel hybrid electric bus

    International Nuclear Information System (INIS)

    Xiong Weiwei; Zhang Yong; Yin Chengliang

    2009-01-01

    This paper aims to present a new type of series-parallel hybrid electric bus and its energy management strategy. This hybrid bus is a post-transmission coupled system employing a novel transmission as the series-parallel configuration switcher. In this paper, the vehicle architecture, transmission scheme and numerical models are presented. The energy management system governs the mode switching between the series mode and the parallel mode as well as the instantaneous power distribution. In this work, two separated controllers using fuzzy logic called Mode Decision and Parallel-driving Energy Management are employed to fulfill these two tasks. The energy management strategy and the applications of fuzzy logic are described. The strategy is validated by a forward-facing simulation program based on the software Matlab/Simulink. The results show that the energy management strategy is effective to control the engine operating in a high-efficiency region as well as to sustain the battery charge state while satisfy the drive ability. The energy consumption is theoretically reduced by 30.3% to that of the conventional bus under transit bus driving cycle. In addition, works need future study are also presented.

  2. Iterative algorithms for large sparse linear systems on parallel computers

    Science.gov (United States)

    Adams, L. M.

    1982-01-01

    Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.

  3. III - Template Metaprogramming for massively parallel scientific computing - Templates for Iteration; Thread-level Parallelism

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    Large scale scientific computing raises questions on different levels ranging from the fomulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities in these different topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tend to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, object-oriented approach is nice to read, but may come with an inherent performance penalty. These lectures aim to present he basics of the Expression Template (ET) idiom which allows us to keep the object-oriented approach without sacrificing performance. We will in particular show to to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show to to apply these methods i...

  4. Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces

    International Nuclear Information System (INIS)

    Brown, W. Michael; Wang, Peng; Plimpton, Steven J.; Tharrington, Arnold N.

    2011-01-01

    The use of accelerators such as general-purpose graphics processing units (GPGPUs) have become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - (1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, (2) minimizing the amount of code that must be ported for efficient acceleration, (3) utilizing the available processing power from both many-core CPUs and accelerators, and (4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.

  5. A hybrid computer simulation of reactor spatial dynamics

    International Nuclear Information System (INIS)

    Hinds, H.W.

    1977-08-01

    The partial differential equations describing the one-speed spatial dynamics of thermal neutron reactors were converted to a set of ordinary differential equations, using finite-difference approximations for the spatial derivatives. The variables were then normalized to a steady-state reference condition in a novel manner, to yield an equation set particularly suitable for implementation on a hybrid computer. One Applied Dynamics AD/FIVE analog-computer console is capable of solving, all in parallel, up to 30 simultaneous differential equations. This corresponds roughly to eight reactor nodes, each with two active delayed-neutron groups. To improve accuracy, an increase in the number of nodes is usually required. Using the Hsu-Howe multiplexing technique, an 8-node, one-dimensional module was switched back and forth between the left and right halves of the reactor, to simulate a 16-node model, also in one dimension. These two versions (8 or 16 nodes) of the model were tested on benchmark problems of the loss-of-coolant type, which were also solved using the digital code FORSIM, with two energy groups and 26 nodes. Good agreement was obtained between the two solution techniques. (author)

  6. Voith hybrid systems - parallel hybrid for rail vehicles; Voith Hybridsysteme - Parallelhybrid fuer Schienenfahrzeuge

    Energy Technology Data Exchange (ETDEWEB)

    Groezinger, Thomas; Berger, Juergen; Discher, Andreas; Bartosch, Stephan [Voith Turbo GmbH und Co. KG (Germany)

    2010-03-15

    The article presents a variety of ways help to save fuel, reduce noise and minimize harmful emissions for rail vehicles. These ECO components can be used separately or in combination with drive systems for various types of hybrid concepts. For example, via a hydrostatic or electric hybrid system can recuperate and store braking energy and utilize it for powering the vehicle or driving auxiliary systems. Another system converts lost heat from the drive motor into mechanical or electrical energy. With EcoConsult, Voith Turbo also offers a ''toolbox'' comprising software, hardware and consultancy which allows identifying the exact operating conditions and a reliable calculation of the life cycle cost (LCC) for a variety of vehicle categories and operating profiles. (orig.)

  7. Parallel Computational Fluid Dynamics 2007 : Implementations and Experiences on Large Scale and Grid Computing

    CERN Document Server

    2009-01-01

    At the 19th Annual Conference on Parallel Computational Fluid Dynamics held in Antalya, Turkey, in May 2007, the most recent developments and implementations of large-scale and grid computing were presented. This book, comprised of the invited and selected papers of this conference, details those advances, which are of particular interest to CFD and CFD-related communities. It also offers the results related to applications of various scientific and engineering problems involving flows and flow-related topics. Intended for CFD researchers and graduate students, this book is a state-of-the-art presentation of the relevant methodology and implementation techniques of large-scale computing.

  8. Contribution to the algorithmic and efficient programming of new parallel architectures including accelerators for neutron physics and shielding computations

    International Nuclear Information System (INIS)

    Dubois, J.

    2011-01-01

    In science, simulation is a key process for research or validation. Modern computer technology allows faster numerical experiments, which are cheaper than real models. In the field of neutron simulation, the calculation of eigenvalues is one of the key challenges. The complexity of these problems is such that a lot of computing power may be necessary. The work of this thesis is first the evaluation of new computing hardware such as graphics card or massively multi-core chips, and their application to eigenvalue problems for neutron simulation. Then, in order to address the massive parallelism of supercomputers national, we also study the use of asynchronous hybrid methods for solving eigenvalue problems with this very high level of parallelism. Then we experiment the work of this research on several national supercomputers such as the Titane hybrid machine of the Computing Center, Research and Technology (CCRT), the Curie machine of the Very Large Computing Centre (TGCC), currently being installed, and the Hopper machine at the Lawrence Berkeley National Laboratory (LBNL). We also do our experiments on local workstations to illustrate the interest of this research in an everyday use with local computing resources. (author) [fr

  9. A class of parallel algorithms for computation of the manipulator inertia matrix

    Science.gov (United States)

    Fijany, Amir; Bejczy, Antal K.

    1989-01-01

    Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.

  10. Parallel In Situ Indexing for Data-intensive Computing

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jinoh; Abbasi, Hasan; Chacon, Luis; Docan, Ciprian; Klasky, Scott; Liu, Qing; Podhorszki, Norbert; Shoshani, Arie; Wu, Kesheng

    2011-09-09

    As computing power increases exponentially, vast amount of data is created by many scientific re- search activities. However, the bandwidth for storing the data to disks and reading the data from disks has been improving at a much slower pace. These two trends produce an ever-widening data access gap. Our work brings together two distinct technologies to address this data access issue: indexing and in situ processing. From decades of database research literature, we know that indexing is an effective way to address the data access issue, particularly for accessing relatively small fraction of data records. As data sets increase in sizes, more and more analysts need to use selective data access, which makes indexing an even more important for improving data access. The challenge is that most implementations of in- dexing technology are embedded in large database management systems (DBMS), but most scientific datasets are not managed by any DBMS. In this work, we choose to include indexes with the scientific data instead of requiring the data to be loaded into a DBMS. We use compressed bitmap indexes from the FastBit software which are known to be highly effective for query-intensive workloads common to scientific data analysis. To use the indexes, we need to build them first. The index building procedure needs to access the whole data set and may also require a significant amount of compute time. In this work, we adapt the in situ processing technology to generate the indexes, thus removing the need of read- ing data from disks and to build indexes in parallel. The in situ data processing system used is ADIOS, a middleware for high-performance I/O. Our experimental results show that the indexes can improve the data access time up to 200 times depending on the fraction of data selected, and using in situ data processing system can effectively reduce the time needed to create the indexes, up to 10 times with our in situ technique when using identical parallel settings.

  11. Power-balancing instantaneous optimization energy management for a novel series-parallel hybrid electric bus

    Science.gov (United States)

    Sun, Dongye; Lin, Xinyou; Qin, Datong; Deng, Tao

    2012-11-01

    Energy management(EM) is a core technique of hybrid electric bus(HEB) in order to advance fuel economy performance optimization and is unique for the corresponding configuration. There are existing algorithms of control strategy seldom take battery power management into account with international combustion engine power management. In this paper, a type of power-balancing instantaneous optimization(PBIO) energy management control strategy is proposed for a novel series-parallel hybrid electric bus. According to the characteristic of the novel series-parallel architecture, the switching boundary condition between series and parallel mode as well as the control rules of the power-balancing strategy are developed. The equivalent fuel model of battery is implemented and combined with the fuel of engine to constitute the objective function which is to minimize the fuel consumption at each sampled time and to coordinate the power distribution in real-time between the engine and battery. To validate the proposed strategy effective and reasonable, a forward model is built based on Matlab/Simulink for the simulation and the dSPACE autobox is applied to act as a controller for hardware in-the-loop integrated with bench test. Both the results of simulation and hardware-in-the-loop demonstrate that the proposed strategy not only enable to sustain the battery SOC within its operational range and keep the engine operation point locating the peak efficiency region, but also the fuel economy of series-parallel hybrid electric bus(SPHEB) dramatically advanced up to 30.73% via comparing with the prototype bus and a similar improvement for PBIO strategy relative to rule-based strategy, the reduction of fuel consumption is up to 12.38%. The proposed research ensures the algorithm of PBIO is real-time applicability, improves the efficiency of SPHEB system, as well as suite to complicated configuration perfectly.

  12. Performance Characteristics of Hybrid MPI/OpenMP Implementations of NAS Parallel Benchmarks SP and BT on Large-Scale Multicore Clusters

    KAUST Repository

    Wu, X.; Taylor, V.

    2011-01-01

    The NAS Parallel Benchmarks (NPB) are well-known applications with fixed algorithms for evaluating parallel systems and tools. Multicore clusters provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node, and MPI can be used with the communication between nodes. In this paper, we use Scalar Pentadiagonal (SP) and Block Tridiagonal (BT) benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore clusters, Intrepid (BlueGene/P) at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76 %, and the hybrid BT outperforms the MPI BT by up to 8.58 % on up to 10 000 cores on Intrepid and Jaguar. We also use performance tools and MPI trace libraries available on these clusters to further investigate the performance characteristics of the hybrid SP and BT. © 2011 The Author. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.

  13. Performance Characteristics of Hybrid MPI/OpenMP Implementations of NAS Parallel Benchmarks SP and BT on Large-Scale Multicore Clusters

    KAUST Repository

    Wu, X.

    2011-07-18

    The NAS Parallel Benchmarks (NPB) are well-known applications with fixed algorithms for evaluating parallel systems and tools. Multicore clusters provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node, and MPI can be used with the communication between nodes. In this paper, we use Scalar Pentadiagonal (SP) and Block Tridiagonal (BT) benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore clusters, Intrepid (BlueGene/P) at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76 %, and the hybrid BT outperforms the MPI BT by up to 8.58 % on up to 10 000 cores on Intrepid and Jaguar. We also use performance tools and MPI trace libraries available on these clusters to further investigate the performance characteristics of the hybrid SP and BT. © 2011 The Author. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.

  14. Identifying logical planes formed of compute nodes of a subcommunicator in a parallel computer

    Science.gov (United States)

    Davis, Kristan D.; Faraj, Daniel A.

    2016-03-01

    In a parallel computer, a plurality of logical planes formed of compute nodes of a subcommunicator may be identified by: for each compute node of the subcommunicator and for a number of dimensions beginning with a first dimension: establishing, by a plane building node, in a positive direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in a positive direction of a second dimension, where the second dimension is orthogonal to the first dimension; and establishing, by the plane building node, in a negative direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in the positive direction of the second dimension.

  15. I - Template Metaprogramming for Massively Parallel Scientific Computing - Expression Templates

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    Large scale scientific computing raises questions on different levels ranging from the fomulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities in these different topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tend to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, object-oriented approach is nice to read, but may come with an inherent performance penalty. These lectures aim to present he basics of the Expression Template (ET) idiom which allows us to keep the object-oriented approach without sacrificing performance. We will in particular show to to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show to to apply these methods i...

  16. Parallel multiphysics algorithms and software for computational nuclear engineering

    International Nuclear Information System (INIS)

    Gaston, D; Hansen, G; Kadioglu, S; Knoll, D A; Newman, C; Park, H; Permann, C; Taitano, W

    2009-01-01

    There is a growing trend in nuclear reactor simulation to consider multiphysics problems. This can be seen in reactor analysis where analysts are interested in coupled flow, heat transfer and neutronics, and in fuel performance simulation where analysts are interested in thermomechanics with contact coupled to species transport and chemistry. These more ambitious simulations usually motivate some level of parallel computing. Many of the coupling efforts to date utilize simple code coupling or first-order operator splitting, often referred to as loose coupling. While these approaches can produce answers, they usually leave questions of accuracy and stability unanswered. Additionally, the different physics often reside on separate grids which are coupled via simple interpolation, again leaving open questions of stability and accuracy. Utilizing state of the art mathematics and software development techniques we are deploying next generation tools for nuclear engineering applications. The Jacobian-free Newton-Krylov (JFNK) method combined with physics-based preconditioning provide the underlying mathematical structure for our tools. JFNK is understood to be a modern multiphysics algorithm, but we are also utilizing its unique properties as a scale bridging algorithm. To facilitate rapid development of multiphysics applications we have developed the Multiphysics Object-Oriented Simulation Environment (MOOSE). Examples from two MOOSE-based applications: PRONGHORN, our multiphysics gas cooled reactor simulation tool and BISON, our multiphysics, multiscale fuel performance simulation tool will be presented.

  17. Adaptive Dynamic Process Scheduling on Distributed Memory Parallel Computers

    Directory of Open Access Journals (Sweden)

    Wei Shu

    1994-01-01

    Full Text Available One of the challenges in programming distributed memory parallel machines is deciding how to allocate work to processors. This problem is particularly important for computations with unpredictable dynamic behaviors or irregular structures. We present a scheme for dynamic scheduling of medium-grained processes that is useful in this context. The adaptive contracting within neighborhood (ACWN is a dynamic, distributed, load-dependent, and scalable scheme. It deals with dynamic and unpredictable creation of processes and adapts to different systems. The scheme is described and contrasted with two other schemes that have been proposed in this context, namely the randomized allocation and the gradient model. The performance of the three schemes on an Intel iPSC/2 hypercube is presented and analyzed. The experimental results show that even though the ACWN algorithm incurs somewhat larger overhead than the randomized allocation, it achieves better performance in most cases due to its adaptiveness. Its feature of quickly spreading the work helps it outperform the gradient model in performance and scalability.

  18. Hybrid parallelization of the XTOR-2F code for the simulation of two-fluid MHD instabilities in tokamaks

    Science.gov (United States)

    Marx, Alain; Lütjens, Hinrich

    2017-03-01

    A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130] solving the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than the sequential one for low resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows to perform simulations with higher resolutions, previously forbidden because of memory limitations.

  19. Parallel computing of a climate model on the dawn 1000 by domain decomposition method

    Science.gov (United States)

    Bi, Xunqiang

    1997-12-01

    In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massive parallel computer made by National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. The potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputation are also discussed.

  20. Parallel computing for data science with examples in R, C++ and CUDA

    CERN Document Server

    Matloff, Norman

    2015-01-01

    Parallel Computing for Data Science: With Examples in R, C++ and CUDA is one of the first parallel computing books to concentrate exclusively on parallel data structures, algorithms, software tools, and applications in data science. It includes examples not only from the classic ""n observations, p variables"" matrix format but also from time series, network graph models, and numerous other structures common in data science. The examples illustrate the range of issues encountered in parallel programming.With the main focus on computation, the book shows how to compute on three types of platfor

  1. A Parallel Energy-Sharing Control Strategy for Fuel Cell Hybrid Vehicle

    Directory of Open Access Journals (Sweden)

    Nik Rumzi Nik Idris

    2011-08-01

    Full Text Available This paper presents a parallel energy-sharing control strategy for the application of fuel cell hybrid vehicles (FCHVs. The hybrid source discussed consists of a fuel cells (FCs generator and energy storage units (ESUs which composed by the battery and ultracapacitor (UC modules. A direct current (DC bus is used to interface between the energy sources and the electric vehicles (EV propulsion system (loads. Energy sources are connected to the DC bus using of power electronics converters. A total of six control loops are designed in the supervisory system in order to regulate the DC bus voltage, control of current flow and to monitor the state of charge (SOC of each energy storage device at the same time. Proportional plus integral (PI controllers are employed to regulate the output from each control loop referring to their reference signals. The proposed energy control system is simulated in MATLAB/Simulink environment. Results indicated that the proposed parallel energy-sharing control system is capable to provide a practical hybrid vehicle in respond to the vehicle traction response and avoids the FC and battery from overstressed at the same time.

  2. A Hybrid Parallel Execution Model for Logic Based Requirement Specifications (Invited Paper

    Directory of Open Access Journals (Sweden)

    Jeffrey J. P. Tsai

    1999-05-01

    Full Text Available It is well known that undiscovered errors in a requirements specification is extremely expensive to be fixed when discovered in the software maintenance phase. Errors in the requirement phase can be reduced through the validation and verification of the requirements specification. Many logic-based requirements specification languages have been developed to achieve these goals. However, the execution and reasoning of a logic-based requirements specification can be very slow. An effective way to improve their performance is to execute and reason the logic-based requirements specification in parallel. In this paper, we present a hybrid model to facilitate the parallel execution of a logic-based requirements specification language. A logic-based specification is first applied by a data dependency analysis technique which can find all the mode combinations that exist within a specification clause. This mode information is used to support a novel hybrid parallel execution model, which combines both top-down and bottom-up evaluation strategies. This new execution model can find the failure in the deepest node of the search tree at the early stage of the evaluation, thus this new execution model can reduce the total number of nodes searched in the tree, the total processes needed to be generated, and the total communication channels needed in the search process. A simulator has been implemented to analyze the execution behavior of the new model. Experiments show significant improvement based on several criteria.

  3. Design, Dynamics, and Workspace of a Hybrid-Driven-Based Cable Parallel Manipulator

    Directory of Open Access Journals (Sweden)

    Bin Zi

    2013-01-01

    Full Text Available The design, dynamics, and workspace of a hybrid-driven-based cable parallel manipulator (HDCPM are presented. The HDCPM is able to perform high efficiency, heavy load, and high-performance motion due to the advantages of both the cable parallel manipulator and the hybrid-driven planar five-bar mechanism. The design is performed according to theories of mechanism structure synthesis for cable parallel manipulators. The dynamic formulation of the HDCPM is established on the basis of Newton-Euler method. The workspace of the manipulator is analyzed additionally. As an example, a completely restrained HDCPM with 3 degrees of freedom is studied in simulation in order to verify the validity of the proposed design, workspace, and dynamic analysis. The simulation results, compared with the theoretical analysis, and the case study previously performed show that the manipulator design is reasonable and the mathematical models are correct, which provides the theoretical basis for future physical prototype and control system design.

  4. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  5. Introduction to massively-parallel computing in high-energy physics

    CERN Document Server

    AUTHOR|(CDS)2083520

    1993-01-01

    Ever since computers were first used for scientific and numerical work, there has existed an "arms race" between the technical development of faster computing hardware, and the desires of scientists to solve larger problems in shorter time-scales. However, the vast leaps in processor performance achieved through advances in semi-conductor science have reached a hiatus as the technology comes up against the physical limits of the speed of light and quantum effects. This has lead all high performance computer manufacturers to turn towards a parallel architecture for their new machines. In these lectures we will introduce the history and concepts behind parallel computing, and review the various parallel architectures and software environments currently available. We will then introduce programming methodologies that allow efficient exploitation of parallel machines, and present case studies of the parallelization of typical High Energy Physics codes for the two main classes of parallel computing architecture (S...

  6. Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

    Science.gov (United States)

    Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

    2017-01-01

    Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.

  7. Comparative Study of Dynamic Programming and Pontryagin’s Minimum Principle on Energy Management for a Parallel Hybrid Electric Vehicle

    Directory of Open Access Journals (Sweden)

    Huei Peng

    2013-04-01

    Full Text Available This paper compares two optimal energy management methods for parallel hybrid electric vehicles using an Automatic Manual Transmission (AMT. A control-oriented model of the powertrain and vehicle dynamics is built first. The energy management is formulated as a typical optimal control problem to trade off the fuel consumption and gear shifting frequency under admissible constraints. The Dynamic Programming (DP and Pontryagin’s Minimum Principle (PMP are applied to obtain the optimal solutions. Tuning with the appropriate co-states, the PMP solution is found to be very close to that from DP. The solution for the gear shifting in PMP has an algebraic expression associated with the vehicular velocity and can be implemented more efficiently in the control algorithm. The computation time of PMP is significantly less than DP.

  8. Benefits of a parallel hybrid electric architecture on medium commercial vehicles

    Energy Technology Data Exchange (ETDEWEB)

    Boot, Marco Aimo; Consano, Ludovico [Iveco S.p.A, Turin (Italy)

    2009-07-01

    Hybrid electric technology is becoming an increasingly interesting solution for medium and heavy trucks involved in urban and suburban missions. The increasing demand for gas and oil, consequent price rises and environmental concerns are driving a market that is in need of alternative solutions. For these reasons, the growth in the global hybrid market significantly exceeded all the hybrid sales forecasts. The parallel hybrid electric vehicle (PHEV) employs an additional power source (electric motogenerator) in combination with the conventional diesel engine. This architecture exploits the benefits of both power sources in order to reduce the fuel consumption, increase the overall power, and above all, decrease CO2 emissions. Moreover, the emissions reduction target is lead by EU Regulations and local initiatives for traffic limitations, but the real drivers for the growth in the market are demonstrable fuel economy improvements and productivity costs optimization (global efficiency). This paper presents the results achieved by Iveco in the development and testing of parallel hybrid systems applied to medium range commercial vehicles, with the intent to evaluate the functionality, driveability performance and leading the best reduction in terms of fuel consumption and emissions in different real-world missions. The system architecture foresees one electric motor/generator and a single clutch unit. An external electrical power source for the battery recharging it is not necessary. The chosen configuration allows to implement the following functional modes: Stop and Start with Electric Launch, Hybrid Mode, Regenerative Braking Mode, Inertial Start and Creeping Mode. The software contained in the supervisor control unit has been tuned to the customer specific missions, taking in account on road data acquisition in order to demonstrate the reliability, driveability and the overall efficiency of the hybrid system. The field tests carried out in collaboration with

  9. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    Energy Technology Data Exchange (ETDEWEB)

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving`s meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia`s {open_quotes}tiling{close_quotes} dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  10. An object-oriented programming paradigm for parallelization of computational fluid dynamics

    International Nuclear Information System (INIS)

    Ohta, Takashi.

    1997-03-01

    We propose an object-oriented programming paradigm for parallelization of scientific computing programs, and show that the approach can be a very useful strategy. Generally, parallelization of scientific programs tends to be complicated and unportable due to the specific requirements of each parallel computer or compiler. In this paper, we show that the object-oriented programming design, which separates the parallel processing parts from the solver of the applications, can achieve the large improvement in the maintenance of the codes, as well as the high portability. We design the program for the two-dimensional Euler equations according to the paradigm, and evaluate the parallel performance on IBM SP2. (author)

  11. Parallelization of MCNP 4, a Monte Carlo neutron and photon transport code system, in highly parallel distributed memory type computer

    International Nuclear Information System (INIS)

    Masukawa, Fumihiro; Takano, Makoto; Naito, Yoshitaka; Yamazaki, Takao; Fujisaki, Masahide; Suzuki, Koichiro; Okuda, Motoi.

    1993-11-01

    In order to improve the accuracy and calculating speed of shielding analyses, MCNP 4, a Monte Carlo neutron and photon transport code system, has been parallelized and measured of its efficiency in the highly parallel distributed memory type computer, AP1000. The code has been analyzed statically and dynamically, then the suitable algorithm for parallelization has been determined for the shielding analysis functions of MCNP 4. This includes a strategy where a new history is assigned to the idling processor element dynamically during the execution. Furthermore, to avoid the congestion of communicative processing, the batch concept, processing multi-histories by a unit, has been introduced. By analyzing a sample cask problem with 2,000,000 histories by the AP1000 with 512 processor elements, the 82 % of parallelization efficiency is achieved, and the calculational speed has been estimated to be around 50 times as fast as that of FACOM M-780. (author)

  12. Parameters Design for a Parallel Hybrid Electric Bus Using Regenerative Brake Model

    Directory of Open Access Journals (Sweden)

    Zilin Ma

    2014-01-01

    Full Text Available A design methodology which uses the regenerative brake model is introduced to determine the major system parameters of a parallel electric hybrid bus drive train. Hybrid system parameters mainly include the power rating of internal combustion engine (ICE, gear ratios of transmission, power rating, and maximal torque of motor, power, and capacity of battery. The regenerative model is built in the vehicle model to estimate the regenerative energy in the real road conditions. The design target is to ensure that the vehicle meets the specified vehicle performance, such as speed and acceleration, and at the same time, operates the ICE within an expected speed range. Several pairs of parameters are selected from the result analysis, and the fuel saving result in the road test shows that a 25% reduction is achieved in fuel consumption.

  13. Optimal Golomb Ruler Sequences Generation for Optical WDM Systems: A Novel Parallel Hybrid Multi-objective Bat Algorithm

    Science.gov (United States)

    Bansal, Shonak; Singh, Arun Kumar; Gupta, Neena

    2017-02-01

    In real-life, multi-objective engineering design problems are very tough and time consuming optimization problems due to their high degree of nonlinearities, complexities and inhomogeneity. Nature-inspired based multi-objective optimization algorithms are now becoming popular for solving multi-objective engineering design problems. This paper proposes original multi-objective Bat algorithm (MOBA) and its extended form, namely, novel parallel hybrid multi-objective Bat algorithm (PHMOBA) to generate shortest length Golomb ruler called optimal Golomb ruler (OGR) sequences at a reasonable computation time. The OGRs found their application in optical wavelength division multiplexing (WDM) systems as channel-allocation algorithm to reduce the four-wave mixing (FWM) crosstalk. The performances of both the proposed algorithms to generate OGRs as optical WDM channel-allocation is compared with other existing classical computing and nature-inspired algorithms, including extended quadratic congruence (EQC), search algorithm (SA), genetic algorithms (GAs), biogeography based optimization (BBO) and big bang-big crunch (BB-BC) optimization algorithms. Simulations conclude that the proposed parallel hybrid multi-objective Bat algorithm works efficiently as compared to original multi-objective Bat algorithm and other existing algorithms to generate OGRs for optical WDM systems. The algorithm PHMOBA to generate OGRs, has higher convergence and success rate than original MOBA. The efficiency improvement of proposed PHMOBA to generate OGRs up to 20-marks, in terms of ruler length and total optical channel bandwidth (TBW) is 100 %, whereas for original MOBA is 85 %. Finally the implications for further research are also discussed.

  14. Evaluation of DEC`s GIGAswitch for distributed parallel computing

    Energy Technology Data Exchange (ETDEWEB)

    Chen, H.; Hutchins, J.; Brandt, J.

    1993-10-01

    One of Sandia`s research efforts is to reduce the end-to-end communication delay in a parallel-distributed computing environment. GIGAswitch is DEC`s implementation of a gigabit local area network based on switched FDDI technology. Using the GIGAswitch, the authors intend to minimize the medium access latency suffered by shared-medium FDDI technology. Experimental results show that the GIGAswitch adds 16.5 microseconds of switching and bridging delay to an end-to-end communication. Although the added latency causes a 1.8% throughput degradation and a 5% line efficiency degradation, the availability of dedicated bandwidth is much more than what is available to a workstation on a shared medium. For example, ten directly connected workstations each would have a dedicated bandwidth of 95 Mbps, but if they were sharing the FDDI bandwidth, each would have 10% of the total bandwidth, i.e., less than 10 Mbps. In addition, they have found that when there is no output port contention, the switch`s aggregate bandwidth will scale up to multiples of its port bandwidth. However, with output port contention, the throughput and latency performance suffered significantly. Their mathematical and simulation models indicate that the GIGAswitch line efficiency could be as low as 63% when there are nine input ports contending for the same output port. The data indicate that the delay introduced by contention at the server workstation is 50 times that introduced by the GIGAswitch. The authors conclude that the GIGAswitch meets the performance requirements of today`s high-end workstations and that the switched FDDI technology provides an alternative that utilizes existing workstation interfaces while increasing the aggregate bandwidth. However, because the speed of workstations is increasing by a factor of 2 every 1.5 years, the switched FDDI technology is only good as an interim solution.

  15. Hybrid islanding detection method by using grid impedance estimation in parallel-inverters-based microgrid

    DEFF Research Database (Denmark)

    Ghzaiel, Walid; Jebali-Ben Ghorbal, Manel; Slama-Belkhodja, Ilhem

    2014-01-01

    This paper presents a hybrid islanding detection algorithm integrated on the distributed generation unit more close to the point of common coupling of a Microgrid based on parallel inverters where one of them is responsible to control the system. The method is based on resonance excitation under...... parameters, both resistive and inductive parts, from the injected resonance frequency determination. Finally, the inverter will disconnect the microgrid from the faulty grid and reconnect the parallel inverter system to the controllable distributed system in order to ensure high power quality. This paper...... shows that grid impedance variation detection estimation can be an efficient method for islanding detection in microgrid systems. Theoretical analysis and simulation results are presented to validate the proposed method....

  16. 3-D Hybrid Simulation of Quasi-Parallel Bow Shock and Its Effects on the Magnetosphere

    International Nuclear Information System (INIS)

    Lin, Y.; Wang, X.Y.

    2005-01-01

    A three-dimensional (3-D) global-scale hybrid simulation is carried out for the structure of the quasi-parallel bow shock, in particular the foreshock waves and pressure pulses. The wave evolution and interaction with the dayside magnetosphere are discussed. It is shown that diamagnetic cavities are generated in the turbulent foreshock due to the ion beam plasma interaction, and these compressional pulses lead to strong surface perturbations at the magnetopause and Alfven waves/field line resonance in the magnetosphere

  17. Issues in developing parallel iterative algorithms for solving partial differential equations on a (transputer-based) distributed parallel computing system

    International Nuclear Information System (INIS)

    Rajagopalan, S.; Jethra, A.; Khare, A.N.; Ghodgaonkar, M.D.; Srivenkateshan, R.; Menon, S.V.G.

    1990-01-01

    Issues relating to implementing iterative procedures, for numerical solution of elliptic partial differential equations, on a distributed parallel computing system are discussed. Preliminary investigations show that a speed-up of about 3.85 is achievable on a four transputer pipeline network. (author). 2 figs., 3 a ppendixes., 7 refs

  18. COED Transactions, Vol. X, No. 10, October 1978. Simulation of a Sampled-Data System on a Hybrid Computer.

    Science.gov (United States)

    Mitchell, Eugene E., Ed.

    The simulation of a sampled-data system is described that uses a full parallel hybrid computer. The sampled data system simulated illustrates the proportional-integral-derivative (PID) discrete control of a continuous second-order process representing a stirred-tank. The stirred-tank is simulated using continuous analog components, while PID…

  19. How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing

    Science.gov (United States)

    Decyk, V. K.; Dauger, D. E.

    We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-incell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the main stream of computing.

  20. A Hybrid Shared-Memory Parallel Max-Tree Algorithm for Extreme Dynamic-Range Images.

    Science.gov (United States)

    Moschini, Ugo; Meijster, Arnold; Wilkinson, Michael H F

    2018-03-01

    Max-trees, or component trees, are graph structures that represent the connected components of an image in a hierarchical way. Nowadays, many application fields rely on images with high-dynamic range or floating point values. Efficient sequential algorithms exist to build trees and compute attributes for images of any bit depth. However, we show that the current parallel algorithms perform poorly already with integers at bit depths higher than 16 bits per pixel. We propose a parallel method combining the two worlds of flooding and merging max-tree algorithms. First, a pilot max-tree of a quantized version of the image is built in parallel using a flooding method. Later, this structure is used in a parallel leaf-to-root approach to compute efficiently the final max-tree and to drive the merging of the sub-trees computed by the threads. We present an analysis of the performance both on simulated and actual 2D images and 3D volumes. Execution times are about better than the fastest sequential algorithm and speed-up goes up to on 64 threads.

  1. DIMACS Workshop on Interconnection Networks and Mapping, and Scheduling Parallel Computations

    CERN Document Server

    Rosenberg, Arnold L; Sotteau, Dominique; NSF Science and Technology Center in Discrete Mathematics and Theoretical Computer Science; Interconnection networks and mapping and scheduling parallel computations

    1995-01-01

    The interconnection network is one of the most basic components of a massively parallel computer system. Such systems consist of hundreds or thousands of processors interconnected to work cooperatively on computations. One of the central problems in parallel computing is the task of mapping a collection of processes onto the processors and routing network of a parallel machine. Once this mapping is done, it is critical to schedule computations within and communication among processor from universities and laboratories, as well as practitioners involved in the design, implementation, and application of massively parallel systems. Focusing on interconnection networks of parallel architectures of today and of the near future , the book includes topics such as network topologies,network properties, message routing, network embeddings, network emulation, mappings, and efficient scheduling. inputs for a process are available where and when the process is scheduled to be computed. This book contains the refereed pro...

  2. Stampi: a message passing library for distributed parallel computing. User's guide

    International Nuclear Information System (INIS)

    Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi

    1998-11-01

    A new message passing library, Stampi, has been developed to realize a computation with different kind of parallel computers arbitrarily and making MPI (Message Passing Interface) as an unique interface for communication. Stampi is based on MPI2 specification. It realizes dynamic process creation to different machines and communication between spawned one within the scope of MPI semantics. Vender implemented MPI as a closed system in one parallel machine and did not support both functions; process creation and communication to external machines. Stampi supports both functions and enables us distributed parallel computing. Currently Stampi has been implemented on COMPACS (COMplex PArallel Computer System) introduced in CCSE, five parallel computers and one graphic workstation, and any communication on them can be processed on. (author)

  3. A method of paralleling computer calculation for two-dimensional kinetic plasma model

    International Nuclear Information System (INIS)

    Brazhnik, V.A.; Demchenko, V.V.; Dem'yanov, V.G.; D'yakov, V.E.; Ol'shanskij, V.V.; Panchenko, V.I.

    1987-01-01

    A method for parallel computer calculation and OSIRIS program complex realizing it and designed for numerical plasma simulation by the macroparticle method are described. The calculation can be carried out either with one or simultaneously with two computers BESM-6, that is provided by some package of interacting programs functioning in every computer. Program interaction in every computer is based on event techniques realized in OS DISPAK. Parallel computer calculation with two BESM-6 computers allows to accelerate the computation 1.5 times

  4. A hybrid, massively parallel implementation of a genetic algorithm for optimization of the impact performance of a metal/polymer composite plate

    KAUST Repository

    Narayanan, Kiran

    2012-07-17

    A hybrid parallelization method composed of a coarse-grained genetic algorithm (GA) and fine-grained objective function evaluations is implemented on a heterogeneous computational resource consisting of 16 IBM Blue Gene/P racks, a single x86 cluster node and a high-performance file system. The GA iterator is coupled with a finite-element (FE) analysis code developed in house to facilitate computational steering in order to calculate the optimal impact velocities of a projectile colliding with a polyurea/structural steel composite plate. The FE code is capable of capturing adiabatic shear bands and strain localization, which are typically observed in high-velocity impact applications, and it includes several constitutive models of plasticity, viscoelasticity and viscoplasticity for metals and soft materials, which allow simulation of ductile fracture by void growth. A strong scaling study of the FE code was conducted to determine the optimum number of processes run in parallel. The relative efficiency of the hybrid, multi-level parallelization method is studied in order to determine the parameters for the parallelization. Optimal impact velocities of the projectile calculated using the proposed approach, are reported. © The Author(s) 2012.

  5. Comparative Simulation Study of Production Scheduling in the Hybrid and the Parallel Flow

    Directory of Open Access Journals (Sweden)

    Varela Maria L.R.

    2017-06-01

    Full Text Available Scheduling is one of the most important decisions in production control. An approach is proposed for supporting users to solve scheduling problems, by choosing the combination of physical manufacturing system configuration and the material handling system settings. The approach considers two alternative manufacturing scheduling configurations in a two stage product oriented manufacturing system, exploring the hybrid flow shop (HFS and the parallel flow shop (PFS environments. For illustrating the application of the proposed approach an industrial case from the automotive components industry is studied. The main aim of this research to compare results of study of production scheduling in the hybrid and the parallel flow, taking into account the makespan minimization criterion. Thus the HFS and the PFS performance is compared and analyzed, mainly in terms of the makespan, as the transportation times vary. The study shows that the performance HFS is clearly better when the work stations’ processing times are unbalanced, either in nature or as a consequence of the addition of transport times just to one of the work station processing time but loses advantage, becoming worse than the performance of the PFS configuration when the work stations’ processing times are balanced, either in nature or as a consequence of the addition of transport times added on the work stations’ processing times. This means that physical layout configurations along with the way transport time are including the work stations’ processing times should be carefully taken into consideration due to its influence on the performance reached by both HFS and PFS configurations.

  6. Cooperative storage of shared files in a parallel computing system with dynamic block size

    Science.gov (United States)

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2015-11-10

    Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).

  7. The specification of Stampi, a message passing library for distributed parallel computing

    International Nuclear Information System (INIS)

    Imamura, Toshiyuki; Takemiya, Hiroshi; Koide, Hiroshi

    2000-03-01

    At CCSE, Center for Promotion of Computational Science and Engineering, a new message passing library for heterogeneous and distributed parallel computing has been developed, and it is called as Stampi. Stampi enables us to communicate between any combination of parallel computers as well as workstations. Currently, a Stampi system is constructed from Stampi library and Stampi/Java. It provides functions to connect a Stampi application with not only those on COMPACS, COMplex Parallel Computer System, but also applets which work on WWW browsers. This report summarizes the specifications of Stampi and details the development of its system. (author)

  8. Optimal Battery Utilization Over Lifetime for Parallel Hybrid Electric Vehicle to Maximize Fuel Economy

    Energy Technology Data Exchange (ETDEWEB)

    Patil, Chinmaya; Naghshtabrizi, Payam; Verma, Rajeev; Tang, Zhijun; Smith, Kandler; Shi, Ying

    2016-08-01

    This paper presents a control strategy to maximize fuel economy of a parallel hybrid electric vehicle over a target life of the battery. Many approaches to maximizing fuel economy of parallel hybrid electric vehicle do not consider the effect of control strategy on the life of the battery. This leads to an oversized and underutilized battery. There is a trade-off between how aggressively to use and 'consume' the battery versus to use the engine and consume fuel. The proposed approach addresses this trade-off by exploiting the differences in the fast dynamics of vehicle power management and slow dynamics of battery aging. The control strategy is separated into two parts, (1) Predictive Battery Management (PBM), and (2) Predictive Power Management (PPM). PBM is the higher level control with slow update rate, e.g. once per month, responsible for generating optimal set points for PPM. The considered set points in this paper are the battery power limits and State Of Charge (SOC). The problem of finding the optimal set points over the target battery life that minimize engine fuel consumption is solved using dynamic programming. PPM is the lower level control with high update rate, e.g. a second, responsible for generating the optimal HEV energy management controls and is implemented using model predictive control approach. The PPM objective is to find the engine and battery power commands to achieve the best fuel economy given the battery power and SOC constraints imposed by PBM. Simulation results with a medium duty commercial hybrid electric vehicle and the proposed two-level hierarchical control strategy show that the HEV fuel economy is maximized while meeting a specified target battery life. On the other hand, the optimal unconstrained control strategy achieves marginally higher fuel economy, but fails to meet the target battery life.

  9. MIP Models and Hybrid Algorithms for Simultaneous Job Splitting and Scheduling on Unrelated Parallel Machines

    Science.gov (United States)

    Ozmutlu, H. Cenk

    2014-01-01

    We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job sequence and machine-dependent setup times and with job splitting property. The first contribution of this paper is to introduce novel algorithms which make splitting and scheduling simultaneously with variable number of subjobs. We proposed simple chromosome structure which is constituted by random key numbers in hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but it creates additional difficulty when hybrid factors in local search are implemented. We developed algorithms that satisfy the adaptation of results of local search into the genetic algorithms with minimum relocation operation of genes' random key numbers. This is the second contribution of the paper. The third contribution of this paper is three developed new MIP models which are making splitting and scheduling simultaneously. The fourth contribution of this paper is implementation of the GAspLAMIP. This implementation let us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature and the results validate the effectiveness of the proposed algorithms. PMID:24977204

  10. MIP models and hybrid algorithms for simultaneous job splitting and scheduling on unrelated parallel machines.

    Science.gov (United States)

    Eroglu, Duygu Yilmaz; Ozmutlu, H Cenk

    2014-01-01

    We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job sequence and machine-dependent setup times and with job splitting property. The first contribution of this paper is to introduce novel algorithms which make splitting and scheduling simultaneously with variable number of subjobs. We proposed simple chromosome structure which is constituted by random key numbers in hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but it creates additional difficulty when hybrid factors in local search are implemented. We developed algorithms that satisfy the adaptation of results of local search into the genetic algorithms with minimum relocation operation of genes' random key numbers. This is the second contribution of the paper. The third contribution of this paper is three developed new MIP models which are making splitting and scheduling simultaneously. The fourth contribution of this paper is implementation of the GAspLAMIP. This implementation let us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature and the results validate the effectiveness of the proposed algorithms.

  11. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    Science.gov (United States)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  12. Event Based Simulator for Parallel Computing over the Wide Area Network for Real Time Visualization

    Science.gov (United States)

    Sundararajan, Elankovan; Harwood, Aaron; Kotagiri, Ramamohanarao; Satria Prabuwono, Anton

    As the computational requirement of applications in computational science continues to grow tremendously, the use of computational resources distributed across the Wide Area Network (WAN) becomes advantageous. However, not all applications can be executed over the WAN due to communication overhead that can drastically slowdown the computation. In this paper, we introduce an event based simulator to investigate the performance of parallel algorithms executed over the WAN. The event based simulator known as SIMPAR (SIMulator for PARallel computation), simulates the actual computations and communications involved in parallel computation over the WAN using time stamps. Visualization of real time applications require steady stream of processed data flow for visualization purposes. Hence, SIMPAR may prove to be a valuable tool to investigate types of applications and computing resource requirements to provide uninterrupted flow of processed data for real time visualization purposes. The results obtained from the simulation show concurrence with the expected performance using the L-BSP model.

  13. A Parallel and Distributed Surrogate Model Implementation for Computational Steering

    KAUST Repository

    Butnaru, Daniel; Buse, Gerrit; Pfluger, Dirk

    2012-01-01

    of the input parameters. Such an exploration process is however not possible if the simulation is computationally too expensive. For these cases we present in this paper a scalable computational steering approach utilizing a fast surrogate model as substitute

  14. Using parallel computing in modeling and optimization of mineral ...

    African Journals Online (AJOL)

    Then to solve ultimate pit limit problem it is required to find such a sub graph in a graph whose sum of weights will be maximal. One of the possible solutions of this problem is using genetic algorithms. We use a ... Details of implementation parallel genetic algorithm for searching open pit limits are provided. Comparison with ...

  15. Computation of watersheds based on parallel graph algorithms

    NARCIS (Netherlands)

    Meijster, A.; Roerdink, J.B.T.M.; Maragos, P; Schafer, RW; Butt, MA

    1996-01-01

    In this paper the implementation of a parallel watershed algorithm is described. The algorithm has been implemented on a Cray J932, which is a shared memory architecture with 32 processors. The watershed transform has generally been considered to be inherently sequential, but recently a few research

  16. Locating hardware faults in a data communications network of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-01-12

    Hardware faults location in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute node as a tree. Locating hardware faults includes identifying a next compute node as a parent node and a root of a parent test tree, identifying for each child compute node of the parent node a child test tree having the child compute node as root, running a same test suite on the parent test tree and each child test tree, and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.

  17. Wing-Body Aeroelasticity Using Finite-Difference Fluid/Finite-Element Structural Equations on Parallel Computers

    Science.gov (United States)

    Byun, Chansup; Guruswamy, Guru P.; Kutler, Paul (Technical Monitor)

    1994-01-01

    In recent years significant advances have been made for parallel computers in both hardware and software. Now parallel computers have become viable tools in computational mechanics. Many application codes developed on conventional computers have been modified to benefit from parallel computers. Significant speedups in some areas have been achieved by parallel computations. For single-discipline use of both fluid dynamics and structural dynamics, computations have been made on wing-body configurations using parallel computers. However, only a limited amount of work has been completed in combining these two disciplines for multidisciplinary applications. The prime reason is the increased level of complication associated with a multidisciplinary approach. In this work, procedures to compute aeroelasticity on parallel computers using direct coupling of fluid and structural equations will be investigated for wing-body configurations. The parallel computer selected for computations is an Intel iPSC/860 computer which is a distributed-memory, multiple-instruction, multiple data (MIMD) computer with 128 processors. In this study, the computational efficiency issues of parallel integration of both fluid and structural equations will be investigated in detail. The fluid and structural domains will be modeled using finite-difference and finite-element approaches, respectively. Results from the parallel computer will be compared with those from the conventional computers using a single processor. This study will provide an efficient computational tool for the aeroelastic analysis of wing-body structures on MIMD type parallel computers.

  18. A parallel finite-difference method for computational aerodynamics

    International Nuclear Information System (INIS)

    Swisshelm, J.M.

    1989-01-01

    A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed. 14 refs

  19. The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science.

    Science.gov (United States)

    Marek, A; Blum, V; Johanni, R; Havu, V; Lang, B; Auckenthaler, T; Heinecke, A; Bungartz, H-J; Lederer, H

    2014-05-28

    Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N(3)) with the size of the investigated problem, N (e.g. the electron count in electronic structure theory), and thus often defines the system size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus only on a few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the Eigenvalue soLvers for Petascale Applications (ELPA) library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices that have real-valued and complex-valued matrix entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well documented matrix layout of the Scalable Linear Algebra PACKage (ScaLAPACK) library but replacing all actual parallel solution steps with subroutines of its own. For these steps, ELPA significantly outperforms the corresponding ScaLAPACK routines and proprietary libraries that implement the ScaLAPACK interface (e.g. Intel's MKL). The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding backtransformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient especially towards larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPI-OpenMPI implementation available as well. Scalability beyond 10,000 CPU cores for problem

  20. Parallel Computational Intelligence-Based Multi-Camera Surveillance System

    OpenAIRE

    Orts-Escolano, Sergio; Garcia-Rodriguez, Jose; Morell, Vicente; Cazorla, Miguel; Azorin-Lopez, Jorge; García-Chamizo, Juan Manuel

    2014-01-01

    In this work, we present a multi-camera surveillance system based on the use of self-organizing neural networks to represent events on video. The system processes several tasks in parallel using GPUs (graphic processor units). It addresses multiple vision tasks at various levels, such as segmentation, representation or characterization, analysis and monitoring of the movement. These features allow the construction of a robust representation of the environment and interpret the behavior of mob...

  1. Depth-Averaged Non-Hydrostatic Hydrodynamic Model Using a New Multithreading Parallel Computing Method

    Directory of Open Access Journals (Sweden)

    Ling Kang

    2017-03-01

    Full Text Available Compared to the hydrostatic hydrodynamic model, the non-hydrostatic hydrodynamic model can accurately simulate flows that feature vertical accelerations. The model’s low computational efficiency severely restricts its wider application. This paper proposes a non-hydrostatic hydrodynamic model based on a multithreading parallel computing method. The horizontal momentum equation is obtained by integrating the Navier–Stokes equations from the bottom to the free surface. The vertical momentum equation is approximated by the Keller-box scheme. A two-step method is used to solve the model equations. A parallel strategy based on block decomposition computation is utilized. The original computational domain is subdivided into two subdomains that are physically connected via a virtual boundary technique. Two sub-threads are created and tasked with the computation of the two subdomains. The producer–consumer model and the thread lock technique are used to achieve synchronous communication between sub-threads. The validity of the model was verified by solitary wave propagation experiments over a flat bottom and slope, followed by two sinusoidal wave propagation experiments over submerged breakwater. The parallel computing method proposed here was found to effectively enhance computational efficiency and save 20%–40% computation time compared to serial computing. The parallel acceleration rate and acceleration efficiency are approximately 1.45% and 72%, respectively. The parallel computing method makes a contribution to the popularization of non-hydrostatic models.

  2. MEDUSA - An overset grid flow solver for network-based parallel computer systems

    Science.gov (United States)

    Smith, Merritt H.; Pallis, Jani M.

    1993-01-01

    Continuing improvement in processing speed has made it feasible to solve the Reynolds-Averaged Navier-Stokes equations for simple three-dimensional flows on advanced workstations. Combining multiple workstations into a network-based heterogeneous parallel computer allows the application of programming principles learned on MIMD (Multiple Instruction Multiple Data) distributed memory parallel computers to the solution of larger problems. An overset-grid flow solution code has been developed which uses a cluster of workstations as a network-based parallel computer. Inter-process communication is provided by the Parallel Virtual Machine (PVM) software. Solution speed equivalent to one-third of a Cray-YMP processor has been achieved from a cluster of nine commonly used engineering workstation processors. Load imbalance and communication overhead are the principal impediments to parallel efficiency in this application.

  3. Parallel computation of Euler and Navier-Stokes flows

    International Nuclear Information System (INIS)

    Swisshelm, J.M.; Johnson, G.M.; Kumar, S.P.

    1986-01-01

    A multigrid technique useful for accelerating the convergence of Euler and Navier-Stokes flow computations has been restructured to improve its performance on both SIMD and MIMD computers. The new algorithm allows both the construction of longer coarse-grid vectors and the multitasking of entire grids. Computational results are presented for the CDC Cyber 205, Cray X-MP, and Denelcor HEP I. 15 references

  4. COMPSs-Mobile: parallel programming for mobile-cloud computing

    OpenAIRE

    Lordan Gomis, Francesc-Josep; Badia Sala, Rosa Maria

    2016-01-01

    The advent of Cloud and the popularization of mobile devices have led us to a shift in computing access. Computing users will have an interaction display while the real computation will be performed remotely, in the Cloud. COMPSs-Mobile is a framework that aims to ease the development of energy-efficient and high-performing applications for this environment. The framework provides an infrastructure-unaware programming model that allows developers to code regular Android applications that, ...

  5. Engine-start Control Strategy of P2 Parallel Hybrid Electric Vehicle

    Science.gov (United States)

    Xiangyang, Xu; Siqi, Zhao; Peng, Dong

    2017-12-01

    A smooth and fast engine-start process is important to parallel hybrid electric vehicles with an electric motor mounted in front of the transmission. However, there are some challenges during the engine-start control. Firstly, the electric motor must simultaneously provide a stable driving torque to ensure the drivability and a compensative torque to drag the engine before ignition. Secondly, engine-start time is a trade-off control objective because both fast start and smooth start have to be considered. To solve these problems, this paper first analyzed the resistance of the engine start process, and established a physic model in MATLAB/Simulink. Then a model-based coordinated control strategy among engine, motor and clutch was developed. Two basic control strategy during fast start and smooth start process were studied. Simulation results showed that the control objectives were realized by applying given control strategies, which can meet different requirement from the driver.

  6. Hybrid Monte Carlo methods in computational finance

    NARCIS (Netherlands)

    Leitao Rodriguez, A.

    2017-01-01

    Monte Carlo methods are highly appreciated and intensively employed in computational finance in the context of financial derivatives valuation or risk management. The method offers valuable advantages like flexibility, easy interpretation and straightforward implementation. Furthermore, the

  7. Parallel computer calculation of quantum spin lattices; Calcul de chaines de spins quantiques sur ordinateur parallele

    Energy Technology Data Exchange (ETDEWEB)

    Lamarcq, J. [Service de Physique Theorique, CEA Centre d`Etudes de Saclay, 91 - Gif-sur-Yvette (France)

    1998-07-10

    Numerical simulation allows the theorists to convince themselves about the validity of the models they use. Particularly by simulating the spin lattices one can judge about the validity of a conjecture. Simulating a system defined by a large number of degrees of freedom requires highly sophisticated machines. This study deals with modelling the magnetic interactions between the ions of a crystal. Many exact results have been found for spin 1/2 systems but not for systems of other spins for which many simulation have been carried out. The interest for simulations has been renewed by the Haldane`s conjecture stipulating the existence of a energy gap between the ground state and the first excited states of a spin 1 lattice. The existence of this gap has been experimentally demonstrated. This report contains the following four chapters: 1. Spin systems; 2. Calculation of eigenvalues; 3. Programming; 4. Parallel calculation 14 refs., 6 figs.

  8. A kind of video image digitizing circuit based on computer parallel port

    International Nuclear Information System (INIS)

    Wang Yi; Tang Le; Cheng Jianping; Li Yuanjing; Zhang Binquan

    2003-01-01

    A kind of video images digitizing circuit based on parallel port was developed to digitize the flash x ray images in our Multi-Channel Digital Flash X ray Imaging System. The circuit can digitize the video images and store in static memory. The digital images can be transferred to computer through parallel port and can be displayed, processed and stored. (authors)

  9. Mathematical Methods and Algorithms of Mobile Parallel Computing on the Base of Multi-core Processors

    Directory of Open Access Journals (Sweden)

    Alexander B. Bakulev

    2012-11-01

    Full Text Available This article deals with mathematical models and algorithms, providing mobility of sequential programs parallel representation on the high-level language, presents formal model of operation environment processes management, based on the proposed model of programs parallel representation, presenting computation process on the base of multi-core processors.

  10. Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

    Science.gov (United States)

    Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

    2014-11-11

    We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.

  11. Performance of a Sequential and Parallel Computational Fluid Dynamic (CFD) Solver on a Missile Body Configuration

    National Research Council Canada - National Science Library

    Hisley, Dixie

    1999-01-01

    .... The goals of this report are: (1) to investigate the performance of message passing and loop level parallelization techniques, as they were implemented in the computational fluid dynamics (CFD...

  12. Massively Parallel Computing at Sandia and Its Application to National Defense

    National Research Council Canada - National Science Library

    Dosanjh, Sudip

    1991-01-01

    Two years ago, researchers at Sandia National Laboratories showed that a massively parallel computer with 1024 processors could solve scientific problems more than 1000 times faster than a single processor...

  13. Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines

    KAUST Repository

    Woźniak, Maciej; Paszyński, Maciej R.; Pardo, D.; Dalcin, Lisandro; Calo, Victor M.

    2015-01-01

    This paper derives theoretical estimates of the computational cost for isogeometric multi-frontal direct solver executed on parallel distributed memory machines. We show theoretically that for the Cp-1 global continuity of the isogeometric solution

  14. Computer model of a reverberant and parallel circuit coupling

    Science.gov (United States)

    Kalil, Camila de Andrade; de Castro, Maria Clícia Stelling; Cortez, Célia Martins

    2017-11-01

    The objective of the present study was to deepen the knowledge about the functioning of the neural circuits by implementing a signal transmission model using the Graph Theory in a small network of neurons composed of an interconnected reverberant and parallel circuit, in order to investigate the processing of the signals in each of them and the effects on the output of the network. For this, a program was developed in C language and simulations were done using neurophysiological data obtained in the literature.

  15. Paging memory from random access memory to backing storage in a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Inglett, Todd A; Ratterman, Joseph D; Smith, Brian E

    2013-05-21

    Paging memory from random access memory (`RAM`) to backing storage in a parallel computer that includes a plurality of compute nodes, including: executing a data processing application on a virtual machine operating system in a virtual machine on a first compute node; providing, by a second compute node, backing storage for the contents of RAM on the first compute node; and swapping, by the virtual machine operating system in the virtual machine on the first compute node, a page of memory from RAM on the first compute node to the backing storage on the second compute node.

  16. Applications of parallel computer architectures to the real-time simulation of nuclear power systems

    International Nuclear Information System (INIS)

    Doster, J.M.; Sills, E.D.

    1988-01-01

    In this paper the authors report on efforts to utilize parallel computer architectures for the thermal-hydraulic simulation of nuclear power systems and current research efforts toward the development of advanced reactor operator aids and control systems based on this new technology. Many aspects of reactor thermal-hydraulic calculations are inherently parallel, and the computationally intensive portions of these calculations can be effectively implemented on modern computers. Timing studies indicate faster-than-real-time, high-fidelity physics models can be developed when the computational algorithms are designed to take advantage of the computer's architecture. These capabilities allow for the development of novel control systems and advanced reactor operator aids. Coupled with an integral real-time data acquisition system, evolving parallel computer architectures can provide operators and control room designers improved control and protection capabilities. Current research efforts are currently under way in this area

  17. TME (Task Mapping Editor): tool for executing distributed parallel computing. TME user's manual

    International Nuclear Information System (INIS)

    Takemiya, Hiroshi; Yamagishi, Nobuhiro; Imamura, Toshiyuki

    2000-03-01

    At the Center for Promotion of Computational Science and Engineering, a software environment PPExe has been developed to support scientific computing on a parallel computer cluster (distributed parallel scientific computing). TME (Task Mapping Editor) is one of components of the PPExe and provides a visual programming environment for distributed parallel scientific computing. Users can specify data dependence among tasks (programs) visually as a data flow diagram and map these tasks onto computers interactively through GUI of TME. The specified tasks are processed by other components of PPExe such as Meta-scheduler, RIM (Resource Information Monitor), and EMS (Execution Management System) according to the execution order of these tasks determined by TME. In this report, we describe the usage of TME. (author)

  18. Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

    Science.gov (United States)

    Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Cambridge, MA; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2012-04-17

    Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.

  19. Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

    Science.gov (United States)

    Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda A [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2012-01-10

    Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.

  20. Line-plane broadcasting in a data communications network of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Berg, Jeremy E.; Blocksome, Michael A.; Smith, Brian E.

    2010-06-08

    Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.

  1. Induced superconductivity in Nb/InAs-hybrid structures in parallel and perpendicular magnetic fields

    International Nuclear Information System (INIS)

    Rohlfing, Franziska

    2007-07-01

    The thesis in hand investigates experimentally Josephson contacts based on Nb/InAs-hybrid structures. The experiments discussed here were done on samples of different width of the Josephson contacts (between 500 nm and 2000 nm). They were realized by means of different methods of the semiconductor technology. The length of the Josephson contacts was about 600 nm and, as superconducting material, niobium was used. Both critical current and characteristics in the resistive regime (excess-current and multiple Andreev reflection) are studied as a function of temperature and external magnetic fields. Measurements in perpendicular and parallel magnetic fields with respect to the plain of the two-dimensional electron gas, are presented. The Andreev reflection amplitude determining the supercurrent is calculated by means of the Greens functions of the two-dimensional electron gas beneath the superconductors which is modified by the proximity effect. From the fit to the data with this model, the transparency of the boundary between the superconductor and the two-dimensional electron gas can be estimated to be about 0.1. The transparency of the point contacts in the two-dimensional electrons gas can be determined independently from the Josephson junction width dependence of the normal resistance (T=10 K). This transparency amounts to about 0.8 in the examined samples. The measurements of the critical current in a magnetic field perpendicular to the two-dimensional electron gas show a Fraunhofer pattern. In order to study the transition from perpendicular orientation into parallel orientation, measurements of the critical current as a function of the magnetic field were done for different angles. In the resistive regime, the excess current measurements in the magnetic field show a very interesting behaviour: In parallel magnetic fields, the excess current becomes zero at about 2.5 T. In perpendicular magnetic field however, the excess current is strongly suppressed below 30 m

  2. Input/output routines for a hybrid computer

    International Nuclear Information System (INIS)

    Izume, Akitada; Yodo, Terutaka; Sakama, Iwao; Sakamoto, Akira; Miyake, Osamu

    1976-05-01

    This report is concerned with data processing programs for a hybrid computer system. Especially pre-data processing of magnetic tapes which are recorded during the dynamic experiment by FACOM 270/25 data logging system in the 50 MW steam generator test facility is described in detail. The magnetic tape is a most effective recording medium for data logging, but recording formats of the magnetic tape are different between data logging systems. In our section, the final data analyses are performed by data in the disk of EAI-690 hybrid computer system, and to transfer all required information in magnetic tapes to the disk, the magnetic tape editing and data transit are necessary by sub-computer NEAC-3200 system. This report is written for users as a manual and reference hand book of pre-data processing between different type computers. (auth.)

  3. Ocean Modeling and Visualization on Massively Parallel Computer

    Science.gov (United States)

    Chao, Yi; Li, P. Peggy; Wang, Ping; Katz, Daniel S.; Cheng, Benny N.

    1997-01-01

    Climate modeling is one of the grand challenges of computational science, and ocean modeling plays an important role in both understanding the current climatic conditions and predicting future climate change.

  4. Parallel Computational Intelligence-Based Multi-Camera Surveillance System

    Directory of Open Access Journals (Sweden)

    Sergio Orts-Escolano

    2014-04-01

    Full Text Available In this work, we present a multi-camera surveillance system based on the use of self-organizing neural networks to represent events on video. The system processes several tasks in parallel using GPUs (graphic processor units. It addresses multiple vision tasks at various levels, such as segmentation, representation or characterization, analysis and monitoring of the movement. These features allow the construction of a robust representation of the environment and interpret the behavior of mobile agents in the scene. It is also necessary to integrate the vision module into a global system that operates in a complex environment by receiving images from multiple acquisition devices at video frequency. Offering relevant information to higher level systems, monitoring and making decisions in real time, it must accomplish a set of requirements, such as: time constraints, high availability, robustness, high processing speed and re-configurability. We have built a system able to represent and analyze the motion in video acquired by a multi-camera network and to process multi-source data in parallel on a multi-GPU architecture.

  5. Computer code MLCOSP for multiple-correlation and spectrum analysis with a hybrid computer

    International Nuclear Information System (INIS)

    Oguma, Ritsuo; Fujii, Yoshio; Usui, Hozumi; Watanabe, Koichi

    1975-10-01

    Usage of the computer code MLCOSP(Multiple Correlation and Spectrum) developed is described for a hybrid computer installed in JAERI Functions of the hybrid computer and its terminal devices are utilized ingeniously in the code to reduce complexity of the data handling which occurrs in analysis of the multivariable experimental data and to perform the analysis in perspective. Features of the code are as follows; Experimental data can be fed to the digital computer through the analog part of the hybrid computer by connecting with a data recorder. The computed results are displayed in figures, and hardcopies are taken when necessary. Series-messages to the code are shown on the terminal, so man-machine communication is possible. And further the data can be put in through a keyboard, so case study according to the results of analysis is possible. (auth.)

  6. Computational hybrid anthropometric paediatric phantom library for internal radiation dosimetry

    DEFF Research Database (Denmark)

    Xie, Tianwu; Kuster, Niels; Zaidi, Habib

    2017-01-01

    for children demonstrated that they follow the same trend when correlated with age. The constructed hybrid computational phantom library opens up the prospect of comprehensive radiation dosimetry calculations and risk assessment for the paediatric population of different age groups and diverse anthropometric...

  7. Hybrid computer optimization of systems with random parameters

    Science.gov (United States)

    White, R. C., Jr.

    1972-01-01

    A hybrid computer Monte Carlo technique for the simulation and optimization of systems with random parameters is presented. The method is applied to the simultaneous optimization of the means and variances of two parameters in the radar-homing missile problem treated by McGhee and Levine.

  8. On efficiency of fire simulation realization: parallelization with greater number of computational meshes

    Science.gov (United States)

    Valasek, Lukas; Glasa, Jan

    2017-12-01

    Current fire simulation systems are capable to utilize advantages of high-performance computer (HPC) platforms available and to model fires efficiently in parallel. In this paper, efficiency of a corridor fire simulation on a HPC computer cluster is discussed. The parallel MPI version of Fire Dynamics Simulator is used for testing efficiency of selected strategies of allocation of computational resources of the cluster using a greater number of computational cores. Simulation results indicate that if the number of cores used is not equal to a multiple of the total number of cluster node cores there are allocation strategies which provide more efficient calculations.

  9. Parallel diffusion calculation for the PHAETON on-line multiprocessor computer

    International Nuclear Information System (INIS)

    Collart, J.M.; Fedon-Magnaud, C.; Lautard, J.J.

    1987-04-01

    The aim of the PHAETON project is the design of an on-line computer in order to increase the immediate knowledge of the main operating and safety parameters in power plants. A significant stage is the computation of the three dimensional flux distribution. For cost and safety reason a computer based on a parallel microprocessor architecture has been studied. This paper presents a first approach to parallelized three dimensional diffusion calculation. A computing software has been written and built in a four processors demonstrator. We present the realization in progress, concerning the final equipment. 8 refs

  10. Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-07-07

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  11. HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation

    Science.gov (United States)

    Sterling, Thomas; Bergman, Larry

    2000-01-01

    Computational Aero Sciences and other numeric intensive computation disciplines demand computing throughputs substantially greater than the Teraflops scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that in combination with sufficient resolution and advanced adaptive techniques may force performance requirements towards Petaflops. This will be especially true for compute intensive models such as Navier-Stokes are or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithms techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by convention

  12. Digital tomosynthesis parallel imaging computational analysis with shift and add and back projection reconstruction algorithms.

    Science.gov (United States)

    Chen, Ying; Balla, Apuroop; Rayford II, Cleveland E; Zhou, Weihua; Fang, Jian; Cong, Linlin

    2010-01-01

    Digital tomosynthesis is a novel technology that has been developed for various clinical applications. Parallel imaging configuration is utilised in a few tomosynthesis imaging areas such as digital chest tomosynthesis. Recently, parallel imaging configuration for breast tomosynthesis began to appear too. In this paper, we present the investigation on computational analysis of impulse response characterisation as the start point of our important research efforts to optimise the parallel imaging configurations. Results suggest that impulse response computational analysis is an effective method to compare and optimise imaging configurations.

  13. A homotopy method for solving Riccati equations on a shared memory parallel computer

    International Nuclear Information System (INIS)

    Zigic, D.; Watson, L.T.; Collins, E.G. Jr.; Davis, L.D.

    1993-01-01

    Although there are numerous algorithms for solving Riccati equations, there still remains a need for algorithms which can operate efficiently on large problems and on parallel machines. This paper gives a new homotopy-based algorithm for solving Riccati equations on a shared memory parallel computer. The central part of the algorithm is the computation of the kernel of the Jacobian matrix, which is essential for the corrector iterations along the homotopy zero curve. Using a Schur decomposition the tensor product structure of various matrices can be efficiently exploited. The algorithm allows for efficient parallelization on shared memory machines

  14. Parallel computation for blood cell classification in medical hyperspectral imagery

    International Nuclear Information System (INIS)

    Li, Wei; Wu, Lucheng; Qiu, Xianbo; Ran, Qiong; Xie, Xiaoming

    2016-01-01

    With the advantage of fine spectral resolution, hyperspectral imagery provides great potential for cell classification. This paper provides a promising classification system including the following three stages: (1) band selection for a subset of spectral bands with distinctive and informative features, (2) spectral-spatial feature extraction, such as local binary patterns (LBP), and (3) followed by an effective classifier. Moreover, these three steps are further implemented on graphics processing units (GPU) respectively, which makes the system real-time and more practical. The GPU parallel implementation is compared with the serial implementation on central processing units (CPU). Experimental results based on real medical hyperspectral data demonstrate that the proposed system is able to offer high accuracy and fast speed, which are appealing for cell classification in medical hyperspectral imagery. (paper)

  15. Vector and parallel processors in computational science. Proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Duff, I S; Reid, J K

    1985-01-01

    This volume contains papers from most of the invited talks and from several of the contributed talks and poster sessions presented at VAPP II. The contents present an extensive coverage of all important aspects of vector and parallel processors, including hardware, languages, numerical algorithms and applications. The topics covered include descriptions of new machines (both research and commercial machines), languages and software aids, and general discussions of whole classes of machines and their uses. Numerical methods papers include Monte Carlo algorithms, iterative and direct methods for solving large systems, finite elements, optimization, random number generation and mathematical software. The specific applications covered include neutron diffusion calculations, molecular dynamics, weather forecasting, lattice gauge calculations, fluid dynamics, flight simulation, cartography, image processing and cryptography. Most machines and architecture types are being used for these applications. many refs.

  16. A Parallel and Distributed Surrogate Model Implementation for Computational Steering

    KAUST Repository

    Butnaru, Daniel

    2012-06-01

    Understanding the influence of multiple parameters in a complex simulation setting is a difficult task. In the ideal case, the scientist can freely steer such a simulation and is immediately presented with the results for a certain configuration of the input parameters. Such an exploration process is however not possible if the simulation is computationally too expensive. For these cases we present in this paper a scalable computational steering approach utilizing a fast surrogate model as substitute for the time-consuming simulation. The surrogate model we propose is based on the sparse grid technique, and we identify the main computational tasks associated with its evaluation and its extension. We further show how distributed data management combined with the specific use of accelerators allows us to approximate and deliver simulation results to a high-resolution visualization system in real-time. This significantly enhances the steering workflow and facilitates the interactive exploration of large datasets. © 2012 IEEE.

  17. Managing internode data communications for an uninitialized process in a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

    2014-05-20

    A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.

  18. Multiscale Methods, Parallel Computation, and Neural Networks for Real-Time Computer Vision.

    Science.gov (United States)

    Battiti, Roberto

    1990-01-01

    This thesis presents new algorithms for low and intermediate level computer vision. The guiding ideas in the presented approach are those of hierarchical and adaptive processing, concurrent computation, and supervised learning. Processing of the visual data at different resolutions is used not only to reduce the amount of computation necessary to reach the fixed point, but also to produce a more accurate estimation of the desired parameters. The presented adaptive multiple scale technique is applied to the problem of motion field estimation. Different parts of the image are analyzed at a resolution that is chosen in order to minimize the error in the coefficients of the differential equations to be solved. Tests with video-acquired images show that velocity estimation is more accurate over a wide range of motion with respect to the homogeneous scheme. In some cases introduction of explicit discontinuities coupled to the continuous variables can be used to avoid propagation of visual information from areas corresponding to objects with different physical and/or kinematic properties. The human visual system uses concurrent computation in order to process the vast amount of visual data in "real -time." Although with different technological constraints, parallel computation can be used efficiently for computer vision. All the presented algorithms have been implemented on medium grain distributed memory multicomputers with a speed-up approximately proportional to the number of processors used. A simple two-dimensional domain decomposition assigns regions of the multiresolution pyramid to the different processors. The inter-processor communication needed during the solution process is proportional to the linear dimension of the assigned domain, so that efficiency is close to 100% if a large region is assigned to each processor. Finally, learning algorithms are shown to be a viable technique to engineer computer vision systems for different applications starting from

  19. Parallel distributed computing in modeling of the nanomaterials production technologies

    NARCIS (Netherlands)

    Krzhizhanovskaya, V.V.; Korkhov, V.V.; Zatevakhin, M.A.; Gorbachev, Y.E.

    2008-01-01

    Simulation of physical and chemical processes occurring in the nanomaterial production technologies is a computationally challenging problem, due to the great number of coupled processes, time and length scales to be taken into account. To solve such complex problems with a good level of detail in a

  20. Monte Carlo calculations on a parallel computer using MORSE-C.G

    International Nuclear Information System (INIS)

    Wood, J.

    1995-01-01

    The general purpose particle transport Monte Carlo code, MORSE-C.G., is implemented on a parallel computing transputer-based system having MIMD architecture. Example problems are solved which are representative of the 3-principal types of problem that can be solved by the original serial code, namely, fixed source, eigenvalue (k-eff) and time-dependent. The results from the parallelized version of the code are compared in tables with the serial code run on a mainframe serial computer, and with an independent, deterministic transport code. The performance of the parallel computer as the number of processors is varied is shown graphically. For the parallel strategy used, the loss of efficiency as the number of processors is increased, is investigated. (author)

  1. Preliminary Study on the Enhancement of Reconstruction Speed for Emission Computed Tomography Using Parallel Processing

    International Nuclear Information System (INIS)

    Park, Min Jae; Lee, Jae Sung; Kim, Soo Mee; Kang, Ji Yeon; Lee, Dong Soo; Park, Kwang Suk

    2009-01-01

    Conventional image reconstruction uses simplified physical models of projection. However, real physics, for example 3D reconstruction, takes too long time to process all the data in clinic and is unable in a common reconstruction machine because of the large memory for complex physical models. We suggest the realistic distributed memory model of fast-reconstruction using parallel processing on personal computers to enable large-scale technologies. The preliminary tests for the possibility on virtual machines and various performance test on commercial super computer, Tachyon were performed. Expectation maximization algorithm with common 2D projection and realistic 3D line of response were tested. Since the process time was getting slower (max 6 times) after a certain iteration, optimization for compiler was performed to maximize the efficiency of parallelization. Parallel processing of a program on multiple computers was available on Linux with MPICH and NFS. We verified that differences between parallel processed image and single processed image at the same iterations were under the significant digits of floating point number, about 6 bit. Double processors showed good efficiency (1.96 times) of parallel computing. Delay phenomenon was solved by vectorization method using SSE. Through the study, realistic parallel computing system in clinic was established to be able to reconstruct by plenty of memory using the realistic physical models which was impossible to simplify

  2. Stampi: a message passing library for distributed parallel computing. User's guide, second edition

    International Nuclear Information System (INIS)

    Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi

    2000-02-01

    A new message passing library, Stampi, has been developed to realize a computation with different kind of parallel computers arbitrarily and making MPI (Message Passing Interface) as an unique interface for communication. Stampi is based on the MPI2 specification, and it realizes dynamic process creation to different machines and communication between spawned one within the scope of MPI semantics. Main features of Stampi are summarized as follows: (i) an automatic switch function between external- and internal communications, (ii) a message routing/relaying with a routing module, (iii) a dynamic process creation, (iv) a support of two types of connection, Master/Slave and Client/Server, (v) a support of a communication with Java applets. Indeed vendors implemented MPI libraries as a closed system in one parallel machine or their systems, and did not support both functions; process creation and communication to external machines. Stampi supports both functions and enables us distributed parallel computing. Currently Stampi has been implemented on COMPACS (COMplex PArallel Computer System) introduced in CCSE, five parallel computers and one graphic workstation, moreover on eight kinds of parallel machines, totally fourteen systems. Stampi provides us MPI communication functionality on them. This report describes mainly the usage of Stampi. (author)

  3. Parallel processing algorithms for hydrocodes on a computer with MIMD architecture (DENELCOR's HEP)

    International Nuclear Information System (INIS)

    Hicks, D.L.

    1983-11-01

    In real time simulation/prediction of complex systems such as water-cooled nuclear reactors, if reactor operators had fast simulator/predictors to check the consequences of their operations before implementing them, events such as the incident at Three Mile Island might be avoided. However, existing simulator/predictors such as RELAP run slower than real time on serial computers. It appears that the only way to overcome the barrier to higher computing rates is to use computers with architectures that allow concurrent computations or parallel processing. The computer architecture with the greatest degree of parallelism is labeled Multiple Instruction Stream, Multiple Data Stream (MIMD). An example of a machine of this type is the HEP computer by DENELCOR. It appears that hydrocodes are very well suited for parallelization on the HEP. It is a straightforward exercise to parallelize explicit, one-dimensional Lagrangean hydrocodes in a zone-by-zone parallelization. Similarly, implicit schemes can be parallelized in a zone-by-zone fashion via an a priori, symbolic inversion of the tridiagonal matrix that arises in an implicit scheme. These techniques are extended to Eulerian hydrocodes by using Harlow's rezone technique. The extension from single-phase Eulerian to two-phase Eulerian is straightforward. This step-by-step extension leads to hydrocodes with zone-by-zone parallelization that are capable of two-phase flow simulation. Extensions to two and three spatial dimensions can be achieved by operator splitting. It appears that a zone-by-zone parallelization is the best way to utilize the capabilities of an MIMD machine. 40 references

  4. Parallel performances of three 3D reconstruction methods on MIMD computers: Feldkamp, block ART and SIRT algorithms

    International Nuclear Information System (INIS)

    Laurent, C.; Chassery, J.M.; Peyrin, F.; Girerd, C.

    1996-01-01

    This paper deals with the parallel implementations of reconstruction methods in 3D tomography. 3D tomography requires voluminous data and long computation times. Parallel computing, on MIMD computers, seems to be a good approach to manage this problem. In this study, we present the different steps of the parallelization on an abstract parallel computer. Depending on the method, we use two main approaches to parallelize the algorithms: the local approach and the global approach. Experimental results on MIMD computers are presented. Two 3D images reconstructed from realistic data are showed

  5. Design and Validation of Real-Time Optimal Control with ECMS to Minimize Energy Consumption for Parallel Hybrid Electric Vehicles

    Directory of Open Access Journals (Sweden)

    Aiyun Gao

    2017-01-01

    Full Text Available A real-time optimal control of parallel hybrid electric vehicles (PHEVs with the equivalent consumption minimization strategy (ECMS is presented in this paper, whose purpose is to achieve the total equivalent fuel consumption minimization and to maintain the battery state of charge (SOC within its operation range at all times simultaneously. Vehicle and assembly models of PHEVs are established, which provide the foundation for the following calculations. The ECMS is described in detail, in which an instantaneous cost function including the fuel energy and the electrical energy is proposed, whose emphasis is the computation of the equivalent factor. The real-time optimal control strategy is designed through regarding the minimum of the total equivalent fuel consumption as the control objective and the torque split factor as the control variable. The validation of the control strategy proposed is demonstrated both in the MATLAB/Simulink/Advisor environment and under actual transportation conditions by comparing the fuel economy, the charge sustainability, and parts performance with other three control strategies under different driving cycles including standard, actual, and real-time road conditions. Through numerical simulations and real vehicle tests, the accuracy of the approach used for the evaluation of the equivalent factor is confirmed, and the potential of the proposed control strategy in terms of fuel economy and keeping the deviations of SOC at a low level is illustrated.

  6. Implementation of QR up- and downdating on a massively parallel |computer

    DEFF Research Database (Denmark)

    Bendtsen, Claus; Hansen, Per Christian; Madsen, Kaj

    1995-01-01

    We describe an implementation of QR up- and downdating on a massively parallel computer (the Connection Machine CM-200) and show that the algorithm maps well onto the computer. In particular, we show how the use of corrected semi-normal equations for downdating can be efficiently implemented. We...... also illustrate the use of our algorithms in a new LP algorithm....

  7. Software Alchemy: Turning Complex Statistical Computations into Embarrassingly-Parallel Ones

    Directory of Open Access Journals (Sweden)

    Norman Matloff

    2016-07-01

    Full Text Available The growth in the use of computationally intensive statistical procedures, especially with big data, has necessitated the usage of parallel computation on diverse platforms such as multicore, GPUs, clusters and clouds. However, slowdown due to interprocess communication costs typically limits such methods to "embarrassingly parallel" (EP algorithms, especially on non-shared memory platforms. This paper develops a broadlyapplicable method for converting many non-EP algorithms into statistically equivalent EP ones. The method is shown to yield excellent levels of speedup for a variety of statistical computations. It also overcomes certain problems of memory limitations.

  8. Accelerating Neuroimage Registration through Parallel Computation of Similarity Metric.

    Directory of Open Access Journals (Sweden)

    Yun-Gang Luo

    Full Text Available Neuroimage registration is crucial for brain morphometric analysis and treatment efficacy evaluation. However, existing advanced registration algorithms such as FLIRT and ANTs are not efficient enough for clinical use. In this paper, a GPU implementation of FLIRT with the correlation ratio (CR as the similarity metric and a GPU accelerated correlation coefficient (CC calculation for the symmetric diffeomorphic registration of ANTs have been developed. The comparison with their corresponding original tools shows that our accelerated algorithms can greatly outperform the original algorithm in terms of computational efficiency. This paper demonstrates the great potential of applying these registration tools in clinical applications.

  9. Programming a massively parallel, computation universal system: static behavior

    Energy Technology Data Exchange (ETDEWEB)

    Lapedes, A.; Farber, R.

    1986-01-01

    In previous work by the authors, the ''optimum finding'' properties of Hopfield neural nets were applied to the nets themselves to create a ''neural compiler.'' This was done in such a way that the problem of programming the attractors of one neural net (called the Slave net) was expressed as an optimization problem that was in turn solved by a second neural net (the Master net). In this series of papers that approach is extended to programming nets that contain interneurons (sometimes called ''hidden neurons''), and thus deals with nets capable of universal computation. 22 refs.

  10. Weighted Local Active Pixel Pattern (WLAPP for Face Recognition in Parallel Computation Environment

    Directory of Open Access Journals (Sweden)

    Gundavarapu Mallikarjuna Rao

    2013-10-01

    Full Text Available Abstract  - The availability of multi-core technology resulted totally new computational era. Researchers are keen to explore available potential in state of art-machines for breaking the bearer imposed by serial computation. Face Recognition is one of the challenging applications on so ever computational environment. The main difficulty of traditional Face Recognition algorithms is lack of the scalability. In this paper Weighted Local Active Pixel Pattern (WLAPP, a new scalable Face Recognition Algorithm suitable for parallel environment is proposed.  Local Active Pixel Pattern (LAPP is found to be simple and computational inexpensive compare to Local Binary Patterns (LBP. WLAPP is developed based on concept of LAPP. The experimentation is performed on FG-Net Aging Database with deliberately introduced 20% distortion and the results are encouraging. Keywords — Active pixels, Face Recognition, Local Binary Pattern (LBP, Local Active Pixel Pattern (LAPP, Pattern computing, parallel workers, template, weight computation.  

  11. 10th International Workshop on Parallel Tools for High Performance Computing

    CERN Document Server

    Gracia, José; Hilbrich, Tobias; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang

    2017-01-01

    This book presents the proceedings of the 10th International Parallel Tools Workshop, held October 4-5, 2016 in Stuttgart, Germany – a forum to discuss the latest advances in parallel tools. High-performance computing plays an increasingly important role for numerical simulation and modelling in academic and industrial research. At the same time, using large-scale parallel systems efficiently is becoming more difficult. A number of tools addressing parallel program development and analysis have emerged from the high-performance computing community over the last decade, and what may have started as collection of small helper script has now matured to production-grade frameworks. Powerful user interfaces and an extensive body of documentation allow easy usage by non-specialists.

  12. Parallel computation with molecular-motor-propelled agents in nanofabricated networks.

    Science.gov (United States)

    Nicolau, Dan V; Lard, Mercy; Korten, Till; van Delft, Falco C M J M; Persson, Malin; Bengtsson, Elina; Månsson, Alf; Diez, Stefan; Linke, Heiner; Nicolau, Dan V

    2016-03-08

    The combinatorial nature of many important mathematical problems, including nondeterministic-polynomial-time (NP)-complete problems, places a severe limitation on the problem size that can be solved with conventional, sequentially operating electronic computers. There have been significant efforts in conceiving parallel-computation approaches in the past, for example: DNA computation, quantum computation, and microfluidics-based computation. However, these approaches have not proven, so far, to be scalable and practical from a fabrication and operational perspective. Here, we report the foundations of an alternative parallel-computation system in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. Exploring the network in a parallel fashion using a large number of independent, molecular-motor-propelled agents then solves the mathematical problem. This approach uses orders of magnitude less energy than conventional computers, thus addressing issues related to power consumption and heat dissipation. We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem. Finally, we discuss the technical advances necessary to make our system scalable with presently available technology.

  13. Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

    Directory of Open Access Journals (Sweden)

    Vladimiras Dolgopolovas

    2015-01-01

    Full Text Available The aim of this study is to present an approach to the introduction into pipeline and parallel computing, using a model of the multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. The modern computer science and engineering education requires a comprehensive curriculum, so the introduction to pipeline and parallel computing is the essential topic to be included in the curriculum. At the same time, the topic is among the most motivating tasks due to the comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel model-centered framework and develops the relevant learning objects. It allows implementing an educational platform of constructivist learning process, thus enabling learners’ experimentation with the provided programming models, obtaining learners’ competences of the modern scientific research and computational thinking, and capturing the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The programming language C for developing programming models and message passing interface (MPI and OpenMP parallelization tools have been chosen for implementation.

  14. Torque Split Strategy for Parallel Hybrid Electric Vehicles with an Integrated Starter Generator

    Directory of Open Access Journals (Sweden)

    Zhumu Fu

    2014-01-01

    Full Text Available This paper presents a torque split strategy for parallel hybrid electric vehicles with an integrated starter generator (ISG-PHEV by using fuzzy logic control. By combining the efficiency map and the optimum torque curve of the internal combustion engine (ICE with the state of charge (SOC of the batteries, the torque split strategy is designed, which manages the ICE within its peak efficiency region. Taking the quantified ICE torque, the quantified SOC of the batteries, and the quantified ICE speed as inputs, and regarding the output torque demanded on the ICE as an output, a fuzzy logic controller (FLC with relevant fuzzy rules has been developed to determine the optimal torque distribution among the ICE, the ISG, and the electric motor/generator (EMG effectively. The simulation results reveal that, compared with the conventional torque control strategy which uses rule-based controller (RBC in different driving cycles, the proposed FLC improves the fuel economy of the ISG-PHEV, increases the efficiency of the ICE, and maintains batteries SOC within its operation range more availably.

  15. Vector and parallel computing on the IBM ES/3090, a powerful approach to solving problems in the utility industry

    International Nuclear Information System (INIS)

    Bellucci, V.J.

    1990-01-01

    This paper describes IBM's approach to parallel computing using the IBM ES/3090 computer. Parallel processing concepts were discussed including its advantages, potential performance improvements and limitations. Particular applications and capabilities for the IBM ES/3090 were presented along with preliminary results from some utilities in the application of parallel processing to simulation of system reliability, air pollution models, and power network dynamics

  16. System design and energetic characterization of a four-wheel-driven series–parallel hybrid electric powertrain for heavy-duty applications

    International Nuclear Information System (INIS)

    Wang, Enhua; Guo, Di; Yang, Fuyuan

    2015-01-01

    Highlights: • A novel four-wheel-driven series–parallel hybrid powertrain is proposed. • A system model and a rule-based control strategy are designed. • Energetic performance is compared to a rear-wheel-driven hybrid powertrain. • Less torsional oscillation and more robust regenerative braking are achieved. - Abstract: Powertrain topology design is vital for system performance of a hybrid electric vehicle. In this paper, a novel four-wheel-driven series–parallel hybrid electric powertrain is proposed. A motor is connected to the differential of the rear axle. An auxiliary power unit is linked to the differential of the front axle via a clutch. First, a mathematical model was established to evaluate the fuel-saving potential. A rule-based energy management algorithm was subsequently designed, and its working parameters were optimized. The hybrid powertrain system was applied to a transit bus, and the system characteristics were analyzed. Compared to an existing coaxial power-split hybrid powertrain, the fuel economy of the four-wheel-driven series–parallel hybrid powertrain can be at the same level under normal road conditions. However, the proposed four-wheel-driven series–parallel hybrid powertrain can recover braking energy more efficiently under road conditions with a low adhesive coefficient and can alleviate the torsional oscillation occurring at the existing coaxial power-split hybrid powertrain. Therefore, the four-wheel-driven series–parallel hybrid powertrain is a good solution for transit buses toward more robust performance.

  17. CUDA/GPU Technology : Parallel Programming For High Performance Scientific Computing

    OpenAIRE

    YUHENDRA; KUZE, Hiroaki; JOSAPHAT, Tetuko Sri Sumantyo

    2009-01-01

    [ABSTRACT]Graphics processing units (GP Us) originally designed for computer video cards have emerged as the most powerful chip in a high-performance workstation. In the high performance computation capabilities, graphic processing units (GPU) lead to much more powerful performance than conventional CPUs by means of parallel processing. In 2007, the birth of Compute Unified Device Architecture (CUDA) and CUDA-enabled GPUs by NVIDIA Corporation brought a revolution in the general purpose GPU a...

  18. Distributed and cloud computing from parallel processing to the Internet of Things

    CERN Document Server

    Hwang, Kai; Fox, Geoffrey C

    2012-01-01

    Distributed and Cloud Computing, named a 2012 Outstanding Academic Title by the American Library Association's Choice publication, explains how to create high-performance, scalable, reliable systems, exposing the design principles, architecture, and innovative applications of parallel, distributed, and cloud computing systems. Starting with an overview of modern distributed models, the book provides comprehensive coverage of distributed and cloud computing, including: Facilitating management, debugging, migration, and disaster recovery through virtualization Clustered systems for resear

  19. Massively parallel computation of PARASOL code on the Origin 3800 system

    International Nuclear Information System (INIS)

    Hosokawa, Masanari; Takizuka, Tomonori

    2001-10-01

    The divertor particle simulation code named PARASOL simulates open-field plasmas between divertor walls self-consistently by using an electrostatic PIC method and a binary collision Monte Carlo model. The PARASOL parallelized with MPI-1.1 for scalar parallel computer worked on Intel Paragon XP/S system. A system SGI Origin 3800 was newly installed (May, 2001). The parallel programming was improved at this switchover. As a result of the high-performance new hardware and this improvement, the PARASOL is speeded up by about 60 times with the same number of processors. (author)

  20. Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

    Science.gov (United States)

    Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

    2017-12-01

    This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.

  1. Modeling the Fracture of Ice Sheets on Parallel Computers

    Energy Technology Data Exchange (ETDEWEB)

    Waisman, Haim [Columbia Univ., New York, NY (United States); Tuminaro, Ray [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2013-10-10

    The objective of this project was to investigate the complex fracture of ice and understand its role within larger ice sheet simulations and global climate change. This objective was achieved by developing novel physics based models for ice, novel numerical tools to enable the modeling of the physics and by collaboration with the ice community experts. At the present time, ice fracture is not explicitly considered within ice sheet models due in part to large computational costs associated with the accurate modeling of this complex phenomena. However, fracture not only plays an extremely important role in regional behavior but also influences ice dynamics over much larger zones in ways that are currently not well understood. To this end, our research findings through this project offers significant advancement to the field and closes a large gap of knowledge in understanding and modeling the fracture of ice sheets in the polar regions. Thus, we believe that our objective has been achieved and our research accomplishments are significant. This is corroborated through a set of published papers, posters and presentations at technical conferences in the field. In particular significant progress has been made in the mechanics of ice, fracture of ice sheets and ice shelves in polar regions and sophisticated numerical methods that enable the solution of the physics in an efficient way.

  2. Parallel Object-Oriented Computation Applied to a Finite Element Problem

    Directory of Open Access Journals (Sweden)

    Jon B. Weissman

    1993-01-01

    Full Text Available The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application. Mentat, the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.

  3. A language for data-parallel and task parallel programming dedicated to multi-SIMD computers. Contributions to hydrodynamic simulation with lattice gases

    International Nuclear Information System (INIS)

    Pic, Marc Michel

    1995-01-01

    Parallel programming covers task-parallelism and data-parallelism. Many problems need both parallelisms. Multi-SIMD computers allow hierarchical approach of these parallelisms. The T++ language, based on C++, is dedicated to exploit Multi-SIMD computers using a programming paradigm which is an extension of array-programming to tasks managing. Our language introduced array of independent tasks to achieve separately (MIMD), on subsets of processors of identical behaviour (SIMD), in order to translate the hierarchical inclusion of data-parallelism in task-parallelism. To manipulate in a symmetrical way tasks and data we propose meta-operations which have the same behaviour on tasks arrays and on data arrays. We explain how to implement this language on our parallel computer SYMPHONIE in order to profit by the locally-shared memory, by the hardware virtualization, and by the multiplicity of communications networks. We analyse simultaneously a typical application of such architecture. Finite elements scheme for Fluid mechanic needs powerful parallel computers and requires large floating points abilities. Lattice gases is an alternative to such simulations. Boolean lattice bases are simple, stable, modular, need to floating point computation, but include numerical noise. Boltzmann lattice gases present large precision of computation, but needs floating points and are only locally stable. We propose a new scheme, called multi-bit, who keeps the advantages of each boolean model to which it is applied, with large numerical precision and reduced noise. Experiments on viscosity, physical behaviour, noise reduction and spurious invariants are shown and implementation techniques for parallel Multi-SIMD computers detailed. (author) [fr

  4. Storing files in a parallel computing system based on user-specified parser function

    Science.gov (United States)

    Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron

    2014-10-21

    Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.

  5. Parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada

    International Nuclear Information System (INIS)

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.

    2001-01-01

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce better resolution results and reveal some flow patterns that cannot be obtained using coarse-grid modeling models

  6. Breast Cancer Image Segmentation Using K-Means Clustering Based on GPU Cuda Parallel Computing

    Directory of Open Access Journals (Sweden)

    Andika Elok Amalia

    2018-02-01

    Full Text Available Image processing technology is now widely used in the health area, one example is to help the radiologist to analyze the result of MRI (Magnetic Resonance Imaging, CT Scan and Mammography. Image segmentation is a process which is intended to obtain the objects contained in the image by dividing the image into several areas that have similarity attributes on an object with the aim of facilitating the analysis process. The increasing amount  of patient data and larger image size are new challenges in segmentation process to use time efficiently while still keeping the process quality. Research on the segmentation of medical images have been done but still few that combine with parallel computing. In this research, K-Means clustering on the image of mammography result is implemented using two-way computation which are serial and parallel. The result shows that parallel computing  gives faster average performance execution up to twofold.

  7. 8th International Workshop on Parallel Tools for High Performance Computing

    CERN Document Server

    Gracia, José; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang

    2015-01-01

    Numerical simulation and modelling using High Performance Computing has evolved into an established technique in academic and industrial research. At the same time, the High Performance Computing infrastructure is becoming ever more complex. For instance, most of the current top systems around the world use thousands of nodes in which classical CPUs are combined with accelerator cards in order to enhance their compute power and energy efficiency. This complexity can only be mastered with adequate development and optimization tools. Key topics addressed by these tools include parallelization on heterogeneous systems, performance optimization for CPUs and accelerators, debugging of increasingly complex scientific applications, and optimization of energy usage in the spirit of green IT. This book represents the proceedings of the 8th International Parallel Tools Workshop, held October 1-2, 2014 in Stuttgart, Germany – which is a forum to discuss the latest advancements in the parallel tools.

  8. Fluid/Structure Interaction Studies of Aircraft Using High Fidelity Equations on Parallel Computers

    Science.gov (United States)

    Guruswamy, Guru; VanDalsem, William (Technical Monitor)

    1994-01-01

    Abstract Aeroelasticity which involves strong coupling of fluids, structures and controls is an important element in designing an aircraft. Computational aeroelasticity using low fidelity methods such as the linear aerodynamic flow equations coupled with the modal structural equations are well advanced. Though these low fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as High Speed Civil Transport (HSCT) and Advanced Subsonic Transport (AST) which can experience complex flow/structure interactions. HSCT can experience vortex induced aeroelastic oscillations whereas AST can experience transonic buffet associated structural oscillations. Both aircraft may experience a dip in the flutter speed at the transonic regime. For accurate aeroelastic computations at these complex fluid/structure interaction situations, high fidelity equations such as the Navier-Stokes for fluids and the finite-elements for structures are needed. Computations using these high fidelity equations require large computational resources both in memory and speed. Current conventional super computers have reached their limitations both in memory and speed. As a result, parallel computers have evolved to overcome the limitations of conventional computers. This paper will address the transition that is taking place in computational aeroelasticity from conventional computers to parallel computers. The paper will address special techniques needed to take advantage of the architecture of new parallel computers. Results will be illustrated from computations made on iPSC/860 and IBM SP2 computer by using ENSAERO code that directly couples the Euler/Navier-Stokes flow equations with high resolution finite-element structural equations.

  9. Parallel, distributed and GPU computing technologies in single-particle electron microscopy.

    Science.gov (United States)

    Schmeisser, Martin; Heisen, Burkhard C; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

    2009-07-01

    Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today's technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined.

  10. Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms.

    Science.gov (United States)

    Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian

    2018-01-01

    We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).

  11. Hybrid soft computing systems for electromyographic signals analysis: a review

    Science.gov (United States)

    2014-01-01

    Electromyographic (EMG) is a bio-signal collected on human skeletal muscle. Analysis of EMG signals has been widely used to detect human movement intent, control various human-machine interfaces, diagnose neuromuscular diseases, and model neuromusculoskeletal system. With the advances of artificial intelligence and soft computing, many sophisticated techniques have been proposed for such purpose. Hybrid soft computing system (HSCS), the integration of these different techniques, aims to further improve the effectiveness, efficiency, and accuracy of EMG analysis. This paper reviews and compares key combinations of neural network, support vector machine, fuzzy logic, evolutionary computing, and swarm intelligence for EMG analysis. Our suggestions on the possible future development of HSCS in EMG analysis are also given in terms of basic soft computing techniques, further combination of these techniques, and their other applications in EMG analysis. PMID:24490979

  12. Hybrid soft computing systems for electromyographic signals analysis: a review.

    Science.gov (United States)

    Xie, Hong-Bo; Guo, Tianruo; Bai, Siwei; Dokos, Socrates

    2014-02-03

    Electromyographic (EMG) is a bio-signal collected on human skeletal muscle. Analysis of EMG signals has been widely used to detect human movement intent, control various human-machine interfaces, diagnose neuromuscular diseases, and model neuromusculoskeletal system. With the advances of artificial intelligence and soft computing, many sophisticated techniques have been proposed for such purpose. Hybrid soft computing system (HSCS), the integration of these different techniques, aims to further improve the effectiveness, efficiency, and accuracy of EMG analysis. This paper reviews and compares key combinations of neural network, support vector machine, fuzzy logic, evolutionary computing, and swarm intelligence for EMG analysis. Our suggestions on the possible future development of HSCS in EMG analysis are also given in terms of basic soft computing techniques, further combination of these techniques, and their other applications in EMG analysis.

  13. DVS-SOFTWARE: An Effective Tool for Applying Highly Parallelized Hardware To Computational Geophysics

    Science.gov (United States)

    Herrera, I.; Herrera, G. S.

    2015-12-01

    Most geophysical systems are macroscopic physical systems. The behavior prediction of such systems is carried out by means of computational models whose basic models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized version of such PDEs it is necessary to apply highly parallelized super-computers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is due to the kind of discretizations used in them. Recently, I. Herrera and co-workers using 'non-overlapping discretizations' have produced the DVS-Software which overcomes this limitation [2]. The DVS-software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90%, or so [3]). It is therefore very suitable for effectively applying the most advanced parallel supercomputers available at present. In a parallel talk, in this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MOD-FLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM)REFERENCES [1]. Herrera Ismael and George F. Pinder, Mathematical Modelling in Science and Engineering: An axiomatic approach", John Wiley, 243p., 2012. [2]. Herrera, I., de la Cruz L.M. and Rosas-Medina A. "Non Overlapping Discretization Methods for Partial, Differential Equations". NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num 21852. (Open source) [3]. Herrera, I., & Contreras Iván "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)

  14. Evaluation of a Compact Hybrid Brain-Computer Interface System

    Directory of Open Access Journals (Sweden)

    Jaeyoung Shin

    2017-01-01

    Full Text Available We realized a compact hybrid brain-computer interface (BCI system by integrating a portable near-infrared spectroscopy (NIRS device with an economical electroencephalography (EEG system. The NIRS array was located on the subjects’ forehead, covering the prefrontal area. The EEG electrodes were distributed over the frontal, motor/temporal, and parietal areas. The experimental paradigm involved a Stroop word-picture matching test in combination with mental arithmetic (MA and baseline (BL tasks, in which the subjects were asked to perform either MA or BL in response to congruent or incongruent conditions, respectively. We compared the classification accuracies of each of the modalities (NIRS or EEG with that of the hybrid system. We showed that the hybrid system outperforms the unimodal EEG and NIRS systems by 6.2% and 2.5%, respectively. Since the proposed hybrid system is based on portable platforms, it is not confined to a laboratory environment and has the potential to be used in real-life situations, such as in neurorehabilitation.

  15. Application of parallel computing to seismic damage process simulation of an arch dam

    International Nuclear Information System (INIS)

    Zhong Hong; Lin Gao; Li Jianbo

    2010-01-01

    The simulation of damage process of high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires rigorous computational capacity. Conventional serial computing falls short of that and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams utilizing the damage model with inheterogeneity of concrete considered. Developed with programming language Fortran, the code uses a master/slave mode for programming, domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication and solvers from AZTEC library for solution of large-scale equations. Speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of a being-built arch dam on a 4-node PC Cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of shaking table test, indicating that the proposed procedure and parallel code PDPAD has a good potential in simulating seismic damage mode of arch dams. With the rapidly growing need for massive computation emerged from engineering problems, parallel computing will find more and more applications in pertinent areas.

  16. PAPIRUS, a parallel computing framework for sensitivity analysis, uncertainty propagation, and estimation of parameter distribution

    International Nuclear Information System (INIS)

    Heo, Jaeseok; Kim, Kyung Doo

    2015-01-01

    Highlights: • We developed an interface between an engineering simulation code and statistical analysis software. • Multiple packages of the sensitivity analysis, uncertainty quantification, and parameter estimation algorithms are implemented in the framework. • Parallel computing algorithms are also implemented in the framework to solve multiple computational problems simultaneously. - Abstract: This paper introduces a statistical data analysis toolkit, PAPIRUS, designed to perform the model calibration, uncertainty propagation, Chi-square linearity test, and sensitivity analysis for both linear and nonlinear problems. The PAPIRUS was developed by implementing multiple packages of methodologies, and building an interface between an engineering simulation code and the statistical analysis algorithms. A parallel computing framework is implemented in the PAPIRUS with multiple computing resources and proper communications between the server and the clients of each processor. It was shown that even though a large amount of data is considered for the engineering calculation, the distributions of the model parameters and the calculation results can be quantified accurately with significant reductions in computational effort. A general description about the PAPIRUS with a graphical user interface is presented in Section 2. Sections 2.1–2.5 present the methodologies of data assimilation, uncertainty propagation, Chi-square linearity test, and sensitivity analysis implemented in the toolkit with some results obtained by each module of the software. Parallel computing algorithms adopted in the framework to solve multiple computational problems simultaneously are also summarized in the paper

  17. PAPIRUS, a parallel computing framework for sensitivity analysis, uncertainty propagation, and estimation of parameter distribution

    Energy Technology Data Exchange (ETDEWEB)

    Heo, Jaeseok, E-mail: jheo@kaeri.re.kr; Kim, Kyung Doo, E-mail: kdkim@kaeri.re.kr

    2015-10-15

    Highlights: • We developed an interface between an engineering simulation code and statistical analysis software. • Multiple packages of the sensitivity analysis, uncertainty quantification, and parameter estimation algorithms are implemented in the framework. • Parallel computing algorithms are also implemented in the framework to solve multiple computational problems simultaneously. - Abstract: This paper introduces a statistical data analysis toolkit, PAPIRUS, designed to perform the model calibration, uncertainty propagation, Chi-square linearity test, and sensitivity analysis for both linear and nonlinear problems. The PAPIRUS was developed by implementing multiple packages of methodologies, and building an interface between an engineering simulation code and the statistical analysis algorithms. A parallel computing framework is implemented in the PAPIRUS with multiple computing resources and proper communications between the server and the clients of each processor. It was shown that even though a large amount of data is considered for the engineering calculation, the distributions of the model parameters and the calculation results can be quantified accurately with significant reductions in computational effort. A general description about the PAPIRUS with a graphical user interface is presented in Section 2. Sections 2.1–2.5 present the methodologies of data assimilation, uncertainty propagation, Chi-square linearity test, and sensitivity analysis implemented in the toolkit with some results obtained by each module of the software. Parallel computing algorithms adopted in the framework to solve multiple computational problems simultaneously are also summarized in the paper.

  18. CSP: A Multifaceted Hybrid Architecture for Space Computing

    Science.gov (United States)

    Rudolph, Dylan; Wilson, Christopher; Stewart, Jacob; Gauvin, Patrick; George, Alan; Lam, Herman; Crum, Gary Alex; Wirthlin, Mike; Wilson, Alex; Stoddard, Aaron

    2014-01-01

    Research on the CHREC Space Processor (CSP) takes a multifaceted hybrid approach to embedded space computing. Working closely with the NASA Goddard SpaceCube team, researchers at the National Science Foundation (NSF) Center for High-Performance Reconfigurable Computing (CHREC) at the University of Florida and Brigham Young University are developing hybrid space computers that feature an innovative combination of three technologies: commercial-off-the-shelf (COTS) devices, radiation-hardened (RadHard) devices, and fault-tolerant computing. Modern COTS processors provide the utmost in performance and energy-efficiency but are susceptible to ionizing radiation in space, whereas RadHard processors are virtually immune to this radiation but are more expensive, larger, less energy-efficient, and generations behind in speed and functionality. By featuring COTS devices to perform the critical data processing, supported by simpler RadHard devices that monitor and manage the COTS devices, and augmented with novel uses of fault-tolerant hardware, software, information, and networking within and between COTS devices, the resulting system can maximize performance and reliability while minimizing energy consumption and cost. NASA Goddard has adopted the CSP concept and technology with plans underway to feature flight-ready CSP boards on two upcoming space missions.

  19. Advanced quadrature sets and acceleration and preconditioning techniques for the discrete ordinates method in parallel computing environments

    Science.gov (United States)

    Longoni, Gianluca

    In the nuclear science and engineering field, radiation transport calculations play a key-role in the design and optimization of nuclear devices. The linear Boltzmann equation describes the angular, energy and spatial variations of the particle or radiation distribution. The discrete ordinates method (S N) is the most widely used technique for solving the linear Boltzmann equation. However, for realistic problems, the memory and computing time require the use of supercomputers. This research is devoted to the development of new formulations for the SN method, especially for highly angular dependent problems, in parallel environments. The present research work addresses two main issues affecting the accuracy and performance of SN transport theory methods: quadrature sets and acceleration techniques. New advanced quadrature techniques which allow for large numbers of angles with a capability for local angular refinement have been developed. These techniques have been integrated into the 3-D SN PENTRAN (Parallel Environment Neutral-particle TRANsport) code and applied to highly angular dependent problems, such as CT-Scan devices, that are widely used to obtain detailed 3-D images for industrial/medical applications. In addition, the accurate simulation of core physics and shielding problems with strong heterogeneities and transport effects requires the numerical solution of the transport equation. In general, the convergence rate of the solution methods for the transport equation is reduced for large problems with optically thick regions and scattering ratios approaching unity. To remedy this situation, new acceleration algorithms based on the Even-Parity Simplified SN (EP-SSN) method have been developed. A new stand-alone code system, PENSSn (Parallel Environment Neutral-particle Simplified SN), has been developed based on the EP-SSN method. The code is designed for parallel computing environments with spatial, angular and hybrid (spatial/angular) domain

  20. Speed up of MCACE, a Monte Carlo code for evaluation of shielding safety, by parallel computer, (3)

    International Nuclear Information System (INIS)

    Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka; Onodera, Emi; Imawaka, Tsuneyuki; Yoda, Yoshihisa.

    1993-07-01

    The parallel computing of the MCACE code has been studied on two platforms; 1) Shared Memory Type Vector-Parallel Computer Monte-4 and 2) Networked Several Workstations. On the Monte-4, a disk-file has been allocated to collect all results computed by 4 CPUs in parallel, executing the copy of the MCACE code on each CPU. On the workstations under network environment, two parallel models have been evaluated; 1) a host-node model and 2) the model used on the Monte-4 where no software for parallelization has been employed but only standard FORTRAN language. The measurement of computing times has showed that speed up of about 3 times has been achieved by using 4 CPUs of the Monte-4. Further, connecting 4 workstations by network, the computing speed by parallelization has achieved faster than our scalar main frame computer, FACOM M-780. (author)

  1. Energy efficient hybrid computing systems using spin devices

    Science.gov (United States)

    Sharad, Mrigank

    Emerging spin-devices like magnetic tunnel junctions (MTJ's), spin-valves and domain wall magnets (DWM) have opened new avenues for spin-based logic design. This work explored potential computing applications which can exploit such devices for higher energy-efficiency and performance. The proposed applications involve hybrid design schemes, where charge-based devices supplement the spin-devices, to gain large benefits at the system level. As an example, lateral spin valves (LSV) involve switching of nanomagnets using spin-polarized current injection through a metallic channel such as Cu. Such spin-torque based devices possess several interesting properties that can be exploited for ultra-low power computation. Analog characteristic of spin current facilitate non-Boolean computation like majority evaluation that can be used to model a neuron. The magneto-metallic neurons can operate at ultra-low terminal voltage of ˜20mV, thereby resulting in small computation power. Moreover, since nano-magnets inherently act as memory elements, these devices can facilitate integration of logic and memory in interesting ways. The spin based neurons can be integrated with CMOS and other emerging devices leading to different classes of neuromorphic/non-Von-Neumann architectures. The spin-based designs involve `mixed-mode' processing and hence can provide very compact and ultra-low energy solutions for complex computation blocks, both digital as well as analog. Such low-power, hybrid designs can be suitable for various data processing applications like cognitive computing, associative memory, and currentmode on-chip global interconnects. Simulation results for these applications based on device-circuit co-simulation framework predict more than ˜100x improvement in computation energy as compared to state of the art CMOS design, for optimal spin-device parameters.

  2. Efficient multi-objective calibration of a computationally intensive hydrologic model with parallel computing software in Python

    Science.gov (United States)

    With enhanced data availability, distributed watershed models for large areas with high spatial and temporal resolution are increasingly used to understand water budgets and examine effects of human activities and climate change/variability on water resources. Developing parallel computing software...

  3. On the Performance of the Python Programming Language for Serial and Parallel Scientific Computations

    Directory of Open Access Journals (Sweden)

    Xing Cai

    2005-01-01

    Full Text Available This article addresses the performance of scientific applications that use the Python programming language. First, we investigate several techniques for improving the computational efficiency of serial Python codes. Then, we discuss the basic programming techniques in Python for parallelizing serial scientific applications. It is shown that an efficient implementation of the array-related operations is essential for achieving good parallel performance, as for the serial case. Once the array-related operations are efficiently implemented, probably using a mixed-language implementation, good serial and parallel performance become achievable. This is confirmed by a set of numerical experiments. Python is also shown to be well suited for writing high-level parallel programs.

  4. Fast parallel algorithms that compute transitive closure of a fuzzy relation

    Science.gov (United States)

    Kreinovich, Vladik YA.

    1993-01-01

    The notion of a transitive closure of a fuzzy relation is very useful for clustering in pattern recognition, for fuzzy databases, etc. The original algorithm proposed by L. Zadeh (1971) requires the computation time O(n(sup 4)), where n is the number of elements in the relation. In 1974, J. C. Dunn proposed a O(n(sup 2)) algorithm. Since we must compute n(n-1)/2 different values s(a, b) (a not equal to b) that represent the fuzzy relation, and we need at least one computational step to compute each of these values, we cannot compute all of them in less than O(n(sup 2)) steps. So, Dunn's algorithm is in this sense optimal. For small n, it is ok. However, for big n (e.g., for big databases), it is still a lot, so it would be desirable to decrease the computation time (this problem was formulated by J. Bezdek). Since this decrease cannot be done on a sequential computer, the only way to do it is to use a computer with several processors working in parallel. We show that on a parallel computer, transitive closure can be computed in time O((log(sub 2)(n))2).

  5. Effecting a broadcast with an allreduce operation on a parallel computer

    Science.gov (United States)

    Almasi, Gheorghe; Archer, Charles J.; Ratterman, Joseph D.; Smith, Brian E.

    2010-11-02

    A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. A send buffer and a receive buffer is configured. Each element of a contribution of the logical root in the send buffer is contributed. One or more zeros corresponding to a size of the element are injected. An allreduce operation with a bitwise OR using the element and the injected zeros is performed. And the result for the allreduce operation is determined and stored in each receive buffer.

  6. Study of emissions and fuel economy for parallel hybrid versus conventional vehicles on real world and standard driving cycles

    Directory of Open Access Journals (Sweden)

    Ahmed Al-Samari

    2017-12-01

    Full Text Available Parallel hybrid electric vehicles (PHEVs increasing rapidly in the automobile markets. However, the benefits out of using this kind of vehicles are still concerned a lot of costumers. This work investigated the expected benefits (such as decreasing emissions and increasing fuel economy from using the parallel HEV in comparison to the conventional vehicle model of the real-world and standard driving cycles. The software Autonomie used in this study to simulate the parallel HEV and conventional models on these driving cycles.The results show that the fuel economy (FE can be improved significantly up to 68% on real-world driving cycle, which is represented mostly city activities. However, the FE improvement was limited (10% on the highway driving cycle, and this is expected since the using of brake system was infrequent. Moreover, the emissions from parallel HEV decreased about 40% on the real-world driving cycle, and decreased 11% on the highway driving cycle. Finally, the engine efficiency, improved about 12% on the real-world driving cycle, and about 7% on highway driving cycle. Keywords: Emissions, Hybrid electric vehicles, Fuel economy, Real-world driving cycle

  7. Hybrid epidemics--a case study on computer worm conficker.

    Directory of Open Access Journals (Sweden)

    Changwang Zhang

    Full Text Available Conficker is a computer worm that erupted on the Internet in 2008. It is unique in combining three different spreading strategies: local probing, neighbourhood probing, and global probing. We propose a mathematical model that combines three modes of spreading: local, neighbourhood, and global, to capture the worm's spreading behaviour. The parameters of the model are inferred directly from network data obtained during the first day of the Conficker epidemic. The model is then used to explore the tradeoff between spreading modes in determining the worm's effectiveness. Our results show that the Conficker epidemic is an example of a critically hybrid epidemic, in which the different modes of spreading in isolation do not lead to successful epidemics. Such hybrid spreading strategies may be used beneficially to provide the most effective strategies for promulgating information across a large population. When used maliciously, however, they can present a dangerous challenge to current internet security protocols.

  8. Hybrid epidemics--a case study on computer worm conficker.

    Science.gov (United States)

    Zhang, Changwang; Zhou, Shi; Chain, Benjamin M

    2015-01-01

    Conficker is a computer worm that erupted on the Internet in 2008. It is unique in combining three different spreading strategies: local probing, neighbourhood probing, and global probing. We propose a mathematical model that combines three modes of spreading: local, neighbourhood, and global, to capture the worm's spreading behaviour. The parameters of the model are inferred directly from network data obtained during the first day of the Conficker epidemic. The model is then used to explore the tradeoff between spreading modes in determining the worm's effectiveness. Our results show that the Conficker epidemic is an example of a critically hybrid epidemic, in which the different modes of spreading in isolation do not lead to successful epidemics. Such hybrid spreading strategies may be used beneficially to provide the most effective strategies for promulgating information across a large population. When used maliciously, however, they can present a dangerous challenge to current internet security protocols.

  9. Hybrid Epidemics—A Case Study on Computer Worm Conficker

    Science.gov (United States)

    Zhang, Changwang; Zhou, Shi; Chain, Benjamin M.

    2015-01-01

    Conficker is a computer worm that erupted on the Internet in 2008. It is unique in combining three different spreading strategies: local probing, neighbourhood probing, and global probing. We propose a mathematical model that combines three modes of spreading: local, neighbourhood, and global, to capture the worm’s spreading behaviour. The parameters of the model are inferred directly from network data obtained during the first day of the Conficker epidemic. The model is then used to explore the tradeoff between spreading modes in determining the worm’s effectiveness. Our results show that the Conficker epidemic is an example of a critically hybrid epidemic, in which the different modes of spreading in isolation do not lead to successful epidemics. Such hybrid spreading strategies may be used beneficially to provide the most effective strategies for promulgating information across a large population. When used maliciously, however, they can present a dangerous challenge to current internet security protocols. PMID:25978309

  10. Parallelization of the preconditioned IDR solver for modern multicore computer systems

    Science.gov (United States)

    Bessonov, O. A.; Fedoseyev, A. I.

    2012-10-01

    This paper present the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs iterative IDR algorithm with ILU preconditioning (user chosen ILU preconditioning order). CNSPACK has been successfully used during last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there was a dramatic change in processor architectures and computer system organization in recent years. Due to this, performance criteria and methods have been revisited, together with involving the parallelization of the solver and preconditioner using Open MP environment. Results of the successful implementation for efficient parallelization are presented for the most advances computer system (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).

  11. Fast parallel molecular algorithms for DNA-based computation: factoring integers.

    Science.gov (United States)

    Chang, Weng-Long; Guo, Minyi; Ho, Michael Shan-Hui

    2005-06-01

    The RSA public-key cryptosystem is an algorithm that converts input data to an unrecognizable encryption and converts the unrecognizable data back into its original decryption form. The security of the RSA public-key cryptosystem is based on the difficulty of factoring the product of two large prime numbers. This paper demonstrates to factor the product of two large prime numbers, and is a breakthrough in basic biological operations using a molecular computer. In order to achieve this, we propose three DNA-based algorithms for parallel subtractor, parallel comparator, and parallel modular arithmetic that formally verify our designed molecular solutions for factoring the product of two large prime numbers. Furthermore, this work indicates that the cryptosystems using public-key are perhaps insecure and also presents clear evidence of the ability of molecular computing to perform complicated mathematical operations.

  12. A review of parallel computing for large-scale remote sensing image mosaicking

    OpenAIRE

    Chen, Lajiao; Ma, Yan; Liu, Peng; Wei, Jingbo; Jie, Wei; He, Jijun

    2015-01-01

    Interest in image mosaicking has been spurred by a wide variety of research and management needs. However, for large-scale applications, remote sensing image mosaicking usually requires significant computational capabilities. Several studies have attempted to apply parallel computing to improve image mosaicking algorithms and to speed up calculation process. The state of the art of this field has not yet been summarized, which is, however, essential for a better understanding and for further ...

  13. Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

    Science.gov (United States)

    Leamy, Michael J.; Springer, Adam C.

    In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straight-forward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to computer clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.

  14. The boat hull model : adapting the roofline model to enable performance prediction for parallel computing

    NARCIS (Netherlands)

    Nugteren, C.; Corporaal, H.

    2012-01-01

    Multi-core and many-core were already major trends for the past six years, and are expected to continue for the next decades. With these trends of parallel computing, it becomes increasingly difficult to decide on which architecture to run a given application. In this work, we use an algorithm

  15. The boat hull model : enabling performance prediction for parallel computing prior to code development

    NARCIS (Netherlands)

    Nugteren, C.; Corporaal, H.

    2012-01-01

    Multi-core and many-core were already major trends for the past six years and are expected to continue for the next decade. With these trends of parallel computing, it becomes increasingly difficult to decide on which processor to run a given application, mainly because the programming of these

  16. Parallel Structures of Computer-Assisted Signature Pedagogy: The Case of Integrated Spreadsheets

    Science.gov (United States)

    Abramovich, Sergei; Easton, Jonathan; Hayes, Victoria O.

    2012-01-01

    This article was motivated by the authors' work on a project with a group of 2nd-grade students in a computer lab of a rural school in upstate New York. From this project, one goal of which was to provide a capstone experience for a teacher candidate in teaching application-oriented mathematics with technology, the ideas about parallel structures…

  17. High Performance Parallel Processing Project: Industrial computing initiative. Progress reports for fiscal year 1995

    Energy Technology Data Exchange (ETDEWEB)

    Koniges, A.

    1996-02-09

    This project is a package of 11 individual CRADA`s plus hardware. This innovative project established a three-year multi-party collaboration that is significantly accelerating the availability of commercial massively parallel processing computing software technology to U.S. government, academic, and industrial end-users. This report contains individual presentations from nine principal investigators along with overall program information.

  18. Alleviating Search Uncertainty through Concept Associations: Automatic Indexing, Co-Occurrence Analysis, and Parallel Computing.

    Science.gov (United States)

    Chen, Hsinchun; Martinez, Joanne; Kirchhoff, Amy; Ng, Tobun D.; Schatz, Bruce R.

    1998-01-01

    Grounded on object filtering, automatic indexing, and co-occurrence analysis, an experiment was performed using a parallel supercomputer to analyze over 400,000 abstracts in an INSPEC computer engineering collection. A user evaluation revealed that system-generated thesauri were better than the human-generated INSPEC subject thesaurus in concept…

  19. Use of massively parallel computing to improve modelling accuracy within the nuclear sector

    Directory of Open Access Journals (Sweden)

    L M Evans

    2016-06-01

    This work presents recent advancements in three techniques: Uncertainty quantification (UQ; Cellular automata finite element (CAFE; Image based finite element methods (IBFEM. Case studies are presented demonstrating their suitability for use in nuclear engineering made possible by advancements in parallel computing hardware that is projected to be available for industry within the next decade costing of the order of $100k.

  20. Fast parallel algorithm for three-dimensional distance-driven model in iterative computed tomography reconstruction

    International Nuclear Information System (INIS)

    Chen Jian-Lin; Li Lei; Wang Lin-Yuan; Cai Ai-Long; Xi Xiao-Qi; Zhang Han-Ming; Li Jian-Xin; Yan Bin

    2015-01-01

    The projection matrix model is used to describe the physical relationship between reconstructed object and projection. Such a model has a strong influence on projection and backprojection, two vital operations in iterative computed tomographic reconstruction. The distance-driven model (DDM) is a state-of-the-art technology that simulates forward and back projections. This model has a low computational complexity and a relatively high spatial resolution; however, it includes only a few methods in a parallel operation with a matched model scheme. This study introduces a fast and parallelizable algorithm to improve the traditional DDM for computing the parallel projection and backprojection operations. Our proposed model has been implemented on a GPU (graphic processing unit) platform and has achieved satisfactory computational efficiency with no approximation. The runtime for the projection and backprojection operations with our model is approximately 4.5 s and 10.5 s per loop, respectively, with an image size of 256×256×256 and 360 projections with a size of 512×512. We compare several general algorithms that have been proposed for maximizing GPU efficiency by using the unmatched projection/backprojection models in a parallel computation. The imaging resolution is not sacrificed and remains accurate during computed tomographic reconstruction. (paper)

  1. Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2018-05-01

    Full Text Available Computing speed is a significant issue of large-scale flood simulations for real-time response to disaster prevention and mitigation. Even today, most of the large-scale flood simulations are generally run on supercomputers due to the massive amounts of data and computations necessary. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To realize a fast simulation of large-scale floods on a personal computer, a Graphics Processing Unit (GPU-based, high-performance computing method using the OpenACC application was adopted to parallelize the shallow water model. An unstructured data management method was presented to control the data transportation between the GPU and CPU (Central Processing Unit with minimum overhead, and then both computation and data were offloaded from the CPU to the GPU, which exploited the computational capability of the GPU as much as possible. The parallel model was validated using various benchmarks and real-world case studies. The results demonstrate that speed-ups of up to one order of magnitude can be achieved in comparison with the serial model. The proposed parallel model provides a fast and reliable tool with which to quickly assess flood hazards in large-scale areas and, thus, has a bright application prospect for dynamic inundation risk identification and disaster assessment.

  2. A Hybrid Neural Network Approach for Kinematic Modeling of a Novel 6-UPS Parallel Human-Like Mastication Robot

    Directory of Open Access Journals (Sweden)

    Hadi Kalani

    2016-04-01

    Full Text Available Introduction we aimed to introduce a 6-universal-prismatic-spherical (UPS parallel mechanism for the human jaw motion and theoretically evaluate its kinematic problem. We proposed a strategy to provide a fast and accurate solution to the kinematic problem. The proposed strategy could accelerate the process of solution-finding for the direct kinematic problem by reducing the number of required iterations in order to reach the desired accuracy level. Materials and Methods To overcome the direct kinematic problem, an artificial neural network and third-order Newton-Raphson algorithm were combined to provide an improved hybrid method. In this method, approximate solution was presented for the direct kinematic problem by the neural network. This solution could be considered as the initial guess for the third-order Newton-Raphson algorithm to provide an answer with the desired level of accuracy. Results The results showed that the proposed combination could help find a approximate solution and reduce the execution time for the direct kinematic problem, The results showed that muscular actuations showed periodic behaviors, and the maximum length variation of temporalis muscle was larger than that of masseter and pterygoid muscles. By reducing the processing time for solving the direct kinematic problem, more time could be devoted to control calculations.. In this method, for relatively high levels of accuracy, the number of iterations and computational time decreased by 90% and 34%, respectively, compared to the conventional Newton method. Conclusion The present analysis could allow researchers to characterize and study the mastication process by specifying different chewing patterns (e.g., muscle displacements.

  3. In-cylinder diesel spray combustion simulations using parallel computation: A performance benchmarking study

    International Nuclear Information System (INIS)

    Pang, Kar Mun; Ng, Hoon Kiat; Gan, Suyin

    2012-01-01

    Highlights: ► A performance benchmarking exercise is conducted for diesel combustion simulations. ► The reduced chemical mechanism shows its advantages over base and skeletal models. ► High efficiency and great reduction of CPU runtime are achieved through 4-node solver. ► Increasing ISAT memory from 0.1 to 2 GB reduces the CPU runtime by almost 35%. ► Combustion and soot processes are predicted well with minimal computational cost. - Abstract: In the present study, in-cylinder diesel combustion simulation was performed with parallel processing on an Intel Xeon Quad-Core platform to allow both fluid dynamics and chemical kinetics of the surrogate diesel fuel model to be solved simultaneously on multiple processors. Here, Cartesian Z-Coordinate was selected as the most appropriate partitioning algorithm since it computationally bisects the domain such that the dynamic load associated with fuel particle tracking was evenly distributed during parallel computations. Other variables examined included number of compute nodes, chemistry sizes and in situ adaptive tabulation (ISAT) parameters. Based on the performance benchmarking test conducted, parallel configuration of 4-compute node was found to reduce the computational runtime most efficiently whereby a parallel efficiency of up to 75.4% was achieved. The simulation results also indicated that accuracy level was insensitive to the number of partitions or the partitioning algorithms. The effect of reducing the number of species on computational runtime was observed to be more significant than reducing the number of reactions. Besides, the study showed that an increase in the ISAT maximum storage of up to 2 GB reduced the computational runtime by 50%. Also, the ISAT error tolerance of 10 −3 was chosen to strike a balance between results accuracy and computational runtime. The optimised parameters in parallel processing and ISAT, as well as the use of the in-house reduced chemistry model allowed accurate

  4. Some computational challenges of developing efficient parallel algorithms for data-dependent computations in thermal-hydraulics supercomputer applications

    International Nuclear Information System (INIS)

    Woodruff, S.B.

    1992-01-01

    The Transient Reactor Analysis Code (TRAC), which features a two- fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, poor load balancing will degrade efficiency on either vector or data parallel architectures when the data are organized according to spatial location. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. This document discusses why developers algorithms, such as a neural net representation, that do not exhibit algorithms, such as a neural net representation, that do not exhibit load-balancing problems

  5. Autonomic Management of Application Workflows on Hybrid Computing Infrastructure

    Directory of Open Access Journals (Sweden)

    Hyunjoo Kim

    2011-01-01

    Full Text Available In this paper, we present a programming and runtime framework that enables the autonomic management of complex application workflows on hybrid computing infrastructures. The framework is designed to address system and application heterogeneity and dynamics to ensure that application objectives and constraints are satisfied. The need for such autonomic system and application management is becoming critical as computing infrastructures become increasingly heterogeneous, integrating different classes of resources from high-end HPC systems to commodity clusters and clouds. For example, the framework presented in this paper can be used to provision the appropriate mix of resources based on application requirements and constraints. The framework also monitors the system/application state and adapts the application and/or resources to respond to changing requirements or environment. To demonstrate the operation of the framework and to evaluate its ability, we employ a workflow used to characterize an oil reservoir executing on a hybrid infrastructure composed of TeraGrid nodes and Amazon EC2 instances of various types. Specifically, we show how different applications objectives such as acceleration, conservation and resilience can be effectively achieved while satisfying deadline and budget constraints, using an appropriate mix of dynamically provisioned resources. Our evaluations also demonstrate that public clouds can be used to complement and reinforce the scheduling and usage of traditional high performance computing infrastructure.

  6. Computational hybrid anthropometric paediatric phantom library for internal radiation dosimetry

    Science.gov (United States)

    Xie, Tianwu; Kuster, Niels; Zaidi, Habib

    2017-04-01

    Hybrid computational phantoms combine voxel-based and simplified equation-based modelling approaches to provide unique advantages and more realism for the construction of anthropomorphic models. In this work, a methodology and C++ code are developed to generate hybrid computational phantoms covering statistical distributions of body morphometry in the paediatric population. The paediatric phantoms of the Virtual Population Series (IT’IS Foundation, Switzerland) were modified to match target anthropometric parameters, including body mass, body length, standing height and sitting height/stature ratio, determined from reference databases of the National Centre for Health Statistics and the National Health and Nutrition Examination Survey. The phantoms were selected as representative anchor phantoms for the newborn, 1, 2, 5, 10 and 15 years-old children, and were subsequently remodelled to create 1100 female and male phantoms with 10th, 25th, 50th, 75th and 90th body morphometries. Evaluation was performed qualitatively using 3D visualization and quantitatively by analysing internal organ masses. Overall, the newly generated phantoms appear very reasonable and representative of the main characteristics of the paediatric population at various ages and for different genders, body sizes and sitting stature ratios. The mass of internal organs increases with height and body mass. The comparison of organ masses of the heart, kidney, liver, lung and spleen with published autopsy and ICRP reference data for children demonstrated that they follow the same trend when correlated with age. The constructed hybrid computational phantom library opens up the prospect of comprehensive radiation dosimetry calculations and risk assessment for the paediatric population of different age groups and diverse anthropometric parameters.

  7. Just-in-Time Compilation-Inspired Methodology for Parallelization of Compute Intensive Java Code

    Directory of Open Access Journals (Sweden)

    GHULAM MUSTAFA

    2017-01-01

    Full Text Available Compute intensive programs generally consume significant fraction of execution time in a small amount of repetitive code. Such repetitive code is commonly known as hotspot code. We observed that compute intensive hotspots often possess exploitable loop level parallelism. A JIT (Just-in-Time compiler profiles a running program to identify its hotspots. Hotspots are then translated into native code, for efficient execution. Using similar approach, we propose a methodology to identify hotspots and exploit their parallelization potential on multicore systems. Proposed methodology selects and parallelizes each DOALL loop that is either contained in a hotspot method or calls a hotspot method. The methodology could be integrated in front-end of a JIT compiler to parallelize sequential code, just before native translation. However, compilation to native code is out of scope of this work. As a case study, we analyze eighteen JGF (Java Grande Forum benchmarks to determine parallelization potential of hotspots. Eight benchmarks demonstrate a speedup of up to 7.6x on an 8-core system

  8. Just-in-time compilation-inspired methodology for parallelization of compute intensive java code

    International Nuclear Information System (INIS)

    Mustafa, G.; Ghani, M.U.

    2017-01-01

    Compute intensive programs generally consume significant fraction of execution time in a small amount of repetitive code. Such repetitive code is commonly known as hotspot code. We observed that compute intensive hotspots often possess exploitable loop level parallelism. A JIT (Just-in-Time) compiler profiles a running program to identify its hotspots. Hotspots are then translated into native code, for efficient execution. Using similar approach, we propose a methodology to identify hotspots and exploit their parallelization potential on multicore systems. Proposed methodology selects and parallelizes each DOALL loop that is either contained in a hotspot method or calls a hotspot method. The methodology could be integrated in front-end of a JIT compiler to parallelize sequential code, just before native translation. However, compilation to native code is out of scope of this work. As a case study, we analyze eighteen JGF (Java Grande Forum) benchmarks to determine parallelization potential of hotspots. Eight benchmarks demonstrate a speedup of up to 7.6x on an 8-core system. (author)

  9. A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

    Science.gov (United States)

    Jones, Mark Howard

    1987-01-01

    A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.

  10. OPTHYLIC: An Optimised Tool for Hybrid Limits Computation

    Science.gov (United States)

    Busato, Emmanuel; Calvet, David; Theveneaux-Pelzer, Timothée

    2018-05-01

    A software tool, computing observed and expected upper limits on Poissonian process rates using a hybrid frequentist-Bayesian CLs method, is presented. This tool can be used for simple counting experiments where only signal, background and observed yields are provided or for multi-bin experiments where binned distributions of discriminating variables are provided. It allows the combination of several channels and takes into account statistical and systematic uncertainties, as well as correlations of systematic uncertainties between channels. It has been validated against other software tools and analytical calculations, for several realistic cases.

  11. Architecture and VHDL behavioural validation of a parallel processor dedicated to computer vision

    International Nuclear Information System (INIS)

    Collette, Thierry

    1992-01-01

    Speeding up image processing is mainly obtained using parallel computers; SIMD processors (single instruction stream, multiple data stream) have been developed, and have proven highly efficient regarding low-level image processing operations. Nevertheless, their performances drop for most intermediate of high level operations, mainly when random data reorganisations in processor memories are involved. The aim of this thesis was to extend the SIMD computer capabilities to allow it to perform more efficiently at the image processing intermediate level. The study of some representative algorithms of this class, points out the limits of this computer. Nevertheless, these limits can be erased by architectural modifications. This leads us to propose SYMPATIX, a new SIMD parallel computer. To valid its new concept, a behavioural model written in VHDL - Hardware Description Language - has been elaborated. With this model, the new computer performances have been estimated running image processing algorithm simulations. VHDL modeling approach allows to perform the system top down electronic design giving an easy coupling between system architectural modifications and their electronic cost. The obtained results show SYMPATIX to be an efficient computer for low and intermediate level image processing. It can be connected to a high level computer, opening up the development of new computer vision applications. This thesis also presents, a top down design method, based on the VHDL, intended for electronic system architects. (author) [fr

  12. A learnable parallel processing architecture towards unity of memory and computing.

    Science.gov (United States)

    Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

    2015-08-14

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.

  13. A conceptual design of multidisciplinary-integrated C.F.D. simulation on parallel computers

    International Nuclear Information System (INIS)

    Onishi, Ryoichi; Ohta, Takashi; Kimura, Toshiya.

    1996-11-01

    A design of a parallel aeroelastic code for aircraft integrated simulations is conducted. The method for integrating aerodynamics and structural dynamics software on parallel computers is devised by using the Euler/Navier-Stokes equations coupled with wing-box finite element structures. A synthesis of modern aircraft requires the optimizations of aerodynamics, structures, controls, operabilities, or other design disciplines, and the R and D efforts to implement Multidisciplinary Design Optimization environments using high performance computers are made especially among the U.S. aerospace industries. This report describes a Multiple Program Multiple Data (MPMD) parallelization of aerodynamics and structural dynamics codes with a dynamic deformation grid. A three-dimensional computation of a flowfield with dynamic deformation caused by a structural deformation is performed, and a pressure data calculated is used for a computation of the structural deformation which is input again to a fluid dynamics code. This process is repeated exchanging the computed data of pressures and deformations between flowfield grids and structural elements. It enables to simulate the structure movements which take into account of the interaction of fluid and structure. The conceptual design for achieving the aforementioned various functions is reported. Also the future extensions to incorporate control systems, which enable to simulate a realistic aircraft configuration to be a major tool for Aircraft Integrated Simulation, are investigated. (author)

  14. A learnable parallel processing architecture towards unity of memory and computing

    Science.gov (United States)

    Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

    2015-08-01

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.

  15. Parallel computation of aerodynamic influence coefficients for aeroelastic analysis on a transputer network

    Science.gov (United States)

    Janetzke, D. C.; Murthy, D. V.

    1991-01-01

    Aeroelastic analysis is mult-disciplinary and computationally expensive. Hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic analysis capability on a distributed-memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a three-dimensional unsteady aerodynamic model and a panel discretization. Efficiencies up to 85 percent are demonstrated using 32 processors. The effects of subtask ordering, problem size and network topology are presented. A comparison to results on a shared-memory computer indicates that higher speedup is achieved on the distributed-memory system.

  16. Studies of electron collisions with polyatomic molecules using distributed-memory parallel computers

    International Nuclear Information System (INIS)

    Winstead, C.; Hipes, P.G.; Lima, M.A.P.; McKoy, V.

    1991-01-01

    Elastic electron scattering cross sections from 5--30 eV are reported for the molecules C 2 H 4 , C 2 H 6 , C 3 H 8 , Si 2 H 6 , and GeH 4 , obtained using an implementation of the Schwinger multichannel method for distributed-memory parallel computer architectures. These results, obtained within the static-exchange approximation, are in generally good agreement with the available experimental data. These calculations demonstrate the potential of highly parallel computation in the study of collisions between low-energy electrons and polyatomic gases. The computational methodology discussed is also directly applicable to the calculation of elastic cross sections at higher levels of approximation (target polarization) and of electronic excitation cross sections

  17. Animated computer graphics models of space and earth sciences data generated via the massively parallel processor

    Science.gov (United States)

    Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David

    1987-01-01

    The capability was developed of rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets via the implementation of computer graphics modeling techniques on the Massively Parallel Processor (MPP) by employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.

  18. A Review of Hybrid Brain-Computer Interface Systems

    Directory of Open Access Journals (Sweden)

    Setare Amiri

    2013-01-01

    Full Text Available Increasing number of research activities and different types of studies in brain-computer interface (BCI systems show potential in this young research area. Research teams have studied features of different data acquisition techniques, brain activity patterns, feature extraction techniques, methods of classifications, and many other aspects of a BCI system. However, conventional BCIs have not become totally applicable, due to the lack of high accuracy, reliability, low information transfer rate, and user acceptability. A new approach to create a more reliable BCI that takes advantage of each system is to combine two or more BCI systems with different brain activity patterns or different input signal sources. This type of BCI, called hybrid BCI, may reduce disadvantages of each conventional BCI system. In addition, hybrid BCIs may create more applications and possibly increase the accuracy and the information transfer rate. However, the type of BCIs and their combinations should be considered carefully. In this paper, after introducing several types of BCIs and their combinations, we review and discuss hybrid BCIs, different possibilities to combine them, and their advantages and disadvantages.

  19. 3-D electromagnetic plasma particle simulations on the Intel Delta parallel computer

    International Nuclear Information System (INIS)

    Wang, J.; Liewer, P.C.

    1994-01-01

    A three-dimensional electromagnetic PIC code has been developed on the 512 node Intel Touchstone Delta MIMD parallel computer. This code is based on the General Concurrent PIC algorithm which uses a domain decomposition to divide the computation among the processors. The 3D simulation domain can be partitioned into 1-, 2-, or 3-dimensional sub-domains. Particles must be exchanged between processors as they move among the subdomains. The Intel Delta allows one to use this code for very-large-scale simulations (i.e. over 10 8 particles and 10 6 grid cells). The parallel efficiency of this code is measured, and the overall code performance on the Delta is compared with that on Cray supercomputers. It is shown that their code runs with a high parallel efficiency of ≥ 95% for large size problems. The particle push time achieved is 115 nsecs/particle/time step for 162 million particles on 512 nodes. Comparing with the performance on a single processor Cray C90, this represents a factor of 58 speedup. The code uses a finite-difference leap frog method for field solve which is significantly more efficient than fast fourier transforms on parallel computers. The performance of this code on the 128 node Cray T3D will also be discussed

  20. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations.

    Science.gov (United States)

    Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

    2015-07-01

    GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.