Simulating Growth Kinetics in a Data-Parallel 3D Lattice Photobioreactor
Directory of Open Access Journals (Sweden)
A. V. Husselmann
2013-01-01
Though there have been many attempts to address growth kinetics in algal photobioreactors, surprisingly few have attempted an agent-based modelling (ABM) approach. ABM has been heralded as a method of practical scientific inquiry into systems of a complex nature and has been applied liberally in a range of disciplines including ecology, physics, social science, and microbiology, with special emphasis on pathogenic bacterial growth. We bring together agent-based simulation with the Photosynthetic Factory (PSF) model, as well as certain key bioreactor characteristics, in a visual 3D, parallel computing fashion. Despite being at small scale, the simulation gives excellent visual cues on the dynamics of such a reactor, and we further investigate the model in a variety of ways. Our parallel implementation of the simulation on graphical processing units provides key advantages, which we also briefly discuss. We also provide some performance data, along with particular effort in visualisation, using volumetric and isosurface rendering.
International Nuclear Information System (INIS)
Schleier, W.; Besold, G.; Heinz, K.
1992-01-01
The authors study the applicability of parallelized/vectorized Monte Carlo (MC) algorithms to the simulation of domain growth in two-dimensional lattice gas models undergoing an ordering process after a rapid quench below an order-disorder transition temperature. As examples they consider models with 2 x 1 and c(2 x 2) equilibrium superstructures on the square and rectangular lattices, respectively. They also study the case of phase separation ('1 x 1' islands) on the square lattice. A generalized parallel checkerboard algorithm for Kawasaki dynamics is shown to give rise to artificial spatial correlations in all three models. However, only if superstructure domains evolve do these correlations modify the kinetics by influencing the nucleation process and result in a reduced growth exponent compared to the value from the conventional heat bath algorithm with random single-site updates. In order to overcome these artificial modifications, two MC algorithms with a reduced degree of parallelism ('hybrid' and 'mask' algorithms, respectively) are presented and applied. As the results indicate, these algorithms are suitable for the simulation of superstructure domain growth on parallel/vector computers. 60 refs., 10 figs., 1 tab
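The checkerboard idea named in this abstract can be pictured with a minimal sketch. It is not the authors' generalized algorithm: it only shows the basic two-colour split of a square lattice by the parity of i + j, under which no two sites of the same colour are nearest neighbours. The function name is hypothetical. Kawasaki dynamics exchanges pairs of sites, so a plain two-colour split is presumably not sufficient there, which may be why the abstract speaks of a generalized checkerboard.

```python
import numpy as np

def checkerboard_masks(n_rows, n_cols):
    """Return boolean masks for the two sublattices of a square lattice.

    Sites with even (i + j) form one sublattice, sites with odd (i + j)
    the other. Hypothetical helper for illustration only.
    """
    i, j = np.indices((n_rows, n_cols))
    black = (i + j) % 2 == 0
    return black, ~black

black, white = checkerboard_masks(4, 4)
# Sites selected by one mask share no nearest-neighbour bond, so they could
# be updated concurrently under single-site dynamics.
```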
Parallel discrete event simulation
Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.
1991-01-01
In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation
Parallel Atomistic Simulations
Energy Technology Data Exchange (ETDEWEB)
HEFFELFINGER,GRANT S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories: those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.
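Of the three molecular dynamics decompositions listed above, the spatial one is the easiest to picture. The sketch below is an illustration only, assuming a periodic cubic box cut into equal slabs along x; the function and parameter names are hypothetical and are not taken from the review. Each particle is owned by the processor whose slab contains its x coordinate.

```python
import numpy as np

def assign_to_domains(positions, box_length, n_domains):
    """Return, for each particle, the index of the x-slab that owns it.

    Assumes a periodic box of side box_length split into n_domains
    equal slabs along x (an illustrative choice, not the review's).
    """
    x = np.mod(positions[:, 0], box_length)            # wrap into [0, box_length)
    owner = np.floor(x / box_length * n_domains).astype(int)
    return np.minimum(owner, n_domains - 1)            # guard against rounding at the edge

positions = np.array([[0.1, 0.0, 0.0],
                      [2.6, 0.0, 0.0],
                      [9.9, 0.0, 0.0],
                      [10.0, 0.0, 0.0]])
owners = assign_to_domains(positions, box_length=10.0, n_domains=4)
```

By contrast, a replicated data scheme would keep all positions on every processor, and a force decomposition would split the pairwise force matrix rather than space.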
Parallel reservoir simulator computations
International Nuclear Information System (INIS)
Hemanth-Kumar, K.; Young, L.C.
1995-01-01
The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and, relative to scalar calculations, it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively, on the CRAY C-90.
Massively parallel multicanonical simulations
Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard
2018-03-01
Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10^4 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
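The step in which independent walkers communicate weight updates can be sketched as follows. This is a minimal illustration, not the authors' published code: it assumes the simple recursive multicanonical update W(E) <- W(E) / H(E), applied to the histogram summed over all walkers, and the function name is hypothetical. The abstract itself does not state which update rule is used.

```python
import numpy as np

def update_log_weights(log_weights, walker_histograms):
    """Merge per-walker energy histograms and update the shared weights.

    Assumed rule (not stated in the abstract): ln W <- ln W - ln H for
    every energy bin visited by at least one walker.
    """
    total = np.sum(walker_histograms, axis=0)           # combine all walkers
    visited = total > 0                                 # leave empty bins unchanged
    new_log_weights = np.asarray(log_weights, dtype=float).copy()
    new_log_weights[visited] -= np.log(total[visited])
    return new_log_weights

# Two walkers, three energy bins; the middle bin was never visited.
histograms = np.array([[1, 0, 2],
                       [1, 0, 2]])
log_w = update_log_weights(np.zeros(3), histograms)
```

Between updates each walker samples independently with the same weights, which is what makes the scheme map naturally onto many GPU threads.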
Xyce parallel electronic simulator.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Massively parallel quantum computer simulator
De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.
2007-01-01
We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray
Parallel Monte Carlo simulation of aerosol dynamics
Zhou, K.
2014-01-01
A highly efficient Monte Carlo (MC) algorithm is developed for the numerical simulation of aerosol dynamics, that is, nucleation, surface growth, and coagulation. Nucleation and surface growth are handled with deterministic means, while coagulation is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI). The parallel computing efficiency is investigated through numerical examples. Nearly 60% parallel efficiency is achieved for the largest test case, with 3.7 million MC particles running on 93 parallel computing nodes. The algorithm is verified by simulating various test cases and comparing the simulation results with available analytical and/or other numerical solutions. Generally, it is found that only a small number (hundreds or thousands) of MC particles is necessary to accurately predict the aerosol particle number density, volume fraction, and so forth, that is, low order moments of the Particle Size Distribution (PSD) function. Accurately predicting the high order moments of the PSD requires dramatically increasing the number of MC particles.
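For readers unfamiliar with the efficiency figure quoted above, the sketch below uses the conventional definition: speedup divided by the number of nodes. That the authors use exactly this definition is an assumption, and the timings in the usage lines are invented purely to reproduce a 60% figure on 93 nodes, which corresponds to a speedup of about 0.6 x 93 = 55.8.

```python
def parallel_efficiency(t_serial, t_parallel, n_nodes):
    """Conventional parallel efficiency: (t_serial / t_parallel) / n_nodes."""
    speedup = t_serial / t_parallel
    return speedup / n_nodes

# Hypothetical timings (not from the paper) chosen to give 60% on 93 nodes.
t_serial = 930.0
t_parallel = t_serial / (0.6 * 93)
efficiency = parallel_efficiency(t_serial, t_parallel, 93)
```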
Xyce parallel electronic simulator design.
Energy Technology Data Exchange (ETDEWEB)
Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.
2010-09-01
This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, a certain amount of staff turnover is to be expected on long term projects, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.
Synchronization Techniques in Parallel Discrete Event Simulation
Lindén, Jonatan
2018-01-01
Discrete event simulation is an important tool for evaluating system models in many fields of science and engineering. To improve the performance of large-scale discrete event simulations, several techniques to parallelize discrete event simulation have been developed. In parallel discrete event simulation, the work of a single discrete event simulation is distributed over multiple processing elements. A key challenge in parallel discrete event simulation is to ensure that causally dependent ...
Synchronization Of Parallel Discrete Event Simulations
Steinman, Jeffrey S.
1992-01-01
Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Parallel Computing for Brain Simulation.
Pastur-Romay, L A; Porto-Pazos, A B; Cedron, F; Pazos, A
2017-01-01
The human brain is the most complex system in the known universe and is therefore one of the greatest mysteries. It provides human beings with extraordinary abilities. However, it is not yet understood how and why most of these abilities are produced. For decades, researchers have been trying to make computers reproduce these abilities, focusing both on understanding the nervous system and on processing data in a more efficient way than before. Their aim is to make computers process information similarly to the brain. Important technological developments and vast multidisciplinary projects have made it possible to create the first simulation with a number of neurons similar to that of a human brain. This paper presents an up-to-date review of the main research projects that are trying to simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review includes the current applications of these works, as well as future trends. It is focused on various works that look for advanced progress in Neuroscience and still others which seek new discoveries in Computer Science (neuromorphic hardware, machine learning techniques). Their most outstanding characteristics are summarized and the latest advances and future plans are presented. In addition, this review points out the importance of considering not only neurons: computational models of the brain should also include glial cells, given the proven importance of astrocytes in information processing.
Parallelization of quantum molecular dynamics simulation code
International Nuclear Information System (INIS)
Kato, Kaori; Kunugi, Tomoaki; Shibahara, Masahiko; Kotake, Susumu
1998-02-01
A quantum molecular dynamics simulation code has been developed at the Kansai Research Establishment for the analysis of the thermalization of photon energies in molecules or materials. The simulation code is parallelized for both a scalar massively parallel computer (Intel Paragon XP/S75) and a vector parallel computer (Fujitsu VPP300/12). Scalable speed-up has been obtained on both parallel computers by distributing particle groups to the processor units. By distributing to the processor units not only the particle groups but also the fine-grained calculations that make up each particle calculation, high parallelization performance is achieved on the Intel Paragon XP/S75. (author)
Building a parallel file system simulator
International Nuclear Information System (INIS)
Molina-Estolano, E; Maltzahn, C; Brandt, S A; Bent, J
2009-01-01
Parallel file systems are gaining in popularity in high-end computing centers as well as commercial data centers. High-end computing systems are expected to scale exponentially and to pose new challenges to their storage scalability in terms of cost and power. To address these challenges, scientists and file system designers will need a thorough understanding of the design space of parallel file systems. Yet there exist few systematic studies of parallel file system behavior at petabyte and exabyte scale. An important reason is the significant cost of getting access to large-scale hardware to test parallel file systems. To contribute to this understanding, we are building a parallel file system simulator that can simulate parallel file systems at very large scale. Our goal is to simulate petabyte-scale parallel file systems on a small cluster or even a single machine in reasonable time and fidelity. With this simulator, file system experts will be able to tune existing file systems for specific workloads, scientists and file system deployment engineers will be able to better communicate workload requirements, file system designers and researchers will be able to try out design alternatives and innovations at scale, and instructors will be able to study very large-scale parallel file system behavior in the classroom. In this paper we describe our approach and provide preliminary results that are encouraging both in terms of fidelity and simulation scalability.
Structured building model reduction toward parallel simulation
Energy Technology Data Exchange (ETDEWEB)
Dobbs, Justin R. [Cornell University; Hencey, Brondon M. [Cornell University
2013-08-26
Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.
Program For Parallel Discrete-Event Simulation
Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.
1991-01-01
User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
Data parallel sorting for particle simulation
Dagum, Leonardo
1992-01-01
Sorting on a parallel architecture is a communications-intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
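The O(N) sequential bound mentioned above follows from the keys being small integers, typically a cell index per particle. The sketch below shows a plain sequential counting sort under that assumption. It is not the data parallel merging algorithm of the paper, and the function and variable names are hypothetical.

```python
import numpy as np

def sort_particles_by_cell(cell_index, n_cells):
    """Counting sort of particles by integer cell index in O(N + n_cells).

    Returns (order, starts): order lists particle ids grouped by cell,
    and starts[c] is the first position of cell c within order.
    """
    counts = np.bincount(cell_index, minlength=n_cells)      # particles per cell
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))   # exclusive prefix sum
    order = np.empty(len(cell_index), dtype=int)
    next_slot = starts.copy()
    for particle, cell in enumerate(cell_index):             # single pass over particles
        order[next_slot[cell]] = particle
        next_slot[cell] += 1
    return order, starts

cells = np.array([2, 0, 1, 0, 2])
order, starts = sort_particles_by_cell(cells, n_cells=3)
# cells[order] is now grouped by cell: [0, 0, 1, 2, 2]
```

The sort is stable: particles in the same cell keep their original relative order.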
Parallel Monte Carlo simulation of aerosol dynamics
Zhou, K.; He, Z.; Xiao, M.; Zhang, Z.
2014-01-01
is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI
Acoustic simulation in architecture with parallel algorithm
Li, Xiaohong; Zhang, Xinrong; Li, Dan
2004-03-01
To address the complexity of architectural environments and the need for real-time simulation of architectural acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in a scene is solved with this method. The impulse responses between sources and receivers in each frequency segment, which are calculated by multiple processes, are then combined into the whole frequency response. The numerical experiment shows that the parallel algorithm can improve the efficiency of acoustic simulation for complex scenes.
Simulation Exploration through Immersive Parallel Planes: Preprint
Energy Technology Data Exchange (ETDEWEB)
Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny; Smith, Steve
2016-03-01
We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest; a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
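The mapping from one observation to a polyline across the parallel planes can be sketched as below. This is an illustration only: it assumes consecutive variables are paired in order and that the planes are evenly spaced along a third axis. Neither choice is stated in the abstract, and the function and parameter names are hypothetical.

```python
def parallel_planes_polyline(observation, plane_spacing=1.0):
    """Map an observation with 2k values to k polyline vertices.

    Vertex k sits on plane k at (observation[2k], observation[2k + 1]),
    with the plane placed at depth k * plane_spacing (assumed layout).
    """
    if len(observation) % 2 != 0:
        raise ValueError("need an even number of variables to form pairs")
    return [
        (observation[2 * k], observation[2 * k + 1], k * plane_spacing)
        for k in range(len(observation) // 2)
    ]

vertices = parallel_planes_polyline([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
```

Brushing a rectangular region on plane k would then amount to selecting the observations whose k-th vertex falls inside that region.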
Simulation Exploration through Immersive Parallel Planes
Energy Technology Data Exchange (ETDEWEB)
Brunhart-Lupo, Nicholas J [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Bush, Brian W [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Gruchalla, Kenny M [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Smith, Steve [Los Alamos Visualization Associates
2017-05-25
We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest; a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
Parallel programming with Easy Java Simulations
Esquembre, F.; Christian, W.; Belloni, M.
2018-01-01
Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we lower the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
Parallel and Distributed System Simulation
Dongarra, Jack
1998-01-01
This exploratory study initiated our research into the software infrastructure necessary to support the modeling and simulation techniques that are most appropriate for the Information Power Grid. Such computational power grids will use high-performance networking to connect hardware, software, instruments, databases, and people into a seamless web that supports a new generation of computation-rich problem solving environments for scientists and engineers. In this context we looked at evaluating the NetSolve software environment for network computing that leverages the potential of such systems while addressing their complexities. NetSolve's main purpose is to enable the creation of complex applications that harness the immense power of the grid, yet are simple to use and easy to deploy. NetSolve uses a modular, client-agent-server architecture to create a system that is very easy to use. Moreover, it is designed to be highly composable in that it readily permits new resources to be added by anyone willing to do so. In these respects NetSolve is to the Grid what the World Wide Web is to the Internet. But like the Web, the design that makes these wonderful features possible can also impose significant limitations on the performance and robustness of a NetSolve system. This project explored the design innovations that push the performance and robustness of the NetSolve paradigm as far as possible without sacrificing the Web-like ease of use and composability that make it so powerful.
Xyce parallel electronic simulator release notes.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
Parallel discrete event simulation using shared memory
Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.
1988-01-01
With traditional event-list techniques, evaluating a detailed discrete-event simulation model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared-memory experiments, using the Chandy-Misra distributed-simulation algorithm to simulate networks of queues, is presented. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
Parallel adaptive simulations on unstructured meshes
International Nuclear Information System (INIS)
Shephard, M S; Jansen, K E; Sahni, O; Diachin, L A
2007-01-01
This paper discusses methods being developed by the ITAPS center to support the execution of parallel adaptive simulations on unstructured meshes. The paper first outlines the ITAPS approach to the development of interoperable mesh, geometry and field services to support the needs of SciDAC applications in these areas. The paper then demonstrates the ability of unstructured adaptive meshing methods built on such interoperable services to effectively solve important physics problems. Attention is then focused on ITAPS' developing ability to solve adaptive unstructured mesh problems on massively parallel computers.
Xyce parallel electronic simulator : reference guide.
Energy Technology Data Exchange (ETDEWEB)
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.
A hybrid parallel framework for the cellular Potts model simulations
Energy Technology Data Exchange (ETDEWEB)
Jiang, Yi [Los Alamos National Laboratory; He, Kejing [SOUTH CHINA UNIV; Dong, Shoubin [SOUTH CHINA UNIV
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, and cannot be used for large-scale, complex 3D simulations. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large-scale simulation (~10^8 sites) of complex collective behavior of numerous cells (~10^6).
Practical integrated simulation systems for coupled numerical simulations in parallel
Energy Technology Data Exchange (ETDEWEB)
Osamu, Hazama; Zhihong, Guo [Japan Atomic Energy Research Inst., Centre for Promotion of Computational Science and Engineering, Tokyo (Japan)
2003-07-01
In order for numerical simulations to reflect 'real-world' phenomena and occurrences, the incorporation of multidisciplinary and multi-physics simulations considering various physical models and factors is becoming essential. However, there still exist many obstacles which inhibit such numerical simulations. For example, it is still difficult in many instances to develop satisfactory software packages which allow for such coupled simulations, and such simulations will require more computational resources. A precise multi-physics simulation today will require parallel processing, which again makes it a complicated process. Under the international cooperative efforts between CCSE/JAERI and Fraunhofer SCAI, a German institute, a library called the MpCCI, or Mesh-based Parallel Code Coupling Interface, has been implemented together with a library called STAMPI to couple two existing codes to develop an 'integrated numerical simulation system' intended for meta-computing environments. (authors)
Parallel beam dynamics simulation of linear accelerators
International Nuclear Information System (INIS)
Qiang, Ji; Ryne, Robert D.
2002-01-01
In this paper we describe parallel particle-in-cell methods for the large scale simulation of beam dynamics in linear accelerators. These techniques have been implemented in the IMPACT (Integrated Map and Particle Accelerator Tracking) code. IMPACT is being used to study the behavior of intense charged particle beams and as a tool for the design of next-generation linear accelerators. As examples, we present applications of the code to the study of emittance exchange in high intensity beams and to the study of beam transport in a proposed accelerator for the development of accelerator-driven waste transmutation technologies
Multibus-based parallel processor for simulation
Ogrady, E. P.; Wang, C.-H.
1983-01-01
A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.
Empirical study of parallel LRU simulation algorithms
Carr, Eric; Nicol, David M.
1994-01-01
This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
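The role of stack distances in the abstract above can be illustrated with a short serial sketch. It is not one of the five parallel algorithms from the paper, only a plain list-based version with function names chosen here for illustration. A reference hits in a fully associative LRU cache of size C exactly when its stack distance is at most C, so a single pass over the trace gives hit counts for every cache size. The cost per reference in this naive version grows with the depth of the stack.

```python
from typing import Hashable, Iterable, List, Optional


def stack_distances(trace: Iterable[Hashable]) -> List[Optional[int]]:
    """Return the LRU stack distance of each reference (None for a first touch)."""
    stack: List[Hashable] = []  # most recently used item is at index 0
    distances: List[Optional[int]] = []
    for ref in trace:
        if ref in stack:
            depth = stack.index(ref)     # 0-based position in the LRU stack
            distances.append(depth + 1)  # stack distance is reported 1-based
            stack.pop(depth)
        else:
            distances.append(None)       # cold miss: the distance is unbounded
        stack.insert(0, ref)             # the reference becomes most recent
    return distances


def hits_for_cache_size(distances: List[Optional[int]], cache_size: int) -> int:
    """Count references that hit in a fully associative LRU cache of this size."""
    return sum(1 for d in distances if d is not None and d <= cache_size)


if __name__ == "__main__":
    d = stack_distances("abcabda")
    for size in (1, 2, 3, 4):
        print(size, hits_for_cache_size(d, size))
```

Because the distances are computed once, evaluating another cache size only requires another call to `hits_for_cache_size`, not another pass over the trace.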
Parallel Numerical Simulations of Water Reservoirs
Torres, Pedro; Mangiavacchi, Norberto
2010-11-01
The study of the water flow and scalar transport in water reservoirs is important for the determination of the water quality during the initial stages of the reservoir filling and during the life of the reservoir. For this purpose, a parallel 2D finite element code for solving the incompressible Navier-Stokes equations coupled with scalar transport was implemented using the message-passing programming model, in order to perform simulations of hydropower water reservoirs in a computer cluster environment. The spatial discretization is based on the MINI element that satisfies the Babuska-Brezzi (BB) condition, which provides sufficient conditions for a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as preprocessing, solving and post processing, were implemented using the PETSc library. The resulting linear systems for the velocity and the pressure fields were solved using the projection method, implemented by an approximate block LU factorization. In order to increase the parallel performance in the solution of the linear systems, we employ the static condensation method for solving the intermediate velocity at vertex and centroid nodes separately. We compare performance results of the static condensation method with the approach of solving the complete system. In our tests the static condensation method shows better performance for large problems, at the cost of an increased memory usage. Performance results for other intensive parts of the code in a computer cluster are also presented.
Parallel multiscale simulations of a brain aneurysm
Energy Technology Data Exchange (ETDEWEB)
Grinberg, Leopold [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States); Fedosov, Dmitry A. [Institute of Complex Systems and Institute for Advanced Simulation, Forschungszentrum Jülich, Jülich 52425 (Germany); Karniadakis, George Em, E-mail: george_karniadakis@brown.edu [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States)
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in
Parallel magnetic field perturbations in gyrokinetic simulations
International Nuclear Information System (INIS)
Joiner, N.; Hirose, A.; Dorland, W.
2010-01-01
At low β it is common to neglect parallel magnetic field perturbations on the basis that they are of order β². This is only true if effects of order β are canceled by a term in the ∇B drift also of order β [H. L. Berk and R. R. Dominguez, J. Plasma Phys. 18, 31 (1977)]. To our knowledge this has not been rigorously tested with modern gyrokinetic codes. In this work we use the gyrokinetic code GS2 [Kotschenreuther et al., Comput. Phys. Commun. 88, 128 (1995)] to investigate whether the compressional magnetic field perturbation B_∥ is required for accurate gyrokinetic simulations at low β for microinstabilities commonly found in tokamaks. The kinetic ballooning mode (KBM) demonstrates the principle described by Berk and Dominguez strongly, as does the trapped electron mode, in a less dramatic way. The ion and electron temperature gradient (ETG) driven modes do not typically exhibit this behavior; the effects of B_∥ are found to depend on the pressure gradients. The terms which are seen to cancel at long wavelength in KBM calculations can be cumulative in the ion temperature gradient case and increase with η_e. The effect of B_∥ on the ETG instability is shown to depend on the normalized pressure gradient β′ at constant β.
A Coupling Tool for Parallel Molecular Dynamics-Continuum Simulations
Neumann, Philipp
2012-06-01
We present a tool for coupling Molecular Dynamics and continuum solvers. It is written in C++ and is meant to support the developers of hybrid molecular-continuum simulations in terms of both the realisation of the respective coupling algorithm and the parallel execution of the hybrid simulation. We describe the implementational concept of the tool and its parallel extensions. We particularly focus on the parallel execution of particle insertions into dense molecular systems and propose a respective parallel algorithm. Our implementations are validated for serial and parallel setups in two and three dimensions. © 2012 IEEE.
Parallel sparse direct solver for integrated circuit simulation
Chen, Xiaoming; Yang, Huazhong
2017-01-01
This book describes algorithmic methods and parallelization techniques to design a parallel sparse direct solver which is specifically targeted at integrated circuit simulation problems. The authors describe a complete flow and detailed parallel algorithms of the sparse direct solver. They also show how to improve the performance by simple but effective numerical techniques. The sparse direct solver techniques described can be applied to any SPICE-like integrated circuit simulator and have been proven to be high-performance in actual circuit simulation. Readers will benefit from the state-of-the-art parallel integrated circuit simulation techniques described in this book, especially the latest parallel sparse matrix solution techniques. · Introduces complicated algorithms of sparse linear solvers, using concise principles and simple examples, without complex theory or lengthy derivations; · Describes a parallel sparse direct solver that can be adopted to accelerate any SPICE-like integrated circuit simulato...
Parallelization and automatic data distribution for nuclear reactor simulations
Energy Technology Data Exchange (ETDEWEB)
Liebrock, L.M. [Liebrock-Hicks Research, Calumet, MI (United States)
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Parallel and vector implementation of APROS simulator code
International Nuclear Information System (INIS)
Niemi, J.; Tommiska, J.
1990-01-01
In this paper the vector and parallel processing implementation of a general purpose simulator code is discussed. In this code the utilization of vector processing is straightforward. In addition to loop-level parallel processing, functional decomposition and domain decomposition have been considered. Results presented for a PWR-plant simulation illustrate the potential speed-up factors of the alternatives. It turns out that loop-level parallelism and domain decomposition are the most promising alternatives for employing parallel processing. (author)
A compositional reservoir simulator on distributed memory parallel computers
International Nuclear Information System (INIS)
Rame, M.; Delshad, M.
1995-01-01
This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
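The idea of extending each subdomain so that neighbouring data are available for a stencil can be sketched as follows. This is a serial, one-dimensional illustration in plain Python, not code from UTCHEM; the function names, the three-point averaging stencil, and the replicated boundary values are all assumptions made for the example. Each list returned by `split_with_halo` stands in for the data one processor would hold.

```python
from typing import List


def split_with_halo(field: List[float], n_parts: int) -> List[List[float]]:
    """Split a 1D field into near-equal subdomains, each extended by one ghost cell per side."""
    n = len(field)
    # Replicate the physical boundary values so the end subdomains also have ghost cells.
    padded = [field[0]] + list(field) + [field[-1]]
    # Integer bounds give subdomain sizes that differ by at most one cell.
    bounds = [(i * n) // n_parts for i in range(n_parts + 1)]
    # Owned cells [lo, hi) sit at padded[lo + 1 : hi + 1]; the slice adds one ghost cell on each side.
    return [padded[lo:hi + 2] for lo, hi in zip(bounds[:-1], bounds[1:])]


def smooth_local(piece: List[float]) -> List[float]:
    """Three-point average over the owned cells of one extended subdomain."""
    return [(piece[i - 1] + piece[i] + piece[i + 1]) / 3.0 for i in range(1, len(piece) - 1)]


def smooth_decomposed(field: List[float], n_parts: int) -> List[float]:
    """Apply the stencil subdomain by subdomain and reassemble the global result."""
    result: List[float] = []
    for piece in split_with_halo(field, n_parts):  # each iteration stands in for one processor
        result.extend(smooth_local(piece))
    return result


if __name__ == "__main__":
    data = [float(i * i) for i in range(10)]
    print(smooth_decomposed(data, 3) == smooth_decomposed(data, 1))
```

In a real distributed run the ghost cells would be refreshed by exchanging messages with adjacent processors before each stencil sweep; here they are simply copied from the global array.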
Parallel-Processing Test Bed For Simulation Software
Blech, Richard; Cole, Gary; Townsend, Scott
1996-01-01
Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).
Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors
Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.
1990-01-01
Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
Prototyping and Simulating Parallel, Distributed Computations with VISA
National Research Council Canada - National Science Library
Demeure, Isabelle M; Nutt, Gary J
1989-01-01
...] to support the design, prototyping, and simulation of parallel, distributed computations. In particular, VISA is meant to guide the choice of partitioning and communication strategies for such computations, based on their performance...
Visual Interfaces for Parallel Simulations (VIPS), Phase I
National Aeronautics and Space Administration — Configuring the 3D geometry and physics of large scale parallel physics simulations is increasingly complex. Given the investment in time and effort to run these...
Parallel discrete-event simulation of FCFS stochastic queueing networks
Nicol, David M.
1988-01-01
Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction of a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, performance data are given that demonstrate the method's effectiveness under moderate to heavy loads, and performance tradeoffs between the quality of lookahead and the cost of computing lookahead are discussed.
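One way lookahead might be obtained for a single FCFS server is sketched below. The sketch assumes that service times do not depend on the job being served, so the service time of the next job can be sampled before that job arrives; the class and method names are hypothetical and the construction is not necessarily the one used in the paper. Under that assumption the server can promise a lower bound on the departure time of any job that has not yet arrived.

```python
import random


class FCFSServer:
    """Single FCFS server whose next service time is sampled ahead of the arrival."""

    def __init__(self, rng: random.Random, mean_service: float) -> None:
        self.rng = rng
        self.mean_service = mean_service
        self.busy_until = 0.0              # time at which the server next becomes free
        self.next_service = self._draw()   # service time reserved for the next arrival

    def _draw(self) -> float:
        return self.rng.expovariate(1.0 / self.mean_service)

    def appointment(self, now: float) -> float:
        """Lower bound on the departure time of any job arriving at or after `now`."""
        return max(now, self.busy_until) + self.next_service

    def arrive(self, t: float) -> float:
        """Process an arrival at time t and return that job's departure time."""
        start = max(t, self.busy_until)    # FCFS: wait for all earlier jobs to finish
        self.busy_until = start + self.next_service
        self.next_service = self._draw()   # reserve a service time for the following job
        return self.busy_until


if __name__ == "__main__":
    rng = random.Random(0)
    server = FCFSServer(rng, mean_service=2.0)
    now = 0.0
    for _ in range(5):
        promise = server.appointment(now)
        now += rng.expovariate(1.0)
        print(round(promise, 3), round(server.arrive(now), 3))
```

A downstream process that receives the promised time can safely simulate up to it without waiting for further messages from this server.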
Xyce parallel electronic simulator : users' guide.
Energy Technology Data Exchange (ETDEWEB)
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is
Running Parallel Discrete Event Simulators on Sierra
Energy Technology Data Exchange (ETDEWEB)
Barnes, P. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Jefferson, D. R. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2015-12-03
In this proposal we consider porting the ROSS/Charm++ simulator and the discrete event models that run under its control so that they run on the Sierra architecture and make efficient use of the Volta GPUs.
Byington, Scott
1997-01-01
Presents a strategy to help students grasp the important implications of population growth. Involves an interactive demonstration that allows students to experience exponential and logistic population growth followed by a discussion of the implications of population-growth principles. (JRH)
Parallel pic plasma simulation through particle decomposition techniques
International Nuclear Information System (INIS)
Briguglio, S.; Vlad, G.; Di Martino, B.; Naples, Univ. 'Federico II'
1998-02-01
Particle-in-cell (PIC) codes are among the major candidates to yield a satisfactory description of the detail of kinetic effects, such as the resonant wave-particle interaction, relevant in determining the transport mechanism in magnetically confined plasmas. A significant improvement of the simulation performance of such codes can be expected from parallelization, e.g., by distributing the particle population among several parallel processors. Parallelization of a hybrid magnetohydrodynamic-gyrokinetic code has been accomplished within the High Performance Fortran (HPF) framework, and tested on the IBM SP2 parallel system, using a 'particle decomposition' technique. The adopted technique requires a moderate effort in porting the code to parallel form and results in intrinsic load balancing and modest interprocessor communication. The performance tests obtained confirm the hypothesis of high effectiveness of the strategy, if targeted towards moderately parallel architectures. Optimal use of resources is also discussed with reference to a specific physics problem.
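A particle decomposition can be illustrated with the charge-deposition step of a PIC code. The sketch below is plain serial Python rather than the HPF implementation described in the abstract; it assumes a periodic one-dimensional grid, unit-weight particles, and nearest-grid-point deposition, and the function names are chosen for the example. Each worker receives an almost equal share of the particles and deposits onto its own copy of the whole grid, and the copies are then summed.

```python
import random
from typing import List


def deposit(positions: List[float], n_cells: int, length: float) -> List[float]:
    """Nearest-grid-point deposition of unit-weight particles on a periodic 1D grid."""
    grid = [0.0] * n_cells
    for x in positions:
        grid[int(x / length * n_cells) % n_cells] += 1.0
    return grid


def deposit_particle_decomposed(
    positions: List[float], n_cells: int, length: float, n_workers: int
) -> List[float]:
    """Deposit with the particle population split among workers, then sum the grids."""
    # Round-robin split: worker loads differ by at most one particle,
    # regardless of where the particles sit in space.
    chunks = [positions[w::n_workers] for w in range(n_workers)]
    # Each worker deposits onto a private copy of the full grid.
    partial = [deposit(chunk, n_cells, length) for chunk in chunks]
    # Global sum over workers: the only communication step in this sketch.
    return [sum(cell) for cell in zip(*partial)]


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [rng.uniform(0.0, 1.0) for _ in range(1000)]
    serial = deposit(particles, 16, 1.0)
    parallel = deposit_particle_decomposed(particles, 16, 1.0, 4)
    print(serial == parallel)
```

The price of this simplicity is that every worker stores the whole grid and the reduction grows with the number of workers, which is consistent with the abstract's remark that the strategy suits moderately parallel architectures.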
Parallel discrete event simulation: A shared memory approach
Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.
1987-01-01
With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation
International Nuclear Information System (INIS)
Dunigan, T.H.
1988-01-01
1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated
Beam dynamics simulations using a parallel version of PARMILA
International Nuclear Information System (INIS)
Ryne, R.D.
1996-01-01
The computer code PARMILA has been the primary tool for the design of proton and ion linacs in the United States for nearly three decades. Previously it was sufficient to perform simulations with of order 10000 particles, but recently the need to perform high resolution halo studies for next-generation, high intensity linacs has made it necessary to perform simulations with of order 100 million particles. With the advent of massively parallel computers such simulations are now within reach. Parallel computers already make it possible, for example, to perform beam dynamics calculations with tens of millions of particles, requiring over 10 GByte of core memory, in just a few hours. Also, parallel computers are becoming easier to use thanks to the availability of mature, Fortran-like languages such as Connection Machine Fortran and High Performance Fortran. We will describe our experience developing a parallel version of PARMILA and the performance of the new code
A path-level exact parallelization strategy for sequential simulation
Peredo, Oscar F.; Baeza, Daniel; Ortiz, Julián M.; Herrero, José R.
2018-01-01
Sequential Simulation is a well known method in geostatistical modelling. Following the Bayesian approach for simulation of conditionally dependent random events, Sequential Indicator Simulation (SIS) method draws simulated values for K categories (categorical case) or classes defined by K different thresholds (continuous case). Similarly, Sequential Gaussian Simulation (SGS) method draws simulated values from a multivariate Gaussian field. In this work, a path-level approach to parallelize SIS and SGS methods is presented. A first stage of re-arrangement of the simulation path is performed, followed by a second stage of parallel simulation for non-conflicting nodes. A key advantage of the proposed parallelization method is to generate identical realizations as with the original non-parallelized methods. Case studies are presented using two sequential simulation codes from GSLIB: SISIM and SGSIM. Execution time and speedup results are shown for large-scale domains, with many categories and maximum kriging neighbours in each case, achieving high speedup results in the best scenarios using 16 threads of execution in a single machine.
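The two-stage idea described above (re-arrange the path, then simulate non-conflicting nodes in parallel) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a node conflicts only with neighbours that appear earlier in the simulation path, and the function name `schedule_levels` and the `neighbours` mapping are hypothetical. Nodes assigned to the same level do not depend on one another, so they could be simulated concurrently while still seeing the same conditioning values as in the sequential order.

```python
from typing import Dict, List, Sequence, Set


def schedule_levels(path: Sequence[int],
                    neighbours: Dict[int, Set[int]]) -> List[List[int]]:
    """Group the nodes of a simulation path into dependency levels.

    Assumption (hypothetical model): a node depends on every neighbour
    that is visited earlier in the path. Nodes in the same level have no
    dependency on each other and may be processed in parallel.
    """
    position = {node: i for i, node in enumerate(path)}
    level: Dict[int, int] = {}
    for node in path:
        earlier = [n for n in neighbours.get(node, set())
                   if n in position and position[n] < position[node]]
        # A node sits one level above its deepest earlier neighbour.
        level[node] = 1 + max((level[n] for n in earlier), default=-1)

    groups: List[List[int]] = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for node in path:  # keep the original path order inside each level
        groups[level[node]].append(node)
    return groups
```

For a chain of five nodes visited in the order 0, 2, 4, 1, 3, the sketch yields two levels (three nodes, then two), whereas the order 0, 1, 2, 3, 4 yields five levels of one node each, which is why the re-arrangement stage matters for the available parallelism.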
Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.
2018-01-01
In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.
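As a small worked example of the scaling figures quoted above, parallel efficiency is conventionally defined as speedup divided by the number of processors. The sketch below simply inverts that definition; the function name is hypothetical, and the inputs are the rounded values from the abstract (the 80% figure is reported as a lower bound).

```python
def speedup_from_efficiency(efficiency: float, processors: int) -> float:
    """Speedup implied by a parallel efficiency (efficiency = speedup / processors)."""
    return efficiency * processors


# Rounded figures from the abstract: >80% on 8 processors, ~50% on 64.
for eff, procs in [(0.80, 8), (0.50, 64)]:
    print(f"{procs:3d} processors at {eff:.0%} efficiency -> "
          f"speedup of about {speedup_from_efficiency(eff, procs):.1f}x")
```

Under these assumptions the run on 64 processors is still roughly five times faster than the run on 8, even though its efficiency is lower.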
Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.
2018-05-01
In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.
Tutorial: Parallel Computing of Simulation Models for Risk Analysis.
Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D
2016-10-01
Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix. © 2016 Society for Risk Analysis.
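The tutorial itself uses MATLAB and R; the following is only a minimal Python sketch of the embarrassingly parallel pattern it describes, in which independent replications are mapped over a pool of workers. The toy model `simulate_loss` and its parameters are hypothetical. A thread pool is used here for portability; for CPU-bound models in CPython, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and is usually the better choice.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Iterable, List


def simulate_loss(seed: int, n_draws: int = 1_000) -> float:
    """One independent replication of a hypothetical toy risk model."""
    rng = random.Random(seed)  # private generator: no state shared between runs
    return mean(rng.expovariate(1.0) for _ in range(n_draws))


def run_replications(seeds: Iterable[int], max_workers: int = 4) -> List[float]:
    """Run all replications concurrently; results come back in seed order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate_loss, seeds))
```

Because each replication owns its random generator and shares no state, the parallel results match a plain serial loop over the same seeds, which is a convenient correctness check before scaling up.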
Parallelization of Rocket Engine Simulator Software (PRESS)
Cezzar, Ruknet
1998-01-01
We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the progress at the Fourth Annual HBCU Conference. Although we have discussed, in some detail, how TURBDES/PUMPDES software can be run in parallel using MPI, at present, we are unable to experiment any further with either MPI or PVM. Due to X windows not being implemented, we are also not able to experiment further with XPVM, which, it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPI is that it is public domain. Although plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as the Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as in our case. In effect, the review of literature on both MPI and PVM, and there is a lot, indicates something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where despite significant documentation, we could not find even a simple example which supports coarse-grain parallelism involving only a few processes. From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10
Vlasov simulations of parallel potential drops
Directory of Open Access Journals (Sweden)
H. Gunell
2013-07-01
An auroral flux tube is modelled from the magnetospheric equator to the ionosphere using Vlasov simulations. Starting from an initial state, the evolution of the plasma on the flux tube is followed in time. It is found that when applying a voltage between the ends of the flux tube, about two thirds of the potential drop is concentrated in a thin double layer at approximately one Earth radius altitude. The remaining part is situated in an extended region 1–2 Earth radii above the double layer. Waves on the ion timescale develop above the double layer, and they move toward higher altitude at approximately the ion acoustic speed. These waves are seen both in the electric field and as perturbations of the ion and electron distributions, indicative of an instability. Electrons of magnetospheric origin become trapped between the magnetic mirror and the double layer during its formation. At low altitude, waves on electron timescales appear and are seen to be non-uniformly distributed in space. The temporal evolution of the potential profile and the total voltage affect the double layer altitude, which decreases with an increasing field aligned potential drop. A current–voltage relationship is found by running several simulations with different voltages over the system, and it agrees with the Knight relation reasonably well.
General-purpose parallel simulator for quantum computing
International Nuclear Information System (INIS)
Niwa, Jumpei; Matsumoto, Keiji; Imai, Hiroshi
2002-01-01
With current technologies, it seems to be very difficult to implement quantum computers with many qubits. It is therefore of importance to simulate quantum algorithms and circuits on the existing computers. However, for a large-size problem, the simulation often requires more computational power than is available from sequential processing. Therefore, simulation methods for parallel processors are required. We have developed a general-purpose simulator for quantum algorithms/circuits on the parallel computer (Sun Enterprise4500). It can simulate algorithms/circuits with up to 30 qubits. In order to test efficiency of our proposed methods, we have simulated Shor's factorization algorithm and Grover's database search, and we have analyzed robustness of the corresponding quantum circuits in the presence of both decoherence and operational errors. The corresponding results, statistics, and analyses are presented in this paper
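To make the 30-qubit limit concrete, the sketch below shows a generic state-vector simulation step together with a memory estimate. It is an illustration of the general technique, not the authors' simulator: the function names are hypothetical, and 16 bytes per amplitude assumes double-precision complex numbers.

```python
import math
from typing import List


def apply_hadamard(state: List[complex], target: int) -> List[complex]:
    """Apply a Hadamard gate to qubit `target` of a state vector.

    Amplitudes are paired by flipping bit `target` of the basis index.
    Each pair is updated independently of all others, which is what makes
    the update easy to distribute over processors.
    """
    s = 1.0 / math.sqrt(2.0)
    stride = 1 << target
    out = list(state)
    for base in range(len(state)):
        if (base & stride) == 0:
            a, b = state[base], state[base | stride]
            out[base] = s * (a + b)
            out[base | stride] = s * (a - b)
    return out


def state_vector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory needed to store all 2**n amplitudes (assumed 16 bytes each)."""
    return (1 << n_qubits) * bytes_per_amplitude
```

Under the stated assumption, `state_vector_bytes(30)` is 2**34 bytes, i.e. 16 GiB, and every additional qubit doubles it, which suggests why memory rather than arithmetic tends to bound such simulations.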
A hybrid algorithm for parallel molecular dynamics simulations
Mangiardi, Chris M.; Meyer, R.
2017-10-01
This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
A tool for simulating parallel branch-and-bound methods
Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail
2016-01-01
The Branch-and-Bound method is known as one of the most powerful but very resource consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution. Therefore the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
Parallel alternating direction preconditioner for isogeometric simulations of explicit dynamics
Łoś, Marcin
2015-04-27
In this paper we present a parallel implementation of the alternating direction preconditioner for isogeometric simulations of explicit dynamics. The Alternating Direction Implicit (ADI) algorithm, which belongs to the category of matrix-splitting iterative methods, was proposed almost six decades ago for solving parabolic and elliptic partial differential equations, see [1–4]. A new version of this algorithm has recently been developed for isogeometric simulations of two-dimensional explicit dynamics [5] and steady-state diffusion equations with orthotropic heterogeneous coefficients [6]. In this paper we present a parallel version of the alternating direction implicit algorithm for three-dimensional simulations. The algorithm has been incorporated as a part of PETIGA, an isogeometric framework [7] built on top of PETSc [8]. We show the scalability of the parallel algorithm on the STAMPEDE Linux cluster up to 10,000 processors, as well as the convergence rate of the PCG solver with the ADI algorithm as preconditioner.
Engineering-Based Thermal CFD Simulations on Massive Parallel Systems
Frisch, Jérôme
2015-05-22
The development of parallel Computational Fluid Dynamics (CFD) codes is a challenging task that entails efficient parallelization concepts and strategies in order to achieve good scalability values when running those codes on modern supercomputers with several thousands to millions of cores. In this paper, we present a hierarchical data structure for massive parallel computations that supports the coupling of a Navier–Stokes-based fluid flow code with the Boussinesq approximation in order to address complex thermal scenarios for energy-related assessments. The newly designed data structure is specifically designed with the idea of interactive data exploration and visualization during runtime of the simulation code; a major shortcoming of traditional high-performance computing (HPC) simulation codes. We further show and discuss speed-up values obtained on one of Germany’s top-ranked supercomputers with up to 140,000 processes and present simulation results for different engineering-based thermal problems.
A tool for simulating parallel branch-and-bound methods
Directory of Open Access Journals (Sweden)
Golubeva Yana
2016-01-01
The Branch-and-Bound method is known as one of the most powerful but very resource consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution. Therefore the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer’s interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
Parallel simulation of radio-frequency plasma discharges
International Nuclear Information System (INIS)
Fivaz, M.; Howling, A.; Ruegsegger, L.; Schwarzenbach, W.; Baeumle, B.
1994-01-01
The 1D Particle-In-Cell and Monte Carlo collision code XPDP1 is used to model radio-frequency argon plasma discharges. The code runs faster on a single-user parallel system called MUSIC than on a CRAY-YMP. The low cost of the MUSIC system allows a 24-hours-per-day use and the simulation results are available one to two orders of magnitude quicker than with a super computer shared with other users. The parallelization strategy and its implementation are discussed. Very good agreement is found between simulation results and measurements done in an experimental argon discharge. (author) 2 figs., 3 refs
Parallelization of a Monte Carlo particle transport simulation code
Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.
2010-05-01
We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem sizes, limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow the study of higher particle energies with more accurate physical models, and improve statistics, as more particle tracks can be simulated with low response time.
Simulation of neutron transport equation using parallel Monte Carlo for deep penetration problems
International Nuclear Information System (INIS)
Bekar, K. K.; Tombakoglu, M.; Soekmen, C. N.
2001-01-01
The neutron transport equation is simulated using a parallel Monte Carlo method for deep penetration neutron transport problems. The Monte Carlo simulation is parallelized using three different techniques: direct parallelization, domain decomposition, and domain decomposition with load balancing, which are used with PVM (Parallel Virtual Machine) software on a LAN (Local Area Network). The results of the parallel simulation are given for various model problems. The performances of the parallelization techniques are compared with each other. Moreover, the effects of variance reduction techniques on parallelization are discussed.
Parallel PDE-Based Simulations Using the Common Component Architecture
International Nuclear Information System (INIS)
McInnes, Lois C.; Allan, Benjamin A.; Armstrong, Robert; Benson, Steven J.; Bernholdt, David E.; Dahlgren, Tamara L.; Diachin, Lori; Krishnan, Manoj Kumar; Kohl, James A.; Larson, J. Walter; Lefantzi, Sophia; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G.; Ray, Jaideep; Zhou, Shujia
2006-01-01
The complexity of parallel PDE-based simulations continues to increase as multimodel, multiphysics, and multi-institutional projects become widespread. A goal of component based software engineering in such large-scale simulations is to help manage this complexity by enabling better interoperability among various codes that have been independently developed by different groups. The Common Component Architecture (CCA) Forum is defining a component architecture specification to address the challenges of high-performance scientific computing. In addition, several execution frameworks, supporting infrastructure, and general purpose components are being developed. Furthermore, this group is collaborating with others in the high-performance computing community to design suites of domain-specific component interface specifications and underlying implementations. This chapter discusses recent work on leveraging these CCA efforts in parallel PDE-based simulations involving accelerator design, climate modeling, combustion, and accidental fires and explosions. We explain how component technology helps to address the different challenges posed by each of these applications, and we highlight how component interfaces built on existing parallel toolkits facilitate the reuse of software for parallel mesh manipulation, discretization, linear algebra, integration, optimization, and parallel data redistribution. We also present performance data to demonstrate the suitability of this approach, and we discuss strategies for applying component technologies to both new and existing applications
Parallelization of a numerical simulation code for isotropic turbulence
International Nuclear Information System (INIS)
Sato, Shigeru; Yokokawa, Mitsuo; Watanabe, Tadashi; Kaburaki, Hideo.
1996-03-01
A parallel pseudospectral code which solves the three-dimensional Navier-Stokes equation by direct numerical simulation is developed, and its execution time, parallelization efficiency, load balance and scalability are evaluated. A vector parallel supercomputer, the Fujitsu VPP500 with up to 16 processors, is used for calculations with up to 256x256x256 Fourier modes. Good scalability with the number of processors is achieved when the number of Fourier modes is fixed. For small numbers of Fourier modes, the calculation time of the program is proportional to N log N, which is the ideal complexity of a 3D FFT on vector parallel processors. It is found that the calculation performance decreases as the number of Fourier modes increases. (author)
Enabling parallel simulation of large-scale HPC network systems
International Nuclear Information System (INIS)
Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; Carns, Philip
2016-01-01
Here, with the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems—in particular, networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations
Treatment planning in radiosurgery: parallel Monte Carlo simulation software
Energy Technology Data Exchange (ETDEWEB)
Scielzo, G [Galliera Hospitals, Genova (Italy). Dept. of Hospital Physics]; Grillo Ruggieri, F [Galliera Hospitals, Genova (Italy). Dept. for Radiation Therapy]; Modesti, M; Felici, R [Electronic Data System, Rome (Italy)]; Surridge, M [University of Southampton (United Kingdom). Parallel Application Centre]
1995-12-01
The main objective of this research was to evaluate the possibility of direct Monte Carlo simulation for accurate dosimetry with short computation time. We made use of: a graphics workstation, a linear accelerator, and water, PMMA and anthropomorphic phantoms, for validation purposes; ionometric, film and thermo-luminescent techniques, for dosimetry; and a treatment planning system, for comparison. Benchmarking results suggest that short computing times can be obtained with the parallel version of EGS4 that was developed. Parallelism was obtained by assigning simulated incident photons to separate processors, which made the development of a parallel random number generator necessary. Validation consisted of phantom irradiation and comparison of predicted and measured values, which showed good agreement in PDD and dose profiles. Experiments on anthropomorphic phantoms (with inhomogeneities) were carried out, and these values are being compared with results obtained with the conventional treatment planning system.
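The parallelization strategy mentioned above (incident photons divided among processors, each with its own random stream) can be sketched as follows. This is a toy illustration rather than EGS4: the physics is reduced to sampling an exponential free path with a hypothetical attenuation coefficient `mu`, and seeding each worker with `base_seed + i` is a simplifying assumption that stands in for a purpose-built parallel random number generator.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_histories(total: int, workers: int) -> List[int]:
    """Divide `total` photon histories as evenly as possible among workers."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def transmitted_count(task: Tuple[int, int, float, float]) -> int:
    """Count photons whose first free path exceeds `depth` (toy model)."""
    seed, n_photons, mu, depth = task
    rng = random.Random(seed)  # one private stream per worker
    count = 0
    for _ in range(n_photons):
        free_path = -math.log(1.0 - rng.random()) / mu
        if free_path > depth:
            count += 1
    return count


def parallel_transmission(total: int, workers: int, mu: float, depth: float,
                          base_seed: int = 12345) -> float:
    """Estimate the uncollided transmitted fraction from per-worker tallies."""
    shares = split_histories(total, workers)
    tasks = [(base_seed + i, n, mu, depth) for i, n in enumerate(shares)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(transmitted_count, tasks))
    return sum(counts) / total
```

In this toy model the estimate should approach exp(-mu * depth) as the number of histories grows, which gives a simple analytic check that the per-worker tallies were combined correctly.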
Xyce parallel electronic simulator reference guide, version 6.1
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].
Xyce™ Parallel Electronic Simulator: Reference Guide, Version 5.1
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical and Microsystems Modeling; Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical and Microsystems Modeling; Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical and Microsystems Modeling; Rankin, Eric Lamont [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical and Microsystems Modeling; Schiek, Richard Louis [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical and Microsystems Modeling; Santarelli, Keith R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical and Microsystems Modeling; Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical and Microsystems Modeling; Fixel, Deborah A. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical and Microsystems Modeling; Coffey, Todd S. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Applied Mathematics and Applications; Pawlowski, Roger P. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Applied Mathematics and Applications
2009-11-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users’ Guide. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users’ Guide.
Xyce Parallel Electronic Simulator : reference guide, version 4.1.
Energy Technology Data Exchange (ETDEWEB)
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Xyce™ Parallel Electronic Simulator Reference Guide Version 6.8
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2017-10-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Xyce parallel electronic simulator reference guide, version 6.0.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2013-08-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].
Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.
Dematté, Lorenzo
2012-01-01
Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space has become more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time consuming, especially if we want to capture the system's behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver on the promise made by systems biology of understanding a system as a whole, we need to scale up the size of the models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely used algorithm for stochastic simulation of chemical reactions with spatial resolution and single molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular and bimolecular reactions, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel on each molecule of the system. The implementation offers good speed-ups and real-time, high quality graphics output.
An Advanced Simulation Framework for Parallel Discrete-Event Simulation
Li, P. P.; Tyrrell, R. Yeung D.; Adhami, N.; Li, T.; Henry, H.
1994-01-01
Discrete-event simulation (DEVS) users have long been faced with a three-way trade-off of balancing execution time, model fidelity, and number of objects simulated. Because of the limits of computer processing power the analyst is often forced to settle for less than desired performances in one or more of these areas.
Study and simulation of a parallel numerical processing machine
International Nuclear Information System (INIS)
Bel Hadj, Slaheddine
1981-12-01
This study was carried out with a view to implementing the NEPTUNIX package (software for solving very large differential-algebraic equation systems) on a minicomputer. Aiming at increasing system performance, previous research showed the necessity of reducing the execution time of certain frequently used numerical computation tasks. It also demonstrated the feasibility of handling these tasks with efficient parallel algorithms. The present work deals with the study and simulation of a parallel-architecture processor adapted to the fast execution of these algorithms. A minicomputer fitted with a connection to such a parallel processor has greatly extended computing power. The architecture of a parallel numerical processor, based on VLSI microprocessors and co-processors, is then described. Its design aims at the best cost/performance ratio. The last part deals with the simulation of the processor using the 'CHAMBOR' program. Results show a speed-up by a factor of 30 compared with execution on a MITRA 15 minicomputer. Moreover, the importance of conflicts, mainly at the level of access to a shared resource, is evaluated. Although this implementation was designed with a dedicated application in mind, other uses could be envisaged, particularly for the simulation of nuclear reactors: operator guidance systems, behavioural studies under accident conditions, etc. (author) [fr
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows
Moitra, Stuti; Gatski, Thomas B.
1997-01-01
A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
Kinematics analysis and simulation of a new underactuated parallel robot
Directory of Open Access Journals (Sweden)
Wenxu YAN
2017-04-01
In a traditional robot, the number of driving motors equals the number of degrees of freedom, which causes defects such as low efficiency. To overcome this problem, a new underactuated parallel robot based on the traditional parallel robot is presented. The structural characteristics and working principles of the underactuated parallel robot are analyzed. The forward and inverse solutions are derived by means of space analytic geometry and vector algebra. The kinematics model is established, and MATLAB is employed to verify the accuracy of the forward and inverse solutions and to identify the optimal workspace. The simulation results show that the robot can switch between three- and four-degree-of-freedom operation when the number of driving motors is three, which improves grasping efficiency. The robot also offers a large workspace, high-speed operation, high positioning accuracy, and low manufacturing cost, giving it a wide range of potential industrial applications.
Parallel and distributed processing in power system simulation and control
Energy Technology Data Exchange (ETDEWEB)
Falcao, Djalma M [Universidade Federal, Rio de Janeiro, RJ (Brazil). Coordenacao dos Programas de Pos-graduacao de Engenharia
1994-12-31
Recent advances in computer technology will certainly have a great impact on the methodologies used in power system expansion and operational planning as well as in real-time control. Parallel and distributed processing are among the new technologies that present great potential for application in these areas. Parallel computers use multiple functional or processing units to speed up computation, while distributed processing computer systems are collections of computers joined together by high-speed communication networks, with many objectives and advantages. The paper presents some ideas for the use of parallel and distributed processing in power system simulation and control. It also comments on some of the current research work on these topics and presents a summary of the work presently being developed at COPPE. (author) 53 refs., 2 figs.
The cost of conservative synchronization in parallel discrete event simulations
Nicol, David M.
1990-01-01
The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approaches the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems (those for which parallel processing is ideally suited) there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.
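A generic window-based view of synchronous conservative synchronization can be sketched as follows. This is a simplified illustration, not the specific protocol analyzed in the abstract above: it assumes each logical process keeps a time-sorted list of pending event timestamps and promises that any message it sends carries a timestamp at least `lookahead` later than its next event. The names `window_bound` and `safe_events` and the sample values are hypothetical, and events that a process schedules for itself inside the window are ignored.

```python
def window_bound(queues, lookaheads):
    """Upper time bound below which no process can still receive a message."""
    inf = float("inf")
    # An empty queue contributes an infinite bound: that process sends nothing.
    return min((q[0] if q else inf) + la for q, la in zip(queues, lookaheads))

def safe_events(queues, lookaheads):
    """Return the window bound and, per process, the events safe to execute now."""
    bound = window_bound(queues, lookaheads)
    return bound, [[t for t in q if t < bound] for q in queues]

# Two hypothetical logical processes with time-sorted pending events.
queues = [[1.0, 1.4, 3.0], [1.2, 2.5]]
bound, safe = safe_events(queues, lookaheads=[0.5, 0.5])
print(bound, safe)
```

All processes can execute their safe events concurrently and then meet at a barrier to compute the next bound, which is where the synchronization overhead discussed in the abstract arises.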
Micro-mechanical Simulations of Soils using Massively Parallel Supercomputers
Directory of Open Access Journals (Sweden)
David W. Washington
2004-06-01
In this research, a computer program, Trubal version 1.51, based on the Discrete Element Method, was converted to run on a Connection Machine (CM-5), a massively parallel supercomputer with 512 nodes, to reduce the computation time of simulating geotechnical boundary value problems. The dynamic memory algorithm in the Trubal program did not perform efficiently on the CM-2 machine with the Single Instruction Multiple Data (SIMD) architecture. This was due to the communication overhead involving global array reductions, global array broadcasts and random data movement. Therefore, the dynamic memory algorithm in the Trubal program was converted to a static memory arrangement, and the program was successfully converted to run on CM-5 machines. The converted program was called "TRUBAL for Parallel Machines (TPM)". The TPM program was validated by simulating two physical triaxial experiments and comparing the results with Trubal simulations. On a 512-node CM-5 machine, TPM produced a nine-fold speedup, demonstrating the inherent parallelism of algorithms based on the Discrete Element Method.
Noise simulation in cone beam CT imaging with parallel computing
International Nuclear Information System (INIS)
Tu, S.-J.; Shaw, Chris C; Chen, Lingyun
2006-01-01
We developed a computer noise simulation model for cone beam computed tomography imaging using a general purpose PC cluster. This model uses a mono-energetic x-ray approximation and allows us to investigate three primary performance components, specifically quantum noise, detector blurring and additive system noise. A parallel random number generator based on the Weyl sequence was implemented in the noise simulation and a visualization technique was accordingly developed to validate the quality of the parallel random number generator. In our computer simulation model, three-dimensional (3D) phantoms were mathematically modelled and used to create 450 analytical projections, which were then sampled into digital image data. Quantum noise was simulated and added to the analytical projection image data, which were then filtered to incorporate flat panel detector blurring. Additive system noise was generated and added to form the final projection images. The Feldkamp algorithm was implemented and used to reconstruct the 3D images of the phantoms. A 24 dual-Xeon PC cluster was used to compute the projections and reconstructed images in parallel with each CPU processing 10 projection views for a total of 450 views. Based on this computer simulation system, simulated cone beam CT images were generated for various phantoms and technique settings. Noise power spectra for the flat panel x-ray detector and reconstructed images were then computed to characterize the noise properties. As an example among the potential applications of our noise simulation model, we showed that images of low contrast objects can be produced and used for image quality evaluation
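The idea of a Weyl-sequence generator shared among parallel workers can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the plain Weyl sequence frac(n*alpha) with an arbitrarily chosen irrational multiplier, and a leapfrog split of indices across workers. The abstract above does not say which variant or partitioning the authors used, and `weyl`, `leapfrog_stream` and `ALPHA` are hypothetical names.

```python
import math

ALPHA = math.sqrt(2.0) - 1.0   # irrational multiplier, an illustrative choice

def weyl(n, alpha=ALPHA):
    """n-th term of the Weyl sequence: fractional part of n*alpha, in [0, 1)."""
    return (n * alpha) % 1.0

def leapfrog_stream(rank, n_ranks, count, alpha=ALPHA):
    """Terms rank+1, rank+1+n_ranks, ... so that workers never share an index."""
    return [weyl(rank + 1 + k * n_ranks, alpha) for k in range(count)]

serial = [weyl(n) for n in range(1, 13)]
streams = [leapfrog_stream(r, n_ranks=3, count=4) for r in range(3)]

# Interleaving the three worker streams reproduces the serial sequence.
merged = [streams[i % 3][i // 3] for i in range(12)]
print(merged == serial)
```

Because each term depends only on its index, any worker can jump directly to its own terms without communication. A raw Weyl sequence is equidistributed rather than statistically random, so a production generator would typically post-process it.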
Modularized Parallel Neutron Instrument Simulation on the TeraGrid
International Nuclear Information System (INIS)
Chen, Meili; Cobb, John W.; Hagen, Mark E.; Miller, Stephen D.; Lynch, Vickie E.
2007-01-01
In order to build a bridge between the TeraGrid (TG), a national-scale cyberinfrastructure resource, and neutron science, the Neutron Science TeraGrid Gateway (NSTG) is focused on introducing productive HPC usage to the neutron science community, primarily the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL). Monte Carlo simulations are used as a powerful tool for instrument design and optimization at SNS. One of the successful efforts of a collaboration team composed of NSTG HPC experts and SNS instrument scientists is the development of a software facility named PSoNI, Parallelizing Simulations of Neutron Instruments. By parallelizing the traditional serial instrument simulation on TeraGrid resources, PSoNI quickly computes full instrument simulations at statistical levels sufficient for instrument design. Following the successful commissioning of SNS, by the end of 2007 three out of five commissioned instruments in the SNS target station will be available for initial users. Advanced instrument study, proposal feasibility evaluation, and experiment planning are on the immediate schedule of SNS, which poses further requirements, such as flexibility and high runtime efficiency, on fast instrument simulation. PSoNI has been redesigned to meet the new challenges, and a preliminary version has been developed on TeraGrid. This paper explores the motivation and goals of the new design and the improved software structure. Further, it describes the new features realized in MPI-parallelized McStas running high-resolution design simulations of the SEQUOIA and BSS instruments at SNS. A discussion of future work, which is targeted at fast simulation for automated experiment adjustment and at comparing models to data in analysis, is also presented.
Fast robot kinematics modeling by using a parallel simulator (PSIM)
International Nuclear Information System (INIS)
El-Gazzar, H.M.; Ayad, N.M.A.
2002-01-01
High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and the safety of nuclear reactors. The rapid development of VLSI technology makes it possible to implement time-consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for mathematically complex models such as the kinematics modeling of robot manipulators. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods for solving the Jacobian algorithm. Parallelism is introduced into the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of a six-DOF (degree of freedom) robot arm (the Stanford Arm). Execution-time comparisons between Von Neumann (uniprocessor) and parallel processor architectures, obtained with the parallel simulator package (PSIM), are presented. The results strongly favour the parallel techniques, with improvements of at least fifty percent. Further studies are needed to determine the optimum number of processors.
Parallel Stochastic discrete event simulation of calcium dynamics in neuron.
Ishlam Patoary, Mohammad Nazrul; Tropper, Carl; McDougal, Robert A; Zhongwei, Lin; Lytton, William W
2017-09-26
The intracellular calcium signaling pathways of a neuron depend on both biochemical reactions and diffusion. Some quasi-isolated compartments (e.g. spines) are so small, and calcium concentrations so low, that one extra molecule diffusing in by chance can make a nontrivial difference in concentration (percentage-wise). These rare events can affect dynamics discretely in such a way that they cannot be evaluated by a deterministic simulation. Stochastic models of such a system provide a more detailed understanding than existing deterministic models because they capture the system's behavior at a molecular level. Our research focuses on the development of a high-performance parallel discrete event simulation environment, Neuron Time Warp (NTW), which is intended for use in the parallel simulation of stochastic reaction-diffusion systems such as intracellular calcium signaling. NTW is integrated with NEURON, a simulator which is widely used within the neuroscience community. We simulate two models, a calcium buffer model and a calcium wave model. The calcium buffer model is employed to verify the correctness and performance of NTW by comparing it to a serial deterministic simulation in NEURON. We also derived a discrete event calcium wave model from a deterministic model using the stochastic IP3R structure.
Particle simulation on a distributed memory highly parallel processor
International Nuclear Information System (INIS)
Sato, Hiroyuki; Ikesaka, Morio
1990-01-01
This paper describes parallel molecular dynamics simulation of atoms governed by local force interaction. The space in the model is divided into cubic subspaces and mapped to the processor array of the CAP-256, a distributed memory, highly parallel processor developed at Fujitsu Labs. We developed a new technique to avoid redundant calculation of forces between atoms in different processors. Experiments showed the communication overhead was less than 5%, and the idle time due to load imbalance was less than 11% for two model problems which contain 11,532 and 46,128 argon atoms. From the software simulation, the CAP-II which is under development is estimated to be about 45 times faster than CAP-256 and will be able to run the same problem about 40 times faster than Fujitsu's M-380 mainframe when 256 processors are used. (author)
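The mapping of atoms to cubic subspaces described in the abstract above can be sketched as a simple binning step. This is an assumption-laden illustration: it presumes a cubic box with coordinates in [0, box_length], a regular grid of cells, and a row-major numbering of processors. The actual CAP-256 mapping and the redundancy-avoidance technique are not specified in the abstract, and `cell_of`, `processor_of` and `assign` are hypothetical names.

```python
def cell_of(pos, box_length, cells_per_side):
    """Grid cell (ix, iy, iz) containing a position inside a cubic box."""
    size = box_length / cells_per_side
    # Clamp so that a coordinate exactly on the upper face stays in the last cell.
    return tuple(min(int(c / size), cells_per_side - 1) for c in pos)

def processor_of(cell, cells_per_side):
    """Row-major processor index for a cell (one cell per processor assumed)."""
    ix, iy, iz = cell
    return ix + cells_per_side * (iy + cells_per_side * iz)

def assign(atoms, box_length, cells_per_side):
    """Group atom indices by owning processor."""
    owners = {}
    for idx, pos in enumerate(atoms):
        proc = processor_of(cell_of(pos, box_length, cells_per_side), cells_per_side)
        owners.setdefault(proc, []).append(idx)
    return owners

atoms = [(0.5, 1.5, 2.5), (0.6, 1.4, 2.9), (3.9, 3.9, 4.0)]   # hypothetical positions
print(assign(atoms, box_length=4.0, cells_per_side=4))
```

With short-range forces, each processor then only needs to exchange boundary atoms with the processors owning adjacent cells, which keeps communication local.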
Parallel algorithms for simulating continuous time Markov chains
Nicol, David M.; Heidelberger, Philip
1992-01-01
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
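Uniformization itself can be illustrated with a short serial sketch. The chain is driven by a single Poisson clock whose rate is at least as large as every state's total exit rate; at each tick the chain either makes a real transition or a pseudo-transition that leaves the state unchanged. The function name, the two-state example and its rates are hypothetical, and none of the five parallel methods compared in the abstract above is reproduced here.

```python
import random

def simulate_uniformized(rates, state, t_end, rng):
    """Sample a CTMC path; rates[i][j] is the transition rate from i to j (i != j)."""
    lam = max(sum(row.values()) for row in rates.values())  # uniformization rate
    t = 0.0
    path = [(0.0, state)]
    while True:
        t += rng.expovariate(lam)        # next tick of the common Poisson clock
        if t > t_end:
            return path
        u = rng.random() * lam
        acc = 0.0
        for nxt, r in rates[state].items():
            acc += r
            if u < acc:                  # real transition, probability r / lam
                state = nxt
                path.append((t, state))
                break
        # If no branch fired, the tick was a pseudo-event and the state is kept.

rates = {0: {1: 1.0}, 1: {0: 3.0}}       # hypothetical two-state chain
path = simulate_uniformized(rates, state=0, t_end=5000.0, rng=random.Random(1))
ends = path[1:] + [(5000.0, None)]
time_in_0 = sum(t2 - t1 for (t1, s), (t2, _) in zip(path, ends) if s == 0)
print(f"fraction of time in state 0: {time_in_0 / 5000.0:.3f} (stationary value 0.75)")
```

Because the tick times do not depend on the state, they can be generated in advance and shared, which is what makes the common clock usable as a synchronization basis.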
cellGPU: Massively parallel simulations of dynamic vertex models
Sussman, Daniel M.
2017-10-01
Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations.
Program Files doi: http://dx.doi.org/10.17632/6j2cj29t3r.1
Licensing provisions: MIT
Programming language: CUDA/C++
Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate.
Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU.
Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation
Time parallelization of advanced operation scenario simulations of ITER plasma
International Nuclear Information System (INIS)
Samaddar, D; Casper, T A; Kim, S H; Houlberg, W A; Berry, L A; Elwasif, W R; Batchelor, D
2013-01-01
This work demonstrates that simulations of advanced burning plasma operation scenarios can be successfully parallelized in time using the parareal algorithm. CORSICA, an advanced operation scenario code for tokamak plasmas, is used as a test case. This is a unique application, since the parareal algorithm has so far been applied only to much simpler systems, except for the case of turbulence. In the present application, a computational gain of an order of magnitude has been achieved, which is extremely promising. A successful application of the parareal algorithm to codes like CORSICA opens up the possibility of time-efficient simulations of ITER plasmas.
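The parareal iteration referred to above can be sketched on a scalar test problem. This is a toy illustration only: it assumes forward Euler for both the coarse and the fine propagator and the decay equation dy/dt = -y, whereas CORSICA and the propagators used in the study are far more complex. The function names and step counts are hypothetical.

```python
def euler(y, t0, t1, steps, f):
    """Integrate dy/dt = f(t, y) from t0 to t1 with forward Euler."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y

def parareal(f, y0, t0, t1, slices, iterations, coarse_steps=1, fine_steps=100):
    """Return the solution at the time-slice boundaries after the given iterations."""
    ts = [t0 + (t1 - t0) * n / slices for n in range(slices + 1)]
    coarse = lambda y, n: euler(y, ts[n], ts[n + 1], coarse_steps, f)
    fine = lambda y, n: euler(y, ts[n], ts[n + 1], fine_steps, f)
    u = [y0]
    for n in range(slices):                 # initial serial coarse sweep
        u.append(coarse(u[n], n))
    for _ in range(iterations):
        # The fine solves are independent per slice: this is the parallel part.
        f_old = [fine(u[n], n) for n in range(slices)]
        g_old = [coarse(u[n], n) for n in range(slices)]
        new = [y0]
        for n in range(slices):             # cheap serial correction sweep
            new.append(coarse(new[n], n) + f_old[n] - g_old[n])
        u = new
    return u

decay = lambda t, y: -y
u = parareal(decay, y0=1.0, t0=0.0, t1=2.0, slices=5, iterations=5)
print(u[-1])
```

After as many iterations as there are slices, the result coincides with the serial fine solution up to rounding, so a speed-up requires convergence in far fewer iterations than slices.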
Xyce™ Parallel Electronic Simulator Reference Guide, Version 6.5
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting; Russo, Thomas V.; Schiek, Richard L.; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation]
2016-06-01
This document is a reference guide to the Xyce Parallel Electronic Simulator and a companion document to the Xyce Users’ Guide. The focus of this document is to list exhaustively (to the extent possible) the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users’ Guide. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
SPINET: A Parallel Computing Approach to Spine Simulations
Directory of Open Access Journals (Sweden)
Peter G. Kropf
1996-01-01
Research in scientific programming enables us to realize more and more complex applications, while application-driven demands on computing methods and power are continuously growing. Therefore, interdisciplinary approaches are becoming more widely used. The interdisciplinary SPINET project presented in this article applies modern scientific computing tools to biomechanical simulations: parallel computing and symbolic and modern functional programming. The target application is the human spine. Simulations of the spine help us to investigate and better understand the mechanisms of back pain and spinal injury. Two approaches have been used: the first uses the finite element method for high-performance simulations of static biomechanical models, and the second generates a simulation development tool for experimenting with different dynamic models. A finite element program for static analysis has been parallelized for the MUSIC machine. To solve the sparse system of linear equations, a conjugate gradient solver (iterative method) and a frontal solver (direct method) have been implemented. The preprocessor required for the frontal solver is written in the modern functional programming language SML and the solver itself in C, thus exploiting the characteristic advantages of both functional and imperative programming. The speedup analysis of both solvers shows very satisfactory results for this irregular problem. A mixed symbolic-numeric environment for rigid body system simulations is presented. It automatically generates C code from a problem specification expressed in the Lagrange formalism using Maple.
Synchronous Parallel System for Emulation and Discrete Event Simulation
Steinman, Jeffrey S. (Inventor)
2001-01-01
A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to the state variables of the simulation object attributable to the event object and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
HPC parallel programming model for gyrokinetic MHD simulation
International Nuclear Information System (INIS)
Naitou, Hiroshi; Yamada, Yusuke; Tokuda, Shinji; Ishii, Yasutomo; Yagi, Masatoshi
2011-01-01
The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over large number of logical cores, the total simulation domain in cylindrical geometry was broken up into N_DD-r × N_DD-z (number of radial decomposition times number of axial decomposition) small domains including approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. N_RP replicas (copies) of each decomposed domain were used (“particle decomposition”). The hybrid parallelization model of multi-threads and multi-processes was employed: threads were parallelized by the auto-parallelization and N_DD-r × N_DD-z × N_RP processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for the medium size system of N_r × N_θ × N_z = 1025 × 128 × 128 mesh with 4.196 or 8.192 billion particles. The highest speed for the fixed number of logical cores was obtained for two threads, the maximum number of N_DD-z, and optimum combination of N_DD-r and N_RP. The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications.
Program summary
Program title: SWsolver
Catalogue identifier: AEGY_v1_0
An FPGA-Based Massively Parallel Neuromorphic Cortex Simulator.
Wang, Runchun M; Thakur, Chetan S; van Schaik, André
2018-01-01
This paper presents a massively parallel and scalable neuromorphic cortex simulator designed for simulating large and structurally connected spiking neural networks, such as complex models of various areas of the cortex. The main novelty of this work is the abstraction of a neuromorphic architecture into clusters represented by minicolumns and hypercolumns, analogously to the fundamental structural units observed in neurobiology. Without this approach, simulating large-scale fully connected networks needs prohibitively large memory to store look-up tables for point-to-point connections. Instead, we use a novel architecture, based on the structural connectivity in the neocortex, such that all the required parameters and connections can be stored in on-chip memory. The cortex simulator can be easily reconfigured for simulating different neural networks without any change in hardware structure by programming the memory. A hierarchical communication scheme allows one neuron to have a fan-out of up to 200 k neurons. As a proof-of-concept, an implementation on one Altera Stratix V FPGA was able to simulate 20 million to 2.6 billion leaky-integrate-and-fire (LIF) neurons in real time. We verified the system by emulating a simplified auditory cortex (with 100 million neurons). This cortex simulator achieved a low power dissipation of 1.62 μW per neuron. With the advent of commercially available FPGA boards, our system offers an accessible and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks.
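For readers unfamiliar with the leaky-integrate-and-fire (LIF) model named in the abstract above, a minimal software sketch of a single neuron follows. It assumes a dimensionless membrane potential, forward-Euler integration and a hard reset; the parameter values are hypothetical and the fixed-point, time-multiplexed hardware implementation on the FPGA is not represented.

```python
def simulate_lif(drive, steps, dt=1.0, tau=10.0, v_thresh=1.0, v_reset=0.0):
    """Return the time-step indices at which a LIF neuron spikes under constant drive."""
    v = 0.0
    spikes = []
    for step in range(steps):
        v += (dt / tau) * (drive - v)   # leaky integration toward the drive level
        if v >= v_thresh:               # threshold crossing: emit spike and reset
            spikes.append(step)
            v = v_reset
    return spikes

print(simulate_lif(drive=2.0, steps=21))   # supra-threshold drive: regular spiking
print(simulate_lif(drive=0.5, steps=200))  # sub-threshold drive: no spikes
```

The per-neuron state is a single number updated by a multiply-add and a comparison, which is why very large populations can share a small number of physical update units.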
Simulation and parallel connection of step-down piezoelectric transformers
International Nuclear Information System (INIS)
Thang, Vo Viet; Kim, In Sung; Jeong, Soon Jong; Kim, Min Soo; Song, Jae Sung
2012-01-01
Piezoelectric transformers have been used widely in electronic circuits due to advantages such as high efficiency, miniaturization and non-flammability; however, their output power has been limited. To overcome this drawback, some research has recently focused on connections between piezoelectric transformers. With such connected operation, the output power has been improved compared to single operation. Parallel operation of step-down piezoelectric transformers is presented in this paper. An important factor affecting the parallel operation of piezoelectric transformers was the resonance frequency, and a small difference in resonance frequencies was obtained with transformers having the same dimensions and fabrication processes. The piezoelectric transformers were found to operate in the first radial mode at a frequency of 68 kHz. An equivalent circuit was used to investigate parallel driving of piezoelectric transformers and then to compare the result with experimental observations. The electrical characteristics, including the output voltage, output power and efficiency, were measured at a matching resistive load. The effects of frequency on the step-down ratio and of the input voltage on the power properties in the simulation were similar to the experimental results. The output power of the parallel operation was 35 W at a load of 50 Ω and an input voltage of 100 V; the temperature rise was 30 °C and the efficiency was 88%.
Parallel conjugate gradient algorithms for manipulator dynamic simulation
Fijany, Amir; Scheld, Robert E.
1989-01-01
Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n^2) on a serial processor. Conjugate gradient algorithms are presented that provide greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log_2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log_2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
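A serial sketch of conjugate gradient with a diagonal (Jacobi) preconditioner is given below to make the structure of one iteration concrete. It is a textbook formulation on a small dense matrix, not the authors' parallel algorithm; the matrix, right-hand side and function name `pcg_diagonal` are hypothetical, and the matrix is assumed symmetric positive definite.

```python
def pcg_diagonal(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for SPD A using CG preconditioned with diag(A)."""
    n = len(b)
    max_iter = max_iter or 10 * n
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    x = [0.0] * n
    r = list(b)                                  # residual for the zero initial guess
    z = [r[i] / A[i][i] for i in range(n)]       # apply the diagonal preconditioner
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, it
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # stand-in for a mass matrix
b = [1.0, 2.0, 3.0]
x, iterations = pcg_diagonal(A, b)
print(x, iterations)
```

Every operation inside the loop is a vector update, an inner product or a matrix-vector product. With n processors the inner products can be reduced in a logarithmic number of steps, which is consistent with the per-iteration bound quoted in the abstract.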
Capacity Analysis for Parallel Runway through Agent-Based Simulation
Directory of Open Access Journals (Sweden)
Yang Peng
2013-01-01
Full Text Available Parallel runways are the mainstream structure at China's hub airports. The runway is often the bottleneck of an airport, and the evaluation of its capacity is of great importance to airport management. This study outlines a model, multiagent architecture, implementation approach, and software prototype of a simulation system for evaluating runway capacity. Agent Unified Modeling Language (AUML) is applied to illustrate the inbound and departing procedures of planes and to design the agent-based model. The model is evaluated experimentally, and its quality is studied in comparison with models created with SIMMOD and Arena. The results seem to be highly efficient, so the method can be applied to parallel runway capacity evaluation, and the model offers favorable flexibility and extensibility.
Xyce Parallel Electronic Simulator Users Guide Version 6.2.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-09-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright © 2002-2014 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are
Xyce Parallel Electronic Simulator Users Guide Version 6.4
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Baur, David Gregory [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2015-12-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are
CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION
Energy Technology Data Exchange (ETDEWEB)
Schneider, Evan E.; Robertson, Brant E. [Steward Observatory, University of Arizona, 933 North Cherry Avenue, Tucson, AZ 85721 (United States)
2015-04-15
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256^3) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION
International Nuclear Information System (INIS)
Schneider, Evan E.; Robertson, Brant E.
2015-01-01
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256^3) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
International Nuclear Information System (INIS)
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; Stuehn, Torsten
2017-01-01
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach's capabilities by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. These two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; Stuehn, Torsten
2017-11-01
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach's capabilities by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. These two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
Parallelization of ultrasonic field simulations for non destructive testing
International Nuclear Information System (INIS)
Lambert, Jason
2015-01-01
The Non Destructive Testing (NDT) field increasingly uses simulation. It is used at every step of the control process of an industrial part, from speeding up control development to helping experts understand results. During this thesis, a simulation tool dedicated to the fast computation of an ultrasonic field radiated by a phased array probe in an isotropic specimen has been developed. Its performance enables interactive usage. To benefit from the commonly available parallel architectures, a regular model (aimed at removing divergent branching) derived from the generic CIVA model has been developed. First, a reference implementation was developed to validate this model against CIVA results, and to analyze its performance behaviour before optimization. The resulting code has been optimized for three kinds of parallel architectures commonly available in workstations: general purpose processors (GPP), many-core co-processors (Intel MIC) and graphics processing units (nVidia GPU). On the GPP and the MIC, the algorithm was reorganized and implemented to benefit from both parallelism levels, multithreading and vector instructions. On the GPU, the multiple steps of field computing have been divided into multiple successive CUDA kernels. Moreover, libraries dedicated to each architecture were used to speed up Fast Fourier Transforms: Intel MKL on GPP and MIC, and nVidia cuFFT on GPU. The performance of the produced codes and their suitability for the hardware were thoroughly studied for each architecture. On multiple realistic control configurations, interactive performance was reached. Perspectives to address more complex configurations were drawn. Finally, the integration and the industrialization of this code in the commercial NDT platform CIVA is discussed. (author) [fr
Massively parallel algorithms for trace-driven cache simulations
Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.
1991-01-01
Trace-driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t-th instant, reference x_t is hashed into a set of cache locations, the contents of which are then compared with x_t. If at the t-th instant x_t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x_t present for the (t+1)-st instant. The problem of parallel simulation of a subtrace of N references directed to a C-line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regardless of the set size C runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that, on any trace of length N directed to a C-line set, runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
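To make the miss definition above concrete, here is a small serial sketch of LRU simulation for a single C-line cache set. It is a reference implementation only, not the parallel EREW algorithms of the paper; the function names and the example trace are assumptions for illustration. The second function uses the well-known reuse-distance characterization of LRU (a reference hits exactly when fewer than C distinct other lines were touched since the previous reference to the same line), which shows why each reference can in principle be classified independently of the others.

```python
from collections import OrderedDict


def simulate_lru_set(trace, capacity):
    """Step through the trace in order; return one miss flag per reference."""
    cache = OrderedDict()              # keys ordered from least to most recently used
    misses = []
    for ref in trace:
        if ref in cache:
            cache.move_to_end(ref)     # hit: mark as most recently used
            misses.append(False)
        else:
            misses.append(True)
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used line
            cache[ref] = None
    return misses


def lru_misses_by_reuse_distance(trace, capacity):
    """Classify every reference independently using its reuse distance."""
    misses = []
    for t, ref in enumerate(trace):
        prev = None
        for s in range(t - 1, -1, -1):      # most recent earlier use of ref
            if trace[s] == ref:
                prev = s
                break
        if prev is None:
            misses.append(True)             # first use of this line: cold miss
        else:
            distinct = len(set(trace[prev + 1:t]))
            misses.append(distinct >= capacity)
    return misses


# Hypothetical trace directed to one 3-line set.
example_trace = [1, 2, 3, 1, 4, 2]
sequential = simulate_lru_set(example_trace, 3)
independent = lru_misses_by_reuse_distance(example_trace, 3)
```

Both functions return the same flags for this trace: every reference except the second use of line 1 is a miss.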
An FPGA-Based Massively Parallel Neuromorphic Cortex Simulator
Directory of Open Access Journals (Sweden)
Runchun M. Wang
2018-04-01
Full Text Available This paper presents a massively parallel and scalable neuromorphic cortex simulator designed for simulating large and structurally connected spiking neural networks, such as complex models of various areas of the cortex. The main novelty of this work is the abstraction of a neuromorphic architecture into clusters represented by minicolumns and hypercolumns, analogously to the fundamental structural units observed in neurobiology. Without this approach, simulating large-scale fully connected networks needs prohibitively large memory to store look-up tables for point-to-point connections. Instead, we use a novel architecture, based on the structural connectivity in the neocortex, such that all the required parameters and connections can be stored in on-chip memory. The cortex simulator can be easily reconfigured for simulating different neural networks without any change in hardware structure by programming the memory. A hierarchical communication scheme allows one neuron to have a fan-out of up to 200k neurons. As a proof-of-concept, an implementation on one Altera Stratix V FPGA was able to simulate 20 million to 2.6 billion leaky-integrate-and-fire (LIF) neurons in real time. We verified the system by emulating a simplified auditory cortex (with 100 million neurons). This cortex simulator achieved a low power dissipation of 1.62 μW per neuron. With the advent of commercially available FPGA boards, our system offers an accessible and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks.
Parallel Multiscale Algorithms for Astrophysical Fluid Dynamics Simulations
Norman, Michael L.
1997-01-01
Our goal is to develop software libraries and applications for astrophysical fluid dynamics simulations in multidimensions that will enable us to resolve the large spatial and temporal variations that inevitably arise due to gravity, fronts and microphysical phenomena. The software must run efficiently on parallel computers and be general enough to allow the incorporation of a wide variety of physics. Cosmological structure formation with realistic gas physics is the primary application driver in this work. Accurate simulations of, e.g., galaxy formation require a spatial dynamic range (i.e., ratio of system scale to smallest resolved feature) of 10^4 or more in three dimensions in arbitrary topologies. We take this as our technical requirement. We have achieved, and in fact surpassed, these goals.
International Nuclear Information System (INIS)
Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li
2010-01-01
A large-scale parallel electromagnetic field simulation program, JEMS-FDTD (J Electromagnetic Solver-Finite Difference Time Domain), is designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate the propagation, radiation, and coupling of electromagnetic fields by solving Maxwell's equations explicitly on a structured mesh with the FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large-scale parallel computation. A parallel performance test indicates that a high parallel efficiency is obtained. (authors)
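For readers unfamiliar with the explicit FDTD update mentioned in this abstract, the following is a minimal one-dimensional sketch in normalized units. It is unrelated to the JEMS-FDTD code itself: the function name, the Gaussian source parameters, the grid size, and the fixed zero-field boundaries are all assumptions chosen for illustration.

```python
import math


def fdtd_1d(n_cells, n_steps, source_cell, courant=0.5):
    """Leapfrog update of Ez and Hy on a staggered 1D grid (normalized units).

    courant is c*dt/dx; the 1D scheme is stable for courant <= 1.
    The end cells of ez are never updated, so they act as perfect conductors.
    """
    ez = [0.0] * n_cells           # electric field at integer grid points
    hy = [0.0] * (n_cells - 1)     # magnetic field at half-integer grid points
    for step in range(n_steps):
        # Magnetic field update from the spatial difference of Ez.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Electric field update from the spatial difference of Hy.
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Additive (soft) Gaussian source; the pulse shape is an arbitrary choice.
        ez[source_cell] += math.exp(-((step - 30.0) / 10.0) ** 2)
    return ez, hy


ez, hy = fdtd_1d(n_cells=201, n_steps=50, source_cell=100)
```

Each cell update reads only its immediate neighbours, so a structured mesh can be split into blocks that exchange a single layer of boundary values per step; this locality is what makes explicit FDTD attractive for large processor counts.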
Direct numerical simulation of bubbles with parallelized adaptive mesh refinement
International Nuclear Information System (INIS)
Talpaert, A.
2015-01-01
The study of two-phase thermal-hydraulics is a major topic in nuclear engineering, for both the safety and the efficiency of nuclear facilities. In addition to experiments, numerical modeling helps to determine precisely where bubbles appear and how they behave, in the core as well as in the steam generators. This work presents the finest scale of representation of two-phase flows, Direct Numerical Simulation of bubbles. We use the 'Di-phasic Low Mach Number' equation model. It is particularly adapted to low-Mach number flows, that is to say flows whose velocity is much lower than the speed of sound; this is very typical of nuclear thermal-hydraulics conditions. Because we study bubbles, we capture the front between vapor and liquid phases thanks to a downward flux limiting numerical scheme. The specific discrete analysis technique this work introduces is well-balanced parallel Adaptive Mesh Refinement (AMR). With AMR, we refine the coarse grid on a batch of patches in order to locally increase precision in areas which matter more, and capture fine changes in the front location and its topology. We show that patch-based AMR is well suited to parallel computing. We use a variety of physical examples: forced advection, heat transfer, phase changes represented by a Stefan model, as well as the combination of all those models. We present the results of those numerical simulations, as well as the speed-up compared to equivalent non-AMR simulation and to serial computation of the same problems. This document is made up of an abstract and the slides of the presentation. (author)
Massive parallel 3D PIC simulation of negative ion extraction
Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu
2017-09-01
The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.
Xyce Parallel Electronic Simulator Reference Guide Version 6.7.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2017-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. The information herein is subject to change without notice. Copyright © 2002-2017 Sandia Corporation. All rights reserved. Trademarks Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. All other trademarks are property of their respective owners. Contacts World Wide Web http://xyce.sandia.gov https://info.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only) Bug Reports (Sandia only) http://joseki-vm.sandia.gov/bugzilla http://morannon.sandia.gov/bugzilla
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor
Rao, Hariprasad Nannapaneni
1989-01-01
The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
Out-of-order parallel discrete event simulation for electronic system-level design
Chen, Weiwei
2014-01-01
This book offers readers a set of new approaches, tools, and techniques for facing challenges in parallelization with design of embedded systems. It provides an advanced parallel simulation infrastructure for efficient and effective system-level model validation and development so as to build better products in less time. Since parallel discrete event simulation (PDES) has the potential to exploit the underlying parallel computational capability in today's multi-core simulation hosts, the author begins by reviewing the parallelization of discrete event simulation, identifyin
A parallel adaptive finite difference algorithm for petroleum reservoir simulation
Energy Technology Data Exchange (ETDEWEB)
Hoang, Hai Minh
2005-07-01
Adaptive finite difference methods for problems arising in the simulation of flow in porous media are considered. Such methods have proven useful for overcoming limitations of computational resources and improving the resolution of the numerical solutions to a wide range of problems. Local refinement of the computational mesh where it is needed to improve the accuracy of solutions yields better solution resolution and represents a more efficient use of computational resources than is possible with traditional fixed-grid approaches. In this thesis, we propose a parallel adaptive cell-centered finite difference (PAFD) method for black-oil reservoir simulation models. This is an extension of the adaptive mesh refinement (AMR) methodology first developed by Berger and Oliger (1984) for the hyperbolic problem. Our algorithm is fully adaptive in time and space through the use of subcycling, in which finer grids are advanced at smaller time steps than the coarser ones. When coarse and fine grids reach the same advanced time level, they are synchronized to ensure that the global solution is conservative and satisfies the divergence constraint across all levels of refinement. The material in this thesis is subdivided into three overall parts. First, we explain the methodology and intricacies of the AFD scheme. Then we extend a cell-centered finite difference discretization to a multilevel hierarchy of refined grids, and finally we employ the algorithm on a parallel computer. The results in this work show that the approach presented is robust and stable, demonstrating increased solution accuracy due to local refinement and reduced computing resource consumption. (Author)
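The subcycling idea described in this abstract (finer grids take smaller time steps, then synchronize with the coarser grid once both reach the same time) can be sketched as a recursive time-stepping skeleton. This is only a schedule illustration under assumed names (`advance_level`, `run`) and an assumed uniform refinement ratio; the actual grid updates and the conservative synchronization step of the thesis are represented by placeholders.

```python
def advance_level(level, t, dt, max_level, ratio, steps, syncs):
    """Advance one refinement level by dt, subcycling all finer levels."""
    steps.append((level, t, dt))    # placeholder for the real grid update
    if level < max_level:
        fine_dt = dt / ratio        # finer grid uses a proportionally smaller step
        for k in range(ratio):
            advance_level(level + 1, t + k * fine_dt, fine_dt,
                          max_level, ratio, steps, syncs)
        # Coarse and fine levels have now both reached t + dt, so they can
        # be synchronized (placeholder for flux correction and averaging).
        syncs.append((level, level + 1, t + dt))


def run(max_level=2, ratio=2, dt=1.0):
    """Take one coarse step and return the resulting update schedule."""
    steps, syncs = [], []
    advance_level(0, 0.0, dt, max_level, ratio, steps, syncs)
    return steps, syncs


steps, syncs = run()
```

With three levels and a ratio of 2, one coarse step triggers two steps on the middle level and four on the finest, with a synchronization each time two adjacent levels meet.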
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
Energy Technology Data Exchange (ETDEWEB)
Zang, Tianwu; Yu, Linglin; Zhang, Chong [Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005 (United States); Ma, Jianpeng, E-mail: jpma@bcm.tmc.edu [Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005 (United States); Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030 (United States)
2014-07-28
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in the two-dimensional Ising model, a Lennard-Jones liquid and an all-atom folding simulation of a small globular protein, trp-cage, in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent.
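For context on the comparison with conventional parallel tempering, the standard replica exchange acceptance rule can be written in a few lines. Note that this sketch shows the conventional PT criterion, not the PCST exchange rule of the paper, and the function names and example values are assumptions for illustration.

```python
import math
import random


def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of swapping configurations between two replicas.

    beta = 1 / (k_B T) for each replica; energy is its current potential energy.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Return True if the proposed swap is accepted."""
    return rng.random() < swap_acceptance(beta_i, beta_j, energy_i, energy_j)


# Hypothetical values: the colder replica (beta = 1.0) has the lower energy.
p = swap_acceptance(1.0, 0.5, 1.0, 2.0)   # exp(-0.5)
```

Because the exponent contains the difference of total potential energies, which typically grows with system size, conventional PT needs more closely spaced replicas for larger systems; this is the dependence the abstract says PCST avoids.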
Engineering-Based Thermal CFD Simulations on Massive Parallel Systems
Frisch, Jérôme; Mundani, Ralf-Peter; Rank, Ernst; van Treeck, Christoph
2015-01-01
The development of parallel Computational Fluid Dynamics (CFD) codes is a challenging task that entails efficient parallelization concepts and strategies in order to achieve good scalability values when running those codes on modern supercomputers
Parallelization of simulation code for liquid-gas model of lattice-gas fluid
International Nuclear Information System (INIS)
Kawai, Wataru; Ebihara, Kenichi; Kume, Etsuo; Watanabe, Tadashi
2000-03-01
A simulation code for hydrodynamical phenomena, based on the liquid-gas model of lattice-gas fluid, is parallelized using the MPI (Message Passing Interface) library. The parallelized code can be applied to larger simulations than the non-parallelized code. The calculation times of the parallelized code on VPP500 (Vector-Parallel super computer with dispersed memory units), AP3000 (Scalar-parallel server with dispersed memory units), and a workstation cluster decreased in inverse proportion to the number of processors. (author)
Xyce Parallel Electronic Simulator Reference Guide Version 6.4
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Baur, David Gregory [Raytheon, Albuquerque, NM (United States)
2015-12-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. Trademarks: The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59. All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts: Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)
Parallel Beam Dynamics Simulation Tools for Future Light Source Linac Modeling
International Nuclear Information System (INIS)
Qiang, Ji; Pogorelov, Ilya v.; Ryne, Robert D.
2007-01-01
Large-scale modeling on parallel computers is playing an increasingly important role in the design of future light sources. Such modeling provides a means to accurately and efficiently explore issues such as limits to beam brightness, emittance preservation, the growth of instabilities, etc. Recently the IMPACT code suite was enhanced to be applicable to future light source design. Simulations with IMPACT-Z were performed using up to one billion simulation particles for the main linac of a future light source to study the microbunching instability. Combined with the time domain code IMPACT-T, it is now possible to perform large-scale start-to-end linac simulations for future light sources, including the injector, main linac, chicanes, and transfer lines. In this paper we provide an overview of the IMPACT code suite, its key capabilities, and recent enhancements pertinent to accelerator modeling for future linac-based light sources.
Parallel linear solvers for simulations of reactor thermal hydraulics
International Nuclear Information System (INIS)
Yan, Y.; Antal, S.P.; Edge, B.; Keyes, D.E.; Shaver, D.; Bolotnov, I.A.; Podowski, M.Z.
2011-01-01
The state-of-the-art multiphase fluid dynamics code, NPHASE-CMFD, performs multiphase flow simulations in complex domains using implicit nonlinear treatment of the governing equations and in parallel, which is a very challenging environment for the linear solver. The present work illustrates how the Portable, Extensible Toolkit for Scientific Computation (PETSc) and the scalable Algebraic Multigrid (AMG) preconditioner from Hypre can be utilized to construct robust and scalable linear solvers for the Newton correction equation obtained from the discretized system of governing conservation equations in NPHASE-CMFD. The overall long-term objective of this work is to extend the NPHASE-CMFD code into a fully-scalable solver of multiphase flow and heat transfer problems, applicable to both steady-state and stiff time-dependent phenomena in complete fuel assemblies of nuclear reactors and, eventually, the entire reactor core (such as the Virtual Reactor concept envisioned by CASL). This campaign appropriately begins with the linear algebraic equation solver, which is traditionally a bottleneck to scalability in PDE-based codes. The computational complexity of the solver is usually superlinear in problem size, whereas the rest of the code, the “physics” portion, usually has its complexity linear in the problem size. (author)
Parallel-Architecture Simulator Development Using Hardware Transactional Memory
Armejach Sanosa, Adrià
2009-01-01
To address the need for a simpler parallel programming model, Transactional Memory (TM) has been developed and promises good parallel performance with easy-to-write parallel code. Unlike lock-based approaches, with TM, programmers do not need to explicitly specify and manage the synchronization among threads. Instead, programmers simply mark code segments as transactions, and the TM system manages the concurrency control for them. TM can be implemented either in software (STM) or hardware (HT...
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Xyce Parallel Electronic Simulator Reference Guide Version 6.6.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2016-11-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved. Acknowledgements: The BSIM Group at the University of California, Berkeley developed the BSIM3, BSIM4, BSIM6, BSIM-CMG and BSIM-SOI models. The BSIM3 is Copyright © 1999, Regents of the University of California. The BSIM4 is Copyright © 2006, Regents of the University of California. The BSIM6 is Copyright © 2015, Regents of the University of California. The BSIM-CMG is Copyright © 2012 and 2016, Regents of the University of California. The BSIM-SOI is Copyright © 1990, Regents of the University of California. All rights reserved. The Mextram model was developed by NXP Semiconductors until 2007, by Delft University of Technology from 2007 to 2014, and by Auburn University since April 2015. Copyrights © of Mextram are with Delft University of Technology, NXP Semiconductors and Auburn University. The MIT VS Model Research Group developed the MIT Virtual Source (MVS) model. Copyright © 2013 Massachusetts Institute of Technology (MIT). The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. Trademarks: Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and Tec
Optimized Parallel Discrete Event Simulation (PDES) for High Performance Computing (HPC) Clusters
National Research Council Canada - National Science Library
Abu-Ghazaleh, Nael
2005-01-01
The aim of this project was to study the communication subsystem performance of state of the art optimistic simulator Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES...
Parallelized FDTD simulation for flat-plate bounded wave EMP simulator with lumped terminator
International Nuclear Information System (INIS)
Zhu Xiangqin; Chen Weiqing; Chen Zaigao; Cai Libing; Wang Jianguo
2013-01-01
A parallelized finite-difference time-domain (FDTD) method is presented for simulating a bounded-wave electromagnetic pulse (EMP) simulator with a lumped terminator and a parallel plate. The effects of several model parameters of the simulator on the fields in the working volume are simulated and analyzed. The results show that if the width of the lower PEC plate is at least 1.5 times that of the upper plate of the working volume, the projection length of the front transitional section has no significant effect on the rise times of the electric fields at points near the front transitional section, whereas the rise times at points near the center of the working volume decrease as the projection length increases, although by progressively smaller amounts. The rise times of E_z at all points also decrease as the width of the lower PEC plate increases, again by progressively smaller amounts. If the projection length of the front transitional section is fixed, good results cannot be obtained simply by increasing or decreasing the height of the simulator; instead, the height has an optimal value. (authors)
Computer simulation of grain growth in HAZ
Gao, Jinhua
Two different models for Monte Carlo simulation of normal grain growth in metals and alloys were developed. Each simulation model was based on a different approach to couple the Monte Carlo simulation time to real time-temperature. These models demonstrated the applicability of Monte Carlo simulation to grain growth in materials processing. A grain boundary migration (GBM) model coupled the Monte Carlo simulation to a first principle grain boundary migration model. The simulation results, by applying this model to isothermal grain growth in zone-refined tin, showed good agreement with experimental results. An experimental data based (EDB) model coupled the Monte Carlo simulation with grain growth kinetics obtained from the experiment. The results of the application of the EDB model to the grain growth during continuous heating of a beta titanium alloy correlated well with experimental data. In order to acquire the grain growth kinetics from the experiment, a new mathematical method was developed and utilized to analyze the experimental data on isothermal grain growth. Grain growth in the HAZ of 0.2% Cu-Al alloy was successfully simulated using the EDB model combined with grain growth kinetics obtained from the experiment and measured thermal cycles from the welding process. The simulated grain size distribution in the HAZ was in good agreement with experimental results. The pinning effect of second phase particles on grain growth was also simulated in this work. The simulation results confirmed that by introducing the variable R, degree of contact between grain boundaries and second phase particles, the Zener pinning model can be modified as $D/r = K/(Rf)$, where D is the pinned grain size, r the mean size of second phase particles, K a constant, f the area fraction (or the volume fraction in 3-D) of second phase.
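The modified Zener relation quoted above, D/r = K/(R f), can be evaluated directly. The sketch below only rearranges that relation for D; the function name and the numerical values are hypothetical and are not taken from the thesis.

```python
def pinned_grain_size(r, f, K, R):
    """Pinned grain size D from the modified Zener relation D / r = K / (R * f).

    r: mean size of second-phase particles
    f: area (2-D) or volume (3-D) fraction of second phase
    K: model constant
    R: degree of contact between grain boundaries and particles
    """
    if r <= 0 or f <= 0 or R <= 0:
        raise ValueError("r, f and R must be positive")
    return K * r / (R * f)

# Hypothetical inputs, for illustration only.
print(pinned_grain_size(r=1.0, f=0.05, K=1.0, R=0.5))
```

In this form, a larger particle fraction f or a higher degree of contact R both lower the pinned grain size, as the relation implies.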
Kinematics and dynamics analysis of a novel serial-parallel dynamic simulator
Energy Technology Data Exchange (ETDEWEB)
Hu, Bo; Zhang, Lian Dong; Yu, Jingjing [Parallel Robot and Mechatronic System Laboratory of Hebei Province, Yanshan University, Qinhuangdao, Hebei (China)
2016-11-15
A serial-parallel dynamics simulator based on serial-parallel manipulator is proposed. According to the dynamics simulator motion requirement, the proposed serial-parallel dynamics simulator formed by 3-RRS (active revolute joint-revolute joint-spherical joint) and 3-SPR (Spherical joint-active prismatic joint-revolute joint) PMs adopts the outer and inner layout. By integrating the kinematics, constraint and coupling information of the 3-RRS and 3-SPR PMs into the serial-parallel manipulator, the inverse Jacobian matrix, velocity, and acceleration of the serial-parallel dynamics simulator are studied. Based on the principle of virtual work and the kinematics model, the inverse dynamic model is established. Finally, the workspace of the (3-RRS)+(3-SPR) dynamics simulator is constructed.
Kinematics and dynamics analysis of a novel serial-parallel dynamic simulator
International Nuclear Information System (INIS)
Hu, Bo; Zhang, Lian Dong; Yu, Jingjing
2016-01-01
A serial-parallel dynamics simulator based on serial-parallel manipulator is proposed. According to the dynamics simulator motion requirement, the proposed serial-parallel dynamics simulator formed by 3-RRS (active revolute joint-revolute joint-spherical joint) and 3-SPR (Spherical joint-active prismatic joint-revolute joint) PMs adopts the outer and inner layout. By integrating the kinematics, constraint and coupling information of the 3-RRS and 3-SPR PMs into the serial-parallel manipulator, the inverse Jacobian matrix, velocity, and acceleration of the serial-parallel dynamics simulator are studied. Based on the principle of virtual work and the kinematics model, the inverse dynamic model is established. Finally, the workspace of the (3-RRS)+(3-SPR) dynamics simulator is constructed
KMC Simulation of Surface Growth of Semiconductors
International Nuclear Information System (INIS)
Esen, M.
2004-01-01
In this work we have studied the growth and equilibration of semiconductor surfaces consisting of monoatomic steps separated by flat terraces using the kinetic Monte Carlo method. Atomistic processes such as diffusion on terraces, attachment/detachment of particles to/from step edges, attachment of particles from an upper terrace to a bounding step, and diffusion of particles along step edges are considered. A rate equation is written for each of these processes, and the overall transition probabilities are calculated, with the processes ordered to distinguish between slow and fast processes. The interaction of steps is also included in the calculation of the rate equations. The growth of such a surface is simulated when there is a particle flux to the surface. The roughness of the surface and its dependence on both temperature and kinetic parameters such as the edge diffusion barrier are investigated. The formation of islands on terraces is prohibited and the distribution of their number and sizes are investigated as a function of temperature and appropriate kinetic parameters. In the absence of a flux to the surface, the equilibration of the surface is investigated, paying particular attention to the top of the profile when the initial surface is a periodic profile of parallel monoatomic steps separated by terraces. It is observed that during equilibration of the profile, the topmost step disintegrates quickly and leads to many islands on the top of the profile due to collision and annihilation of step edges of opposite sign. The islands then quickly disintegrate due to the line tension effect, and this scenario repeats itself until the surface completely flattens.
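The abstract does not say which kinetic Monte Carlo variant was used. As a rough illustration of how per-process rates become transition probabilities, here is a sketch of one step of the standard residence-time (rejection-free) scheme. The function name is hypothetical, and the two uniform random numbers are passed in explicitly so the step is deterministic.

```python
import bisect
import math
from itertools import accumulate

def kmc_step(rates, u_select, u_time):
    """One residence-time KMC step.

    rates:    non-negative rates, one per possible event
    u_select: uniform random number in [0, 1), picks the event
    u_time:   uniform random number in (0, 1], sets the time increment
    Returns (event_index, dt).
    """
    cumulative = list(accumulate(rates))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    # Event i is chosen with probability rates[i] / total.
    target = u_select * total
    event = min(bisect.bisect_right(cumulative, target), len(rates) - 1)
    # Waiting time is exponentially distributed with mean 1 / total.
    dt = -math.log(u_time) / total
    return event, dt

# Hypothetical rates for two competing processes (a slow one and a fast one).
print(kmc_step([1.0, 3.0], u_select=0.5, u_time=0.5))
```

Because an event is selected in proportion to its rate, fast processes dominate the event sequence while the clock advances by the inverse of the total rate.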
Investigation of Mediational Processes Using Parallel Process Latent Growth Curve Modeling
Cheong, JeeWon; MacKinnon, David P.; Khoo, Siek Toon
2010-01-01
This study investigated a method to evaluate mediational processes using latent growth curve modeling. The mediator and the outcome measured across multiple time points were viewed as 2 separate parallel processes. The mediational process was defined as the independent variable influencing the growth of the mediator, which, in turn, affected the growth of the outcome. To illustrate modeling procedures, empirical data from a longitudinal drug prevention program, Adolescents Training and Learning to Avoid Steroids, were used. The program effects on the growth of the mediator and the growth of the outcome were examined first in a 2-group structural equation model. The mediational process was then modeled and tested in a parallel process latent growth curve model by relating the prevention program condition, the growth rate factor of the mediator, and the growth rate factor of the outcome. PMID:20157639
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
Energy Technology Data Exchange (ETDEWEB)
Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation, using Open Multi-processing (OpenMP) on a shared-memory platform and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performance in running parallel dynamic simulation is compared and demonstrated.
International Nuclear Information System (INIS)
Mo Zeyao
2004-11-01
Multiphysics parallel numerical simulations are usually essential to simplify research on complex physical phenomena in which several physics are tightly coupled. How to concatenate those coupled physics for fully scalable parallel simulation is very important. Three objectives should be balanced: efficient data transfer among the simulations, efficient parallel execution, and simultaneous development of the simulation codes. Two concatenating algorithms for multiphysics parallel numerical simulations coupling radiation hydrodynamics with neutron transport on unstructured grids are presented. The first algorithm, Fully Loosely Concatenation (FLC), focuses on independent code development and on running each code independently with optimal performance. The second algorithm, Two Level Tightly Concatenation (TLTC), focuses on the optimal tradeoff among the above three objectives. Theoretical analyses of communication complexity and parallel numerical experiments on hundreds of processors on two parallel machines have shown that these two algorithms are efficient and can be generalized to other multiphysics parallel numerical simulations. In particular, algorithm TLTC is linearly scalable and has achieved the optimal parallel performance. (authors)
Simulated annealing algorithm for optimal capital growth
Luo, Yong; Zhu, Bo; Tang, Yong
2014-08-01
We investigate the problem of dynamic optimal capital growth of a portfolio. A general framework in which one strives to maximize the expected logarithmic utility of the long-term growth rate is developed. Exact optimization algorithms run into difficulties in this framework, which motivates applying a simulated annealing algorithm to optimize the capital growth of a given portfolio. Empirical results with real financial data indicate that the approach is promising for capital growth portfolios.
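To make the idea concrete, here is a deliberately simplified sketch, not the authors' portfolio model. It applies simulated annealing to a one-asset problem: choosing the fraction of capital to stake on a repeated even-odds bet with win probability p so as to maximize the expected log growth. All names and parameter values are assumptions; the toy problem is chosen because its optimum, f = 2p - 1, is known in closed form.

```python
import math
import random

def expected_log_growth(f, p=0.6):
    """Expected log growth per bet when staking fraction f at even odds."""
    return p * math.log(1.0 + f) + (1.0 - p) * math.log(1.0 - f)

def anneal_fraction(p=0.6, steps=5000, t_start=0.01, t_end=1e-5, seed=0):
    """Simulated annealing over the staking fraction f in [0, 0.99]."""
    rng = random.Random(seed)
    f = 0.5
    g = expected_log_growth(f, p)
    best_f, best_g = f, g
    for k in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (k / (steps - 1))
        cand = min(max(f + rng.gauss(0.0, 0.05), 0.0), 0.99)
        g_cand = expected_log_growth(cand, p)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if g_cand >= g or rng.random() < math.exp((g_cand - g) / temp):
            f, g = cand, g_cand
            if g > best_g:
                best_f, best_g = f, g
    return best_f, best_g

print(anneal_fraction())
```

For p = 0.6 the closed-form optimum is f = 0.2, which gives a simple check on the search. In a real multi-asset setting the state would be a weight vector and the objective would be estimated from return data.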
Numerical simulation of Vlasov equation with parallel tools
International Nuclear Information System (INIS)
Peyroux, J.
2005-11-01
This project aims to make the solution of Vlasov codes even more powerful by means of various parallelization tools (MPI, OpenMP...). A simplified test case served as a base for constructing the parallel codes and obtaining a data-processing skeleton which could thereafter be re-used for increasingly complex models (more than four phase-space variables). This will make it possible to treat more realistic situations linked, for example, to the injection of ultra-short and ultra-intense pulses in inertial fusion plasmas, or the study of the trapped-ion instability now taken as being responsible for the generation of turbulence in tokamak plasmas. (author)
ANNarchy: a code generation approach to neural simulations on parallel hardware
Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which makes it easy to define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957
Parallel alternating direction preconditioner for isogeometric simulations of explicit dynamics
Łoś, Marcin; Woźniak, Maciej; Paszyński, Maciej; Dalcin, Lisandro; Calo, Victor M.
2015-01-01
incorporated as a part of PETIGA, an isogeometric framework [7] built on top of PETSc [8]. We show the scalability of the parallel algorithm on the STAMPEDE Linux cluster up to 10,000 processors, as well as the convergence rate of the PCG solver
Efficient Simulation of Population Overflow in Parallel Queues
Nicola, V.F.; Zaburnenko, T.S.
2006-01-01
In this paper we propose a state-dependent importance sampling heuristic to estimate the probability of population overflow in networks of parallel queues. This heuristic approximates the “optimal” state-dependent change of measure without the need for difficult mathematical analysis or costly
Efficient Heuristics for Simulating Population Overflow in Parallel Networks
Zaburnenko, T.S.; Nicola, V.F.
2006-01-01
In this paper we propose a state-dependent importance sampling heuristic to estimate the probability of population overflow in networks of parallel queues. This heuristic approximates the “optimal” state-dependent change of measure without the need for costly optimization involved in other
Co-simulation of dynamic systems in parallel and serial model configurations
International Nuclear Information System (INIS)
Sweafford, Trevor; Yoon, Hwan Sik
2013-01-01
Recent advances in simulation software and computing hardware make it feasible to simulate complex dynamic systems composed of multiple submodels developed in different modeling languages. So-called co-simulation enables one to study various aspects of a complex dynamic system with heterogeneous submodels in a cost-effective manner. Among several different model configurations for co-simulation, the synchronized parallel configuration is regarded as expediting the simulation process by simulating multiple submodels concurrently on a multi-core processor. In this paper, computational accuracy and computation time are studied for three different co-simulation frameworks: integrated, serial, and parallel. For this purpose, analytical evaluations of the three methods are made using the explicit Euler method, and the methods are then applied to two-DOF mass-spring systems. The results show that while the parallel simulation configuration produces the same accurate results as the integrated configuration, the results of the serial configuration show a slight deviation. It is also shown that the computation time can be reduced by running the simulation in the parallel configuration. Therefore, it can be concluded that the synchronized parallel simulation methodology is the best for both simulation accuracy and time efficiency.
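The accuracy result can be reproduced in miniature under assumptions of our own. The sketch below uses a two-mass chain (wall, spring k1, mass 1, spring k2, mass 2) with unit parameters, split into one submodel per mass and stepped with explicit Euler. In the "parallel" configuration both submodels read each other's previous-step output; in the "serial" configuration the second submodel reads the first one's already-updated position. The system layout, parameter values, and function names are illustrative and are not taken from the paper.

```python
M1 = M2 = 1.0   # masses (assumed values)
K1 = K2 = 1.0   # spring stiffnesses (assumed values)

def sub1_step(x1, v1, x2_in, h):
    """Explicit Euler step of submodel 1 (mass 1), given mass 2's position."""
    a1 = (-K1 * x1 + K2 * (x2_in - x1)) / M1
    return x1 + h * v1, v1 + h * a1

def sub2_step(x2, v2, x1_in, h):
    """Explicit Euler step of submodel 2 (mass 2), given mass 1's position."""
    a2 = -K2 * (x2 - x1_in) / M2
    return x2 + h * v2, v2 + h * a2

def integrated_step(state, h):
    """Explicit Euler step of the monolithic (integrated) model."""
    x1, v1, x2, v2 = state
    a1 = (-K1 * x1 + K2 * (x2 - x1)) / M1
    a2 = -K2 * (x2 - x1) / M2
    return (x1 + h * v1, v1 + h * a1, x2 + h * v2, v2 + h * a2)

def simulate(mode, steps=1000, h=1e-3):
    state = (1.0, 0.0, 0.0, 0.0)  # x1, v1, x2, v2
    for _ in range(steps):
        x1, v1, x2, v2 = state
        if mode == "integrated":
            state = integrated_step(state, h)
        elif mode == "parallel":
            # Both submodels see only previous-step outputs.
            nx1, nv1 = sub1_step(x1, v1, x2, h)
            nx2, nv2 = sub2_step(x2, v2, x1, h)
            state = (nx1, nv1, nx2, nv2)
        elif mode == "serial":
            # Submodel 2 runs after submodel 1 and sees its new position.
            nx1, nv1 = sub1_step(x1, v1, x2, h)
            nx2, nv2 = sub2_step(x2, v2, nx1, h)
            state = (nx1, nv1, nx2, nv2)
        else:
            raise ValueError(mode)
    return state

for mode in ("integrated", "parallel", "serial"):
    print(mode, simulate(mode))
```

With explicit Euler, the parallel exchange pattern performs the same arithmetic as the integrated model, while the serial pattern mixes old and new values and therefore drifts slightly. This matches the qualitative finding reported in the abstract.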
Parallelizing an electron transport Monte Carlo simulator (MOCASIN 2.0)
International Nuclear Information System (INIS)
Schwetman, H.; Burdick, S.
1988-01-01
Electron transport simulators are tools for studying electrical properties of semiconducting materials and devices. As demands for modeling more complex devices and new materials have emerged, so have demands for more processing power. This paper documents a project to convert an electron transport simulator (MOCASIN 2.0) to a parallel processing environment. In addition to describing the conversion, the paper presents PPL, a parallel programming version of C running on a Sequent multiprocessor system. In timing tests, models that simulated the movement of 2,000 particles for 100 time steps were executed on ten processors, with a parallel efficiency of over 97%
Static and dynamic load-balancing strategies for parallel reservoir simulation
International Nuclear Information System (INIS)
Anguille, L.; Killough, J.E.; Li, T.M.C.; Toepfer, J.L.
1995-01-01
Accurate simulation of the complex phenomena that occur in flow in porous media can tax even the most powerful serial computers. Emergence of new parallel computer architectures as a future efficient tool in reservoir simulation may overcome this difficulty. Unfortunately, major problems remain to be solved before using parallel computers commercially: production serial programs must be rewritten to be efficient in parallel environments and load balancing methods must be explored to evenly distribute the workload on each processor during the simulation. This study implements both a static load-balancing algorithm and a receiver-initiated dynamic load-sharing algorithm to achieve high parallel efficiencies on both the IBM SP2 and Intel IPSC/860 parallel computers. Significant speedup improvement was recorded for both methods. Further optimization of these algorithms yielded a technique with efficiencies as high as 90% and 70% on 8 and 32 nodes, respectively. The increased performance was the result of the minimization of message-passing overhead
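As a quick reading aid for the efficiency figures above, the sketch below converts them into speedups. It assumes the usual definition, parallel efficiency = speedup / number of nodes; the abstract does not state which definition was used, and the function name is illustrative.

```python
def speedup_from_efficiency(efficiency, n_nodes):
    """Speedup implied by a parallel efficiency, assuming
    efficiency = speedup / n_nodes."""
    return efficiency * n_nodes

# Efficiencies quoted in the abstract: 90% on 8 nodes, 70% on 32 nodes.
for n_nodes, efficiency in ((8, 0.90), (32, 0.70)):
    print(n_nodes, speedup_from_efficiency(efficiency, n_nodes))
```

Under that assumption, the lower efficiency on 32 nodes still corresponds to a larger absolute speedup than the higher efficiency on 8 nodes.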
Numerical Simulation Of Silicon-Ribbon Growth
Woda, Ben K.; Kuo, Chin-Po; Utku, Senol; Ray, Sujit Kumar
1987-01-01
Mathematical model includes nonlinear effects. Model, in development, simulates growth of silicon ribbon from melt. Takes account of entire temperature and stress history of ribbon. Numerical simulations performed with new model help in search for temperature distribution, pulling speed, and other conditions favoring growth of wide, flat, relatively defect-free silicon ribbons for solar photovoltaic cells at economically attractive, high production rates. Also applicable to materials other than silicon.
Simulating the growth of tafoni
Huinink, H.P.; Pel, L.; Kopinga, K.
2004-01-01
Throughout the world, large caves in rocks (tafoni) are found, which originate from salt weathering. The mechanisms that control their development are poorly understood. The growth of tafoni has been studied with a model that describes how a rock surface, containing a small pit, disintegrates by
Parallel Motion Simulation of Large-Scale Real-Time Crowd in a Hierarchical Environmental Model
Directory of Open Access Journals (Sweden)
Xin Wang
2012-01-01
Full Text Available This paper presents a parallel real-time crowd simulation method based on a hierarchical environmental model. A dynamical model of the complex environment should be constructed to simulate the state transition and propagation of individual motions. By modeling a virtual environment where virtual crowds reside, we employ different parallel methods on a topological layer, a path layer and a perceptual layer. We propose a parallel motion path matching method based on the path layer and a parallel crowd simulation method based on the perceptual layer. Large-scale real-time crowd simulation becomes possible with these methods. Numerical experiments are carried out to demonstrate the methods and results.
Building Blocks for the Rapid Development of Parallel Simulations, Phase I
National Aeronautics and Space Administration — Scientists need to be able to quickly develop and run parallel simulations without paying the high price of writing low-level message passing codes using compiled...
Spiral Growth in Plants: Models and Simulations
Allen, Bradford D.
2004-01-01
The analysis and simulation of spiral growth in plants integrates algebra and trigonometry in a botanical setting. When the ideas presented here are used in a mathematics classroom/computer lab, students can better understand how basic assumptions about plant growth lead to the golden ratio and how the use of circular functions leads to accurate…
Kinetic-Monte-Carlo-Based Parallel Evolution Simulation Algorithm of Dust Particles
Directory of Open Access Journals (Sweden)
Xiaomei Hu
2014-01-01
Full Text Available The evolution simulation of dust particles provides an important way to analyze the impact of dust on the environment. A KMC-based parallel algorithm is proposed to simulate the evolution of dust particles. In the parallel evolution simulation algorithm, a data distribution scheme and a communication optimization strategy are proposed to balance the load of every process and reduce the communication cost among processes. The experimental results show that the simulation of diffusion, sedimentation, and resuspension of dust particles in a virtual campus is realized and that the simulation time is shortened by the parallel algorithm, which overcomes the limitations of serial computing and makes the simulation of large-scale virtual environments possible.
Analysis for Parallel Execution without Performing Hardware/Software Co-simulation
Muhammad Rashid
2014-01-01
Hardware/software co-simulation improves the performance of embedded applications by executing the applications on a virtual platform before the actual hardware is available in silicon. However, the virtual platform of the target architecture is often not available during early stages of the embedded design flow. Consequently, analysis for parallel execution without performing hardware/software co-simulation is required. This article presents an analysis methodology for parallel execution of ...
Xyce parallel electronic simulator : users' guide. Version 5.1.
Energy Technology Data Exchange (ETDEWEB)
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a
Xyce Parallel Electronic Simulator : users' guide, version 4.1.
Energy Technology Data Exchange (ETDEWEB)
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a
Valasek, Lukas; Glasa, Jan
2017-12-01
Current fire simulation systems are capable of utilizing the advantages of available high-performance computing (HPC) platforms and of modelling fires efficiently in parallel. In this paper, the efficiency of a corridor fire simulation on an HPC cluster is discussed. The parallel MPI version of Fire Dynamics Simulator is used to test the efficiency of selected strategies for allocating the computational resources of the cluster when a greater number of computational cores is used. Simulation results indicate that if the number of cores used is not equal to a multiple of the total number of cluster node cores, there are allocation strategies which provide more efficient calculations.
Molecular Dynamic Simulations of Nanostructured Ceramic Materials on Parallel Computers
International Nuclear Information System (INIS)
Vashishta, Priya; Kalia, Rajiv
2005-01-01
Large-scale molecular-dynamics (MD) simulations have been performed to gain insight into: (1) sintering, structure, and mechanical behavior of nanophase SiC and SiO2; (2) effects of dynamic charge transfers on the sintering of nanophase TiO2; (3) high-pressure structural transformation in bulk SiC and GaAs nanocrystals; (4) nanoindentation in Si3N4; and (5) lattice mismatched InAs/GaAs nanomesas. In addition, we have designed a multiscale simulation approach that seamlessly embeds MD and quantum-mechanical (QM) simulations in a continuum simulation. The above research activities have involved strong interactions with researchers at various universities, government laboratories, and industries. 33 papers have been published and 22 talks have been given based on the work described in this report
A Coupling Tool for Parallel Molecular Dynamics-Continuum Simulations
Neumann, Philipp; Tchipev, Nikola
2012-01-01
We present a tool for coupling Molecular Dynamics and continuum solvers. It is written in C++ and is meant to support the developers of hybrid molecular - continuum simulations in terms of both realisation of the respective coupling algorithm
Migrating to a real-time distributed parallel simulator architecture
CSIR Research Space (South Africa)
Duvenhage, B
2007-07-01
A legacy non-distributed logical time simulator is migrated to a distributed architecture to parallelise execution. The existing Discrete Time System Specification (DTSS) modelling formalism is retained to simplify the reuse of existing models...
Parallel Earthquake Simulations on Large-Scale Multicore Supercomputers
Wu, Xingfu; Duan, Benchun; Taylor, Valerie
2011-01-01
, such as California and Japan, scientists have been using numerical simulations to study earthquake rupture propagation along faults and seismic wave propagation in the surrounding media on ever-advancing modern computers over past several decades. In particular
Finite element simulation for creep crack growth
International Nuclear Information System (INIS)
Miyazaki, Noriyuki; Sasaki, Toru; Nakagaki, Michihiko; Brust, F.W.
1992-01-01
A finite element method was applied to a generation phase simulation of creep crack growth. Experimental data on creep crack growth in a 1Cr-1Mo-1/4V steel compact tension specimen were numerically simulated using a node-release technique and the variations of various fracture mechanics parameters such as CTOA, J, C* and T* during creep crack growth were calculated. The path-dependencies of the integral parameters J, C* and T* were also obtained to examine whether or not they could characterize the stress field near the tip of a crack propagating under creep conditions. The following conclusions were obtained from the present analysis. (1) The J integral shows strong path-dependency during creep crack growth, so it does not characterize creep crack growth. (2) The C* integral shows path-dependency to some extent during creep crack growth even in the case of a Norton-type steady-state creep law. Strictly speaking, it cannot be used as a fracture mechanics parameter characterizing creep crack growth. It is, however, useful from a practical viewpoint because it correlates well with the rate of creep crack growth. (3) The T* integral shows good path-independency during creep crack growth. Therefore, it is a candidate for a fracture mechanics parameter characterizing creep crack growth. (author)
Xyce parallel electronic simulator users guide, version 6.0.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2013-08-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users' guide, Version 6.0.1.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.1
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Aadithya, Karthik V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Schiek, Richard L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation
2016-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce Parallel Electronic Simulator Users' Guide Version 6.8
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2017-10-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Event Based Simulator for Parallel Computing over the Wide Area Network for Real Time Visualization
Sundararajan, Elankovan; Harwood, Aaron; Kotagiri, Ramamohanarao; Satria Prabuwono, Anton
As the computational requirements of applications in computational science continue to grow tremendously, the use of computational resources distributed across the Wide Area Network (WAN) becomes advantageous. However, not all applications can be executed over the WAN due to communication overhead that can drastically slow down the computation. In this paper, we introduce an event based simulator to investigate the performance of parallel algorithms executed over the WAN. The event based simulator, known as SIMPAR (SIMulator for PARallel computation), simulates the actual computations and communications involved in parallel computation over the WAN using time stamps. Visualization of real time applications requires a steady stream of processed data. Hence, SIMPAR may prove to be a valuable tool to investigate types of applications and computing resource requirements to provide uninterrupted flow of processed data for real time visualization purposes. The results obtained from the simulation show concurrence with the expected performance using the L-BSP model.
Simulation of partially coherent light propagation using parallel computing devices
Magalhães, Tiago C.; Rebordão, José M.
2017-08-01
Light acquires or loses coherence and coherence is one of the few optical observables. Spectra can be derived from coherence functions, and understanding any interferometric experiment also relies upon coherence functions. Beyond the two limiting cases (full coherence or incoherence) the coherence of light is always partial and it changes with propagation. We have implemented a code to compute the propagation of partially coherent light from the source plane to the observation plane using parallel computing devices (PCDs). In this paper, we restrict the propagation to free space only. To this end, we used the Open Computing Language (OpenCL) and the open-source toolkit PyOpenCL, which gives access to OpenCL parallel computation through Python. To test our code, we chose two coherence source models: an incoherent source and a Gaussian Schell-model source. In the former case, we considered two different source shapes: circular and rectangular. The results were compared to the theoretical values. Our implemented code allows one to choose between the PyOpenCL implementation and a standard one, i.e., using the CPU only. To test the computation time for each implementation (PyOpenCL and standard), we used several computer systems with different CPUs and GPUs. We used powers of two for the dimensions of the cross-spectral density matrix (e.g. 32^4, 64^4) and a significant speed increase is observed in the PyOpenCL implementation when compared to the standard one. This can be an important tool for studying new source models.
Random number generators for large-scale parallel Monte Carlo simulations on FPGA
Lin, Y.; Wang, F.; Liu, B.
2018-05-01
Through parallelization, field programmable gate arrays (FPGAs) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGAs present both new constraints and new opportunities for the implementation of random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application-based tests, this study evaluates all four RNGs used in previous FPGA-based MC studies, as well as newly proposed FPGA implementations of two well-known high-quality RNGs that are suitable for LPMC studies on FPGA. One of the newly proposed FPGA implementations, a parallel version of the additive lagged Fibonacci generator (Parallel ALFG), is found to be the best among the evaluated RNGs in fulfilling the needs of LPMC simulations on FPGA.
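For readers unfamiliar with the generator named above, here is a minimal serial Python sketch of an additive lagged Fibonacci generator, x[n] = (x[n-j] + x[n-k]) mod 2^m. It is not the FPGA or parallel design evaluated in the study; the lag pair (5, 17), the 32-bit word size, and the class name `ALFG` are assumptions chosen for illustration.

```python
from collections import deque


class ALFG:
    """Serial additive lagged Fibonacci generator (illustrative sketch only)."""

    def __init__(self, seed_values, short_lag=5, long_lag=17, bits=32):
        if not 0 < short_lag < long_lag:
            raise ValueError("lags must satisfy 0 < short_lag < long_lag")
        if len(seed_values) != long_lag:
            raise ValueError("need exactly long_lag seed values")
        self.mask = (1 << bits) - 1
        masked = [v & self.mask for v in seed_values]
        if all(v % 2 == 0 for v in masked):
            # An all-even seed would keep every output even.
            raise ValueError("at least one seed value must be odd")
        self.short_lag = short_lag
        # The deque holds the last long_lag outputs, oldest first.
        self.state = deque(masked, maxlen=long_lag)

    def next(self):
        # state[0] is x[n-k]; state[-short_lag] is x[n-j].
        value = (self.state[0] + self.state[-self.short_lag]) & self.mask
        self.state.append(value)   # maxlen drops the oldest entry automatically
        return value
```

A possible usage is `rng = ALFG(list(range(1, 18)))` followed by repeated calls to `rng.next()`. In practice the seed table would be filled by another generator rather than a simple range.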
Development of parallel benchmark code by sheet metal forming simulator 'ITAS'
International Nuclear Information System (INIS)
Watanabe, Hiroshi; Suzuki, Shintaro; Minami, Kazuo
1999-03-01
This report describes the development of a parallel benchmark code based on the sheet metal forming simulator 'ITAS'. ITAS is a nonlinear elasto-plastic analysis program based on the finite element method for the simulation of sheet metal forming. ITAS adopts a dynamic analysis method that computes the displacement of the sheet metal at every time unit and uses an implicit method with a direct linear equation solver. Therefore the simulator is very robust. However, it requires a lot of computational time and memory capacity. In developing the parallel benchmark code, we used MPI programming to reduce the computational time. In numerical experiments on five kinds of parallel supercomputers at CCSE JAERI, i.e., SP2, SR2201, SX-4, T94 and VPP300, good performance is observed. The results will be made public through the WWW so that the benchmark results may become a guideline for research and development of parallel programs. (author)
Modular high-temperature gas-cooled reactor simulation using parallel processors
International Nuclear Information System (INIS)
Ball, S.J.; Conklin, J.C.
1989-01-01
The MHPP (Modular HTGR Parallel Processor) code has been developed to simulate modular high-temperature gas-cooled reactor (MHTGR) transients and accidents. MHPP incorporates a very detailed model for predicting the dynamics of the reactor core, vessel, and cooling systems over a wide variety of scenarios ranging from expected transients to very-low-probability severe accidents. The simulation routines, which had originally been developed entirely as serial code, were readily adapted to parallel processing Fortran. The resulting parallelized simulation speed was enhanced significantly. Workstation interfaces are being developed to provide for user (operator) interaction. In this paper the benefits realized by adapting previous MHTGR codes to run on a parallel processor are discussed, along with results of typical accident analyses.
Directory of Open Access Journals (Sweden)
Zhiteng Wang
2014-01-01
Service oriented modeling and simulation are hot issues in the field of modeling and simulation, and service resources need to be called while a simulation task workflow is running. How to optimize the service resource allocation to ensure that the task is completed effectively is an important issue in this area. In the military modeling and simulation field, it is important to improve the probability of success and timeliness of the simulation task workflow. Therefore, this paper proposes an optimization algorithm for multipath service resource parallel allocation, in which a multipath service resource parallel allocation model is built and a multiple chains coding scheme quantum optimization algorithm is used to solve it. The multiple chains coding scheme quantum optimization algorithm extends the parallel search space to improve search efficiency. Through simulation experiments, this paper investigates how the optimization algorithm, the service allocation strategy, and the number of paths affect the probability of success of the simulation task workflow, and the simulation results show that the optimization algorithm for multipath service resource parallel allocation is an effective method to improve the probability of success and timeliness of the simulation task workflow.
Long-range interactions and parallel scalability in molecular simulations
Patra, M.; Hyvönen, M.T.; Falck, E.; Sabouri-Ghomi, M.; Vattulainen, I.; Karttunen, M.E.J.
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform
Böddeker, B.; Teichler, H.
The MD simulation program TABB is motivated by the need for long time simulations for the investigation of slow processes near the glass transition of glass forming alloys. TABB is written in C++ with a high degree of flexibility: TABB allows the use of any short ranged pair potentials or EAM potentials, by generating and using a spline representation of all functions and their derivatives. TABB supports several numerical integration algorithms like the Runge-Kutta or the modified Gear predictor-corrector algorithm of order five. The boundary conditions can be chosen to resemble the geometry of bulk materials or films. The simulation box length or the pressure can be fixed for each dimension separately. TABB may be used in isokinetic, isoenergetic or canonical (with random forces) mode. TABB contains a simple instruction interpreter to easily control the parameters and options during the simulation. The same source code can be compiled either for workstations or for parallel computers. The main optimization goal of TABB is to allow long time simulations of medium or small sized systems. To make this possible, much attention is paid to optimizing the communication between the nodes. TABB uses a domain decomposition procedure. To use many nodes with a small system, the domain size has to be small compared to the range of particle interactions. In the limit of many nodes for only a few atoms, the communication bottleneck is the latency time. TABB minimizes the number of pairs of domains containing atoms that interact between these domains. This procedure minimizes the need for communication calls between pairs of nodes. TABB decides automatically in how many directions, and in which ones, the decomposition is applied. E.g., in the case of one dimensional domain decomposition, the simulation box is only split into "slabs" along a selected direction. The three dimensional domain decomposition is best with respect to the number of interacting domains only for simulations
Parallel Reservoir Simulations with Sparse Grid Techniques and Applications to Wormhole Propagation
Wu, Yuanqing
2015-09-08
In this work, two topics of reservoir simulations are discussed. The first topic is the two-phase compositional flow simulation in hydrocarbon reservoir. The major obstacle that impedes the applicability of the simulation code is the long run time of the simulation procedure, and thus speeding up the simulation code is necessary. Two means are demonstrated to address the problem: parallelism in physical space and the application of sparse grids in parameter space. The parallel code can gain satisfactory scalability, and the sparse grids can remove the bottleneck of flash calculations. Instead of carrying out the flash calculation in each time step of the simulation, a sparse grid approximation of all possible results of the flash calculation is generated before the simulation. Then the constructed surrogate model is evaluated to approximate the flash calculation results during the simulation. The second topic is the wormhole propagation simulation in carbonate reservoir. In this work, different from the traditional simulation technique relying on the Darcy framework, we propose a new framework called Darcy-Brinkman-Forchheimer framework to simulate wormhole propagation. Furthermore, to process the large quantity of cells in the simulation grid and shorten the long simulation time of the traditional serial code, standard domain-based parallelism is employed, using the Hypre multigrid library. In addition to that, a new technique called “experimenting field approach” to set coefficients in the model equations is introduced. In the 2D dissolution experiments, different configurations of wormholes and a series of properties simulated by both frameworks are compared. We conclude that the numerical results of the DBF framework are more like wormholes and more stable than the Darcy framework, which is a demonstration of the advantages of the DBF framework. The scalability of the parallel code is also evaluated, and good scalability can be achieved. Finally, a mixed
International Nuclear Information System (INIS)
Anderson, D.V.; Shumaker, D.E.
1993-01-01
From a computational standpoint, particle simulation calculations for plasmas have not adapted well to the transitions from scalar to vector processing nor from serial to parallel environments. They have suffered from excessive accessing of computer memory and have been hobbled by relatively inefficient gather-scatter constructs resulting from the use of indirect indexing. Lastly, the many-to-one mapping characteristic of the deposition phase has made it difficult to perform this in parallel. The authors' code sorts and reorders the particles in a spatial order. This allows them to greatly reduce the memory references, to run in directly indexed vector mode, and to employ domain decomposition to achieve parallelization. In this hybrid simulation the electrons are modeled as a fluid and the field equations solved are obtained from the electron momentum equation together with the pre-Maxwell equations (displacement current neglected). Either zero or finite electron mass can be used in the electron model. The resulting field equations are solved with an iteratively explicit procedure which is thus trivial to parallelize. Likewise, the field interpolations and the particle pushing are simple to parallelize. The deposition, sorting, and reordering phases are less simple and it is for these that the authors present detailed algorithms. They have now successfully tested the parallel version of HOPS in serial mode and it is now being readied for parallel execution on the Cray C-90. They will then port HOPS to a massively parallel computer in the next year.
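The benefit of spatially reordering particles before deposition can be sketched in a few lines of Python. This is not the HOPS algorithm, whose details are not given in the abstract; it assumes a 1D grid, nearest-grid-point deposition, and hypothetical helper names (`sort_particles_by_cell`, `deposit_sorted`). Once particles are sorted by cell, each cell's contributions are contiguous, so the many-to-one scatter becomes one write per occupied cell.

```python
def sort_particles_by_cell(positions, weights, cell_size, n_cells):
    """Reorder particles so that those in the same grid cell are contiguous."""
    # Cell index of each particle, clamped to the last cell.
    cells = [min(int(x / cell_size), n_cells - 1) for x in positions]
    # Stable sort of particle indices by cell index.
    order = sorted(range(len(positions)), key=cells.__getitem__)
    return (
        [positions[i] for i in order],
        [weights[i] for i in order],
        [cells[i] for i in order],
    )


def deposit_sorted(sorted_cells, sorted_weights, n_cells):
    """Accumulate weights per cell by walking contiguous runs of equal cell index."""
    density = [0.0] * n_cells
    start = 0
    while start < len(sorted_cells):
        cell = sorted_cells[start]
        end = start
        total = 0.0
        while end < len(sorted_cells) and sorted_cells[end] == cell:
            total += sorted_weights[end]
            end += 1
        density[cell] = total      # a single write per occupied cell
        start = end
    return density
```

Because every run of equal cell indices touches a different grid entry, the runs could in principle be handed to different workers without write conflicts, which is the property a domain-decomposed deposition relies on.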
Efficient parallel simulation of CO2 geologic sequestration in saline aquifers
International Nuclear Information System (INIS)
Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten
2007-01-01
An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.
Monte Carlo simulations of quantum systems on massively parallel supercomputers
International Nuclear Information System (INIS)
Ding, H.Q.
1993-01-01
A large class of quantum physics applications uses operator representations that are discrete integers by nature. This class includes magnetic properties of solids, interacting bosons modeling superfluids and Cooper pairs in superconductors, and Hubbard models for strongly correlated electron systems. This kind of application typically uses integer data representations and the resulting algorithms are dominated entirely by integer operations. The authors implemented an efficient algorithm for one such application on the Intel Touchstone Delta and iPSC/860. The algorithm uses a multispin coding technique which allows significant data compactification and efficient vectorization of Monte Carlo updates. The algorithm regularly switches between two data decompositions, corresponding naturally to different Monte Carlo updating processes and observable measurements such that only nearest-neighbor communications are needed within a given decomposition. On 128 nodes of Intel Delta, this algorithm updates 183 million spins per second (compared to 21 million on CM-2 and 6.2 million on a Cray Y-MP). A systematic performance analysis shows a better than 90% efficiency in the parallel implementation.
Cluster Optimization and Parallelization of Simulations with Dynamically Adaptive Grids
Schreiber, Martin; Weinzierl, Tobias; Bungartz, Hans-Joachim
2013-01-01
The present paper studies solvers for partial differential equations that work on dynamically adaptive grids stemming from spacetrees. Due to the underlying tree formalism, such grids efficiently can be decomposed into connected grid regions (clusters) on-the-fly. A graph on those clusters classified according to their grid invariancy, workload, multi-core affinity, and further meta data represents the inter-cluster communication. While stationary clusters already can be handled more efficiently than their dynamic counterparts, we propose to treat them as atomic grid entities and introduce a skip mechanism that allows the grid traversal to omit those regions completely. The communication graph ensures that the cluster data nevertheless are kept consistent, and several shared memory parallelization strategies are feasible. A hyperbolic benchmark that has to remesh selected mesh regions iteratively to preserve conforming tessellations acts as benchmark for the present work. We discuss runtime improvements resulting from the skip mechanism and the implications on shared memory performance and load balancing. © 2013 Springer-Verlag.
Parallel Performance Optimizations on Unstructured Mesh-based Simulations
Energy Technology Data Exchange (ETDEWEB)
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid
2015-01-01
This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
Parallel simulation of axon growth in the nervous system
J. Wensch; B.P. Sommeijer (Ben)
2002-01-01
In this paper we discuss a model from neurobiology, which describes the outgrowth of axons from neurons in the nervous system. The model combines ordinary differential equations, defining the movement of the axons, with parabolic partial differential equations. The parabolic equations
Parallel Earthquake Simulations on Large-Scale Multicore Supercomputers
Wu, Xingfu
2011-01-01
Earthquakes are one of the most destructive natural hazards on our planet Earth. Huge earthquakes striking offshore may cause devastating tsunamis, as evidenced by the 11 March 2011 Japan (moment magnitude Mw9.0) and the 26 December 2004 Sumatra (Mw9.1) earthquakes. Earthquake prediction (in terms of the precise time, place, and magnitude of a coming earthquake) is arguably unfeasible in the foreseeable future. To mitigate seismic hazards from future earthquakes in earthquake-prone areas, such as California and Japan, scientists have been using numerical simulations to study earthquake rupture propagation along faults and seismic wave propagation in the surrounding media on ever-advancing modern computers over the past several decades. In particular, ground motion simulations for past and future (possible) significant earthquakes have been performed to understand factors that affect ground shaking in populated areas, and to provide ground shaking characteristics and synthetic seismograms for emergency preparation and design of earthquake-resistant structures. These simulation results can guide the development of more rational seismic provisions leading to safer, more efficient, and economical structures in earthquake-prone regions.
DEFF Research Database (Denmark)
Ferraris, Chiara F; Geiker, Mette Rica; Martys, Nicos S
2007-01-01
inapplicable here. This paper presents the analysis of a modified parallel plate rheometer for measuring cement mortar and proposes a methodology for calibration using standard oils and numerical simulation of the flow. A lattice Boltzmann method was used to simulate the flow in the modified rheometer, thus...
Parallel-plate rheometer calibration using oil and lattice Boltzmann simulation
DEFF Research Database (Denmark)
Ferraris, Chiara F; Geiker, Mette Rica; Martys, Nicos S.
2007-01-01
compute the viscosity. This paper presents a modified parallel plate rheometer, and proposes means of calibration using standard oils and numerical simulation of the flow. A lattice Boltzmann method was used to simulate the flow in the modified rheometer, thus using an accurate numerical solution in place...
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
Energy Technology Data Exchange (ETDEWEB)
Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Characterization of parallel-hole collimator using Monte Carlo Simulation
International Nuclear Information System (INIS)
Pandey, Anil Kumar; Sharma, Sanjay Kumar; Karunanithi, Sellam; Kumar, Praveen; Bal, Chandrasekhar; Kumar, Rakesh
2015-01-01
Accuracy of in vivo activity quantification improves after the correction of penetrated and scattered photons. However, accurate assessment is not possible with physical experiment. We have used Monte Carlo simulation to accurately assess the contribution of penetrated and scattered photons in the photopeak window. Simulations were performed with the Simulation of Imaging Nuclear Detectors Monte Carlo code. The simulations were set up so that each run provides the geometric, penetration, and scatter components and writes binary images to a data file. These components were analyzed graphically using Microsoft Excel (Microsoft Corporation, USA). Each binary image was imported into ImageJ and a logarithmic transformation was applied for visual assessment of image quality, plotting the profile across the center of the images and calculating the full width at half maximum (FWHM) in the horizontal and vertical directions. The geometric, penetration, and scatter components at 140 keV for the low-energy general-purpose collimator were 93.20%, 4.13%, and 2.67%, respectively. Similarly, the geometric, penetration, and scatter components at 140 keV for the low-energy high-resolution (LEHR), medium-energy general-purpose (MEGP), and high-energy general-purpose (HEGP) collimators were (94.06%, 3.39%, 2.55%), (96.42%, 1.52%, 2.06%), and (96.70%, 1.45%, 1.85%), respectively. For the MEGP collimator at 245 keV and the HEGP collimator at 364 keV, the components were (89.10%, 7.08%, 3.82%) and (67.78%, 18.63%, 13.59%), respectively. The low-energy general-purpose and LEHR collimators are best suited to imaging 140 keV photons. The HEGP collimator can be used for 245 keV and 364 keV; however, correction for penetration and scatter must be applied if one wishes to quantify in vivo activity at 364 keV. Due to heavy penetration and scattering, 511 keV photons should not be imaged with the HEGP collimator.
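The FWHM measurement mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration of the general technique, not the ImageJ procedure the authors used: the function name `fwhm`, the `pixel_size` parameter, the single-peak assumption, and the use of linear interpolation at the half-maximum crossings are all assumptions introduced here.

```python
def _crossing(profile, i_above, i_below, half):
    """Fractional index where the profile crosses `half` between two samples."""
    y_above, y_below = profile[i_above], profile[i_below]
    return i_above + (i_below - i_above) * (y_above - half) / (y_above - y_below)


def fwhm(profile, pixel_size=1.0):
    """Full width at half maximum of a single-peaked 1D intensity profile.

    Assumes background has already been subtracted and that the profile
    drops below half of its maximum on both sides of the peak.
    """
    peak = max(profile)
    half = peak / 2.0
    i_peak = profile.index(peak)

    left = i_peak
    while left > 0 and profile[left] >= half:
        left -= 1
    right = i_peak
    while right < len(profile) - 1 and profile[right] >= half:
        right += 1
    if profile[left] >= half or profile[right] >= half:
        raise ValueError("profile does not fall below half maximum on both sides")

    x_left = _crossing(profile, left + 1, left, half)
    x_right = _crossing(profile, right - 1, right, half)
    return (x_right - x_left) * pixel_size
```

Applying the function once to a row through the image center and once to a column gives horizontal and vertical widths; multiplying by the physical pixel size converts the result from pixels to length units.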
Cocaine Use and Delinquent Behavior among High-Risk Youths: A Growth Model of Parallel Processes
Dembo, Richard; Sullivan, Christopher
2009-01-01
We report the results of a parallel-process, latent growth model analysis examining the relationships between cocaine use and delinquent behavior among youths. The study examined a sample of 278 justice-involved juveniles completing at least one of three follow-up interviews as part of a National Institute on Drug Abuse-funded study. The results…
Gyrokinetic simulation of finite-β plasmas on parallel architectures
International Nuclear Information System (INIS)
Reynders, J.V.W.
1993-01-01
Much research exists on the linear and non-linear properties of plasma microinstabilities induced by density and temperature gradients. There has been an interest in the electromagnetic or finite-β effects on these microinstabilities. This thesis focuses on the finite-β modification of an ion temperature gradient (ITG) driven microinstability in two-dimensional shearless and sheared-slab geometries. A gyrokinetic model is employed in the numerical and analytic studies of this instability. Chapter 1 introduces the electromagnetic gyrokinetic model employed in the numerical and analytic studies of the ITG instability. Some discussion of the Klimontovich particle representation of the gyrokinetic Vlasov equation and a multiple scale model of the background plasma gradient is presented. Chapter 2 details the computational issues facing an electromagnetic gyrokinetic particle simulation of the ITG mode. An electromagnetic extension of the partially linearized algorithm is presented with a comparison of quiet particle initialization routines. Chapter 3 presents and compares algorithms for the gyrokinetic particle simulation technique on SIMD and MIMD computing platforms. Chapter 4 discusses electromagnetic gyrokinetic fluctuation theory and provides a comparison of analytic and numerical results. Chapter 5 contains a linear and a non-linear three-wave coupling analysis of the finite-β modified ITG mode in a shearless slab geometry. Comparisons are made with linear and partially linearized gyrokinetic simulation results. Chapter 6 presents results from a finite-β modified ITG mode in a sheared slab geometry. The linear dispersion relation is derived and results from an integral eigenvalue code are presented. Comparisons are made with the gyrokinetic particle code in a variety of limits with both adiabatic and non-adiabatic electrons. Evidence of ITG driven microtearing is presented.
International Nuclear Information System (INIS)
Apisit, Patchimpattapong; Alireza, Haghighat; Shedlock, D.
2003-01-01
An expert system for generating an effective mesh distribution for the SN particle transport simulation has been developed. This expert system consists of two main parts: 1) an algorithm for generating an effective mesh distribution in a serial environment, and 2) an algorithm for inference of an effective domain decomposition strategy for parallel computing. For the first part, the algorithm prepares an effective mesh distribution considering problem physics and the spatial differencing scheme. For the second part, the algorithm determines a parallel-performance-index (PPI), which is defined as the ratio of the granularity to the degree-of-coupling. The parallel-performance-index provides expected performance of an algorithm depending on computing environment and resources. A large index indicates a high granularity algorithm with relatively low coupling among processors. This expert system has been successfully tested within the PENTRAN (Parallel Environment Neutral-Particle Transport) code system for simulating real-life shielding problems. (authors)
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation
Steinman, Jeff S.
1992-01-01
Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Xyce Parallel Electronic Simulator - User's Guide, Version 1.0
Energy Technology Data Exchange (ETDEWEB)
HUTCHINSON, SCOTT A; KEITER, ERIC R.; HOEKSTRA, ROBERT J.; WATERS, LON J.; RUSSO, THOMAS V.; RANKIN, ERIC LAMONT; WIX, STEVEN D.
2002-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator code for simulating electrical circuits at a variety of abstraction levels. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on improving the capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows. Another feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models
An Evaluation of Parallel Synchronous and Conservative Asynchronous Logic-Level Simulations
Directory of Open Access Journals (Sweden)
Ausif Mahmood
1996-01-01
a circuit remain fixed during the entire simulation. We remove this limitation and, by extending the analyses to multi-input, multi-output circuits with an arbitrary number of input events, show that the conservative asynchronous simulation extracts more parallelism and executes faster than synchronous simulation in general. Our conclusions are supported by a comparison of the idealized execution times of synchronous and conservative asynchronous algorithms on ISCAS combinational and sequential benchmark circuits.
Parallel treatment of simulation particles in particle-in-cell codes on SUPRENUM
International Nuclear Information System (INIS)
Seldner, D.
1990-02-01
This report contains the program documentation and description of the program package 2D-PLAS, which has been developed at the Nuclear Research Center Karlsruhe in the Institute for Data Processing in Technology (IDT) under the auspices of the BMFT. 2D-PLAS is a parallel program version of the treatment of the simulation particles of the two-dimensional stationary particle-in-cell code BFCPIC which has been developed at the Nuclear Research Center Karlsruhe. This parallel version has been designed for the parallel computer SUPRENUM. (orig.)
A parallel algorithm for transient solid dynamics simulations with contact detection
International Nuclear Information System (INIS)
Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.
1996-01-01
Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load-imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper the authors present a new parallel algorithm for contact detection that overcomes many of these limitations
Monte Carlo simulation of grain growth
Directory of Open Access Journals (Sweden)
Paulo Blikstein
1999-07-01
Understanding and predicting grain growth is an important problem in metallurgy. Monte Carlo methods have been used in computer simulations in many different fields of knowledge. Grain growth simulation using this method is especially attractive as the statistical behavior of the atoms is properly reproduced; microstructural evolution depends only on the real topology of the grains and not on any kind of geometric simplification. Computer simulation has the advantage of allowing the user to visualize the procedures graphically, even dynamically and in three dimensions. Single-phase alloy grain growth simulation was carried out by calculating the free energy of each atom in the lattice (with its present crystallographic orientation) and comparing this value to another one calculated with a different random orientation. When the resulting free energy is lower than or equal to the initial value, the new orientation replaces the former. The measure of time is the Monte Carlo Step (MCS), which involves a series of trials throughout the lattice. A very close relationship between experimental and theoretical values for the grain growth exponent (n) was observed.
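The reorientation rule described in this abstract can be sketched as a small Python routine. Several details below are assumptions made for illustration rather than taken from the paper: the two-dimensional square lattice, the four-neighbour energy that simply counts unlike neighbours, the periodic boundaries, and the function names.

```python
import random


def site_energy(grid, row, col, orientation):
    """Energy of one site for a given orientation: number of unlike nearest
    neighbours on a periodic square lattice (assumed energy model)."""
    n_rows, n_cols = len(grid), len(grid[0])
    neighbours = (
        grid[(row - 1) % n_rows][col],
        grid[(row + 1) % n_rows][col],
        grid[row][(col - 1) % n_cols],
        grid[row][(col + 1) % n_cols],
    )
    return sum(1 for other in neighbours if other != orientation)


def monte_carlo_step(grid, n_orientations, rng):
    """One Monte Carlo Step: as many reorientation trials as lattice sites."""
    n_rows, n_cols = len(grid), len(grid[0])
    for _ in range(n_rows * n_cols):
        row, col = rng.randrange(n_rows), rng.randrange(n_cols)
        current = grid[row][col]
        trial = rng.randrange(n_orientations)
        if trial == current:
            continue
        # Accept the new orientation when the energy is lower or equal.
        if site_energy(grid, row, col, trial) <= site_energy(grid, row, col, current):
            grid[row][col] = trial


def total_boundary_energy(grid):
    """Total number of unlike neighbour pairs (each pair counted once)."""
    return sum(
        site_energy(grid, r, c, grid[r][c])
        for r in range(len(grid))
        for c in range(len(grid[0]))
    ) // 2
```

Starting from a lattice of random orientations and calling `monte_carlo_step` repeatedly with a `random.Random` instance coarsens the grains; because a trial is accepted only when the site energy does not rise, `total_boundary_energy` never increases from one step to the next in this zero-temperature sketch.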
Robust large-scale parallel nonlinear solvers for simulations.
Energy Technology Data Exchange (ETDEWEB)
Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson (Sandia National Laboratories, Livermore, CA)
2005-11-01
This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any
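To make the statement that Broyden's method replaces the Jacobian with an approximation more concrete, here is a minimal dense sketch in Python. It is not the limited-memory variant developed in the report: it keeps a full approximate inverse Jacobian, starts from the identity matrix (an assumption that only suits well-scaled problems), uses no globalization, and the names `broyden_solve` and `example_residual` are hypothetical.

```python
def broyden_solve(residual, x0, tol=1e-10, max_iter=50):
    """Solve residual(x) = 0 with Broyden's "good" method.

    Only residual evaluations are needed; no Jacobian is ever computed.
    Returns the solution and the number of iterations taken.
    """
    n = len(x0)
    x = list(x0)
    f = residual(x)
    # Approximate inverse Jacobian; the identity start is an assumption.
    h = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for iteration in range(max_iter):
        if max(abs(v) for v in f) < tol:
            return x, iteration
        step = [-sum(h[i][j] * f[j] for j in range(n)) for i in range(n)]
        x_new = [x[i] + step[i] for i in range(n)]
        f_new = residual(x_new)
        y = [f_new[i] - f[i] for i in range(n)]
        h_y = [sum(h[i][j] * y[j] for j in range(n)) for i in range(n)]
        s_h = [sum(step[i] * h[i][j] for i in range(n)) for j in range(n)]
        denom = sum(step[i] * h_y[i] for i in range(n))
        if denom == 0.0:
            raise ZeroDivisionError("Broyden update is undefined for this step")
        # Sherman-Morrison form of the rank-one secant update.
        for i in range(n):
            for j in range(n):
                h[i][j] += (step[i] - h_y[i]) * s_h[j] / denom
        x, f = x_new, f_new
    if max(abs(v) for v in f) < tol:
        return x, max_iter
    raise RuntimeError("Broyden iteration did not converge")


def example_residual(v):
    """Small, mildly nonlinear test system chosen for illustration."""
    x, y = v
    return [x + 0.1 * y * y - 1.0, y + 0.1 * x * x - 2.0]
```

Calling `broyden_solve(example_residual, [0.0, 0.0])` drives both residual components toward zero using residual evaluations alone, which is the property that matters for codes that cannot supply an accurate Jacobian.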
Simulating tumor growth in confined heterogeneous environments
International Nuclear Information System (INIS)
Gevertz, Jana L; Torquato, Salvatore; Gillies, George T
2008-01-01
The holy grail of computational tumor modeling is to develop a simulation tool that can be utilized in the clinic to predict neoplastic progression and propose individualized optimal treatment strategies. In order to develop such a predictive model, one must account for many of the complex processes involved in tumor growth. One interaction that has not been incorporated into computational models of neoplastic progression is the impact that organ-imposed physical confinement and heterogeneity have on tumor growth. For this reason, we have taken a cellular automaton algorithm that was originally designed to simulate spherically symmetric tumor growth and generalized the algorithm to incorporate the effects of tissue shape and structure. We show that models that do not account for organ/tissue geometry and topology lead to false conclusions about tumor spread, shape and size. The impact that confinement has on tumor growth is more pronounced when a neoplasm is growing close to, versus far from, the confining boundary. Thus, any clinical simulation tool of cancer progression must not only consider the shape and structure of the organ in which a tumor is growing, but must also consider the location of the tumor within the organ if it is to accurately predict neoplastic growth dynamics
Simulation of reflooding on two parallel heated channel by TRACE
Energy Technology Data Exchange (ETDEWEB)
Zakir, Md. Ghulam [Department of Nuclear Engineering, Chalmers University of Technology, Gothenburg (Sweden)
2016-07-12
In the case of a loss-of-coolant accident (LOCA) in a Boiling Water Reactor (BWR), heat generated in the nuclear fuel is not adequately removed because of the decrease of the coolant mass flow rate in the reactor core. This leads to an increase of the fuel temperature that can cause damage to the core and leakage of the radioactive fission products. In order to reflood the core and to stop the increase of temperature, an Emergency Core Cooling System (ECCS) delivers water under these conditions. This study investigates how the power distribution between two channels can affect the process of reflooding when the emergency water is injected from the top of the channels. The peak cladding temperature (PCT) during the LOCA transient at different axial levels is determined as well. The thermal-hydraulic system code TRACE has been used. A TRACE model of the two heated channels has been developed, and three hypothetical cases with different power distributions have been studied. Finally, a comparison between simulated and experimental data is presented.
Xyce Parallel Electronic Simulator : users' guide, version 2.0.
Energy Technology Data Exchange (ETDEWEB)
Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.
2004-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (5) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce
Massively parallel simulator of optical coherence tomography of inhomogeneous turbid media.
Malektaji, Siavash; Lima, Ivan T; Escobar I, Mauricio R; Sherif, Sherif S
2017-10-01
An accurate and practical simulator for Optical Coherence Tomography (OCT) could be an important tool to study the underlying physical phenomena in OCT such as multiple light scattering. Recently, many researchers have investigated simulation of OCT of turbid media, e.g., tissue, using Monte Carlo methods. The main drawback of these earlier simulators is the long computational time required to produce accurate results. We developed a massively parallel simulator of OCT of inhomogeneous turbid media that obtains both Class I diffusive reflectivity, due to ballistic and quasi-ballistic scattered photons, and Class II diffusive reflectivity due to multiply scattered photons. This Monte Carlo-based simulator is implemented on graphics processing units (GPUs), using the Compute Unified Device Architecture (CUDA) platform and programming model, to exploit the parallel nature of propagation of photons in tissue. It models an arbitrarily shaped sample medium as a tetrahedron-based mesh and uses an advanced importance sampling scheme. This new simulator speeds up simulations of OCT of inhomogeneous turbid media by about two orders of magnitude. To demonstrate this result, we have compared the computation times of our new parallel simulator and its serial counterpart using two samples of inhomogeneous turbid media. We have shown that our parallel implementation reduced simulation time of OCT of the first sample medium from 407 min to 92 min by using a single GPU card, to 12 min by using 8 GPU cards and to 7 min by using 16 GPU cards. For the second sample medium, the OCT simulation time was reduced from 209 h to 35.6 h by using a single GPU card, to 4.65 h by using 8 GPU cards, and to only 2 h by using 16 GPU cards. Therefore, our new parallel simulator is considerably more practical to use than its central processing unit (CPU)-based counterpart. Our new parallel OCT simulator could be a practical tool to study the different physical phenomena underlying OCT
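The run times quoted in this abstract can be turned into speedups and parallel efficiencies with a few lines of arithmetic. The sketch below is only an illustration: the timings are taken from the abstract, while the function name `speedup_table` and the choice to measure efficiency against the single-GPU run are assumptions made here, not something the authors describe.

```python
def speedup_table(serial_time, gpu_times):
    """Map GPU count -> (speedup over serial run, efficiency relative to 1 GPU)."""
    one_gpu_time = gpu_times[1]
    table = {}
    for n_gpus, run_time in sorted(gpu_times.items()):
        speedup = serial_time / run_time
        # Assumed definition: efficiency compares against perfect scaling
        # of the single-GPU run time.
        efficiency = one_gpu_time / (n_gpus * run_time)
        table[n_gpus] = (speedup, efficiency)
    return table


# Run times quoted in the abstract (sample 1 in minutes, sample 2 in hours).
sample_1 = speedup_table(407.0, {1: 92.0, 8: 12.0, 16: 7.0})
sample_2 = speedup_table(209.0, {1: 35.6, 8: 4.65, 16: 2.0})

for label, table in (("sample 1", sample_1), ("sample 2", sample_2)):
    for n_gpus, (speedup, efficiency) in table.items():
        print(f"{label}: {n_gpus:2d} GPU(s)  speedup {speedup:6.1f}x  "
              f"efficiency {efficiency:4.2f}")
```

On these figures, only the 16-GPU run of the second sample exceeds a 100-fold speedup over the serial code (209 h / 2 h = 104.5).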
From Massively Parallel Algorithms and Fluctuating Time Horizons to Nonequilibrium Surface Growth
International Nuclear Information System (INIS)
Korniss, G.; Toroczkai, Z.; Novotny, M. A.; Rikvold, P. A.
2000-01-01
We study the asymptotic scaling properties of a massively parallel algorithm for discrete-event simulations where the discrete events are Poisson arrivals. The evolution of the simulated time horizon is analogous to a nonequilibrium surface. Monte Carlo simulations and a coarse-grained approximation indicate that the macroscopic landscape in the steady state is governed by the Edwards-Wilkinson Hamiltonian. Since the efficiency of the algorithm corresponds to the density of local minima in the associated surface, our results imply that the algorithm is asymptotically scalable. (c) 2000 The American Physical Society
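The link between efficiency and the density of local minima can be illustrated with a toy model of a conservative parallel discrete-event scheme. The following is a minimal sketch under stated assumptions, not the authors' code: processing elements sit on a ring, each holds a local simulated time, a processing element may advance only when its time does not exceed that of either neighbour, and increments are exponentially distributed (Poisson arrivals). The function name and all parameters are hypothetical.

```python
import random


def simulate_time_horizon(num_pes, num_steps, seed=0):
    """Evolve local simulated times on a ring; return (times, utilization per step)."""
    rng = random.Random(seed)
    tau = [0.0] * num_pes
    utilization = []
    for _ in range(num_steps):
        # A processing element is active only if its local time is a local
        # minimum of the time horizon (ties count as minima in this sketch).
        active = [
            tau[i] <= tau[i - 1] and tau[i] <= tau[(i + 1) % num_pes]
            for i in range(num_pes)
        ]
        # All active elements advance simultaneously by an exponential increment.
        for i, is_active in enumerate(active):
            if is_active:
                tau[i] += rng.expovariate(1.0)
        # Fraction of working elements = density of local minima.
        utilization.append(sum(active) / num_pes)
    return tau, utilization


final_times, utilization = simulate_time_horizon(num_pes=200, num_steps=500)
late_average = sum(utilization[-100:]) / 100
print(f"mean utilization over the last 100 steps: {late_average:.3f}")
```

Because the element holding the globally smallest time is always a local minimum, the utilization in this sketch never drops to zero; whether it stays bounded away from zero as the ring grows is the scalability question the abstract addresses.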
Parallel simulation of tsunami inundation on a large-scale supercomputer
Oishi, Y.; Imamura, F.; Sugawara, D.
2013-12-01
An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation, and the computational power of recent massively parallel supercomputers can help enable faster-than-real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on the TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
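The load-balancing step described in this abstract (CPUs assigned to nested layers in proportion to grid points, then a 1-D decomposition within each layer) might be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function names, the rule that every layer receives at least one CPU, the largest-remainder rounding, and the example grid sizes are all hypothetical.

```python
def allocate_cpus(grid_points, total_cpus):
    """Assign CPUs to layers roughly in proportion to their grid-point counts."""
    n_layers = len(grid_points)
    if total_cpus < n_layers:
        raise ValueError("need at least one CPU per nested layer")
    total_points = sum(grid_points)
    # Assumption: one CPU per layer is reserved, the rest is shared out
    # proportionally, with largest-remainder rounding.
    spare = total_cpus - n_layers
    quotas = [spare * points / total_points for points in grid_points]
    allocation = [1 + int(quota) for quota in quotas]
    leftover = total_cpus - sum(allocation)
    by_remainder = sorted(
        range(n_layers), key=lambda i: quotas[i] - int(quotas[i]), reverse=True
    )
    for i in by_remainder[:leftover]:
        allocation[i] += 1
    return allocation


def split_1d(n_columns, n_parts):
    """Cut n_columns into n_parts contiguous index ranges of near-equal size."""
    base, extra = divmod(n_columns, n_parts)
    ranges = []
    start = 0
    for rank in range(n_parts):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Hypothetical nested layers, coarsest first (grid points per layer).
layers = [1_000_000, 500_000, 250_000, 250_000]
cpus_per_layer = allocate_cpus(layers, total_cpus=16)
print(cpus_per_layer)
print(split_1d(n_columns=10, n_parts=3))
```

A 1-D split keeps each subdomain in contact with at most two neighbours in its own layer, which keeps the neighbour-exchange pattern simple.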
Parallelization of a beam dynamics code and first large scale radio frequency quadrupole simulations
Directory of Open Access Journals (Sweden)
J. Xu
2007-01-01
The design and operation support of hadron (proton and heavy-ion) linear accelerators require substantial use of beam dynamics simulation tools. The beam dynamics code TRACK was originally developed at Argonne National Laboratory (ANL) to fulfill the special requirements of the rare isotope accelerator (RIA) accelerator systems. From the beginning, the code has been developed to make it useful in the three stages of a linear accelerator project, namely, the design, commissioning, and operation of the machine. To realize this concept, the code has unique features such as end-to-end simulations from the ion source to the final beam destination and automatic procedures for tuning of a multiple charge state heavy-ion beam. The TRACK code has become a general beam dynamics code for hadron linacs and has found wide applications worldwide. Until recently, the code has remained serial except for a simple parallelization used for the simulation of multiple seeds to study the machine errors. To speed up computation, the TRACK Poisson solver has been parallelized. This paper discusses different parallel models for solving the Poisson equation with the primary goal to extend the scalability of the code onto 1024 and more processors of the new generation of supercomputers known as BlueGene (BG/L). Domain decomposition techniques have been adapted and incorporated into the parallel version of the TRACK code. To demonstrate the new capabilities of the parallelized TRACK code, the dynamics of a 45 mA proton beam represented by 10^{8} particles has been simulated through the 325 MHz radio frequency quadrupole and initial accelerator section of the proposed FNAL proton driver. The results show the benefits and advantages of large-scale parallel computing in beam dynamics simulations.
Arabidopsis Growth Simulation Using Image Processing Technology
Directory of Open Access Journals (Sweden)
Junmei Zhang
2014-01-01
This paper aims to provide a method to represent the virtual Arabidopsis plant at each growth stage. It includes simulating the shape and providing growth parameters. The shape is described with elliptic Fourier descriptors. First, the plant is segmented from the background using the chromatic coordinates. From the segmentation result, the outer boundary series are obtained using a boundary tracking algorithm. Elliptic Fourier analysis is then carried out to extract the coefficients of the contour. The coefficients require less storage than the original contour points and can be used to simulate the shape of the plant. The growth parameters include the total area and the number of leaves of the plant. The total area is obtained from the number of plant pixels and the image calibration result. The number of leaves is derived by detecting the apex of each leaf, which is achieved by using a wavelet transform to identify the local maxima of the distance signal between the contour points and the region centroid. Experimental results show that this method can record the growth stage of an Arabidopsis plant with less data and provide a visual platform for plant growth research.
A parallel simulated annealing algorithm for standard cell placement on a hypercube computer
Jones, Mark Howard
1987-01-01
A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution
International Nuclear Information System (INIS)
Hesse, M.; Birn, J.
1990-01-01
Properties of the electric field component parallel to the magnetic field are investigated in a 3D MHD simulation of plasmoid formation and evolution in the magnetotail, in the presence of a net dawn-dusk magnetic field component. The spatial localization of E-parallel, the concept of a diffusion zone, and the role of E-parallel in accelerating electrons are discussed. A localization of the region of enhanced E-parallel in all space directions is found, with a strong concentration in the z direction. This region is identified as the diffusion zone, which plays a crucial role in reconnection theory through the local break-down of magnetic flux conservation. 12 refs
International Nuclear Information System (INIS)
Al-Hallaq, A.; Amin, S.
1998-01-01
This paper introduces a new parallel algorithm, and its simulation on a hypercube simulator, for low pass digital image filtering using a systolic array. This new algorithm is faster than the old one (Amin, 1988). This is due to the fact that the old algorithm carries out the addition operations in a sequential mode, whereas in our new design these addition operations are divided into two groups, which can be performed in parallel. One group is performed on one half of the systolic array and the other on the second half, that is, by folding. This parallelism reduces the time required for the whole process by almost a quarter of the time of the old algorithm. (authors). 18 refs., 3 figs
STOCHSIMGPU: parallel stochastic simulation for the Systems Biology Toolbox 2 for MATLAB
Klingbeil, G.
2011-02-25
Motivation: The importance of stochasticity in biological systems is becoming increasingly recognized and the computational cost of biologically realistic stochastic simulations urgently requires development of efficient software. We present a new software tool STOCHSIMGPU that exploits graphics processing units (GPUs) for parallel stochastic simulations of biological/chemical reaction systems and show that significant gains in efficiency can be made. It is integrated into MATLAB and works with the Systems Biology Toolbox 2 (SBTOOLBOX2) for MATLAB. Results: The GPU-based parallel implementation of the Gillespie stochastic simulation algorithm (SSA), the logarithmic direct method (LDM) and the next reaction method (NRM) is approximately 85 times faster than the sequential implementation of the NRM on a central processing unit (CPU). Using our software does not require any changes to the user's models, since it acts as a direct replacement of the stochastic simulation software of the SBTOOLBOX2. © The Author 2011. Published by Oxford University Press. All rights reserved.
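For readers unfamiliar with the Gillespie SSA named in this abstract, a serial sketch of its plain direct method is given below. It is not STOCHSIMGPU and does not use MATLAB or a GPU; the function name `gillespie_direct`, the reversible isomerization example, and the rate constants are assumptions chosen only for illustration. The LDM and NRM variants mentioned in the abstract differ in how the next reaction is selected and are not shown.

```python
import random


def gillespie_direct(x0, stoichiometry, propensities, t_end, seed=0):
    """Simulate one trajectory; return a list of (time, state) tuples."""
    rng = random.Random(seed)
    t = 0.0
    x = list(x0)
    trajectory = [(t, tuple(x))]
    while True:
        rates = [propensity(x) for propensity in propensities]
        total_rate = sum(rates)
        if total_rate <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with the total rate.
        t_next = t + rng.expovariate(total_rate)
        if t_next > t_end:
            break
        t = t_next
        # Pick reaction j with probability rates[j] / total_rate.
        threshold = rng.random() * total_rate
        cumulative = 0.0
        for j, rate in enumerate(rates):
            cumulative += rate
            if threshold < cumulative:
                break
        # Apply the state change of the chosen reaction.
        for species, change in enumerate(stoichiometry[j]):
            x[species] += change
        trajectory.append((t, tuple(x)))
    return trajectory


# Hypothetical example: reversible isomerization A <-> B.
trajectory = gillespie_direct(
    x0=[100, 0],
    stoichiometry=[(-1, 1), (1, -1)],
    propensities=[lambda x: 1.0 * x[0], lambda x: 0.5 * x[1]],
    t_end=5.0,
    seed=1,
)
print(len(trajectory), trajectory[-1])
```

Independent trajectories share no state, which is why ensembles of such runs map naturally onto many GPU threads.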
Xyce parallel electronic simulator reference guide, Version 6.0.1.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list exhaustively (to the extent possible) the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].
Parallel shooting methods for finding steady state solutions to engine simulation models
DEFF Research Database (Denmark)
Andersen, Stig Kildegård; Thomsen, Per Grove; Carlsen, Henrik
2007-01-01
Parallel single- and multiple shooting methods were tested for finding periodic steady state solutions to a Stirling engine model. The model was used to illustrate features of the methods and possibilities for optimisations. Performance was measured using simulation of an experimental data set...
MaMiCo: Software design for parallel molecular-continuum flow simulations
Neumann, Philipp; Flohr, Hanno; Arora, Rahul; Jarmatz, Piet; Tchipev, Nikola; Bungartz, Hans-Joachim
2015-01-01
The macro-micro-coupling tool (MaMiCo) was developed to ease the development of and modularize molecular-continuum simulations, retaining sequential and parallel performance. We demonstrate the functionality and performance of MaMiCo by coupling
Simulating streamer discharges in 3D with the parallel adaptive Afivo framework
H.J. Teunissen (Jannis); U. M. Ebert (Ute)
2017-01-01
We present an open-source plasma fluid code for 2D, cylindrical and 3D simulations of streamer discharges, based on the Afivo framework that features adaptive mesh refinement, geometric multigrid methods for Poisson's equation, and OpenMP parallelism. We describe the numerical
A software framework for the portable parallelization of particle-mesh simulations
DEFF Research Database (Denmark)
Sbalzarini, I.F.; Walther, Jens Honore; Polasek, B.
2006-01-01
Abstract: We present a software framework for the transparent and portable parallelization of simulations using particle-mesh methods. Particles are used to transport physical properties and a mesh is required in order to reinitialize the distorted particle locations, ensuring the convergence...
Generation of Random Numbers and Parallel Random Number Streams for Monte Carlo Simulations
Directory of Open Access Journals (Sweden)
L. Yu. Barash
2012-01-01
Modern methods and libraries for high quality pseudorandom number generation and for generation of parallel random number streams for Monte Carlo simulations are considered. The probability equidistribution property, and the parameters for which the property holds at dimensions up to the logarithm of the mesh size, are considered for Multiple Recursive Generators.
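One common way to obtain parallel streams from a Multiple Recursive Generator is to start each stream at a widely separated point of the same sequence, using a jump-ahead computed by matrix exponentiation. The sketch below illustrates that idea for an order-2 recurrence x_n = (a1 x_{n-1} + a2 x_{n-2}) mod m. It is an assumption-laden toy: the coefficients are arbitrary and not a vetted generator, the function names are hypothetical, and nothing here is taken from the libraries the paper reviews.

```python
M = 2**31 - 1           # modulus (a Mersenne prime)
COEFFS = (46325, 1084587)  # arbitrary illustrative coefficients, quality not vetted


def mrg_step(state, coeffs=COEFFS, m=M):
    """Advance (x_{n-1}, x_{n-2}) by one step to (x_n, x_{n-1})."""
    x1, x2 = state
    a1, a2 = coeffs
    return ((a1 * x1 + a2 * x2) % m, x1)


def mat_mul(p, q, m):
    """Multiply two 2x2 matrices modulo m."""
    return (
        ((p[0][0] * q[0][0] + p[0][1] * q[1][0]) % m,
         (p[0][0] * q[0][1] + p[0][1] * q[1][1]) % m),
        ((p[1][0] * q[0][0] + p[1][1] * q[1][0]) % m,
         (p[1][0] * q[0][1] + p[1][1] * q[1][1]) % m),
    )


def mat_pow(base, exponent, m):
    """Raise a 2x2 matrix to a non-negative power modulo m by repeated squaring."""
    result = ((1, 0), (0, 1))
    while exponent > 0:
        if exponent & 1:
            result = mat_mul(result, base, m)
        base = mat_mul(base, base, m)
        exponent >>= 1
    return result


def mrg_jump(state, n_steps, coeffs=COEFFS, m=M):
    """Advance the state by n_steps in O(log n_steps) via the companion matrix."""
    a1, a2 = coeffs
    power = mat_pow(((a1, a2), (1, 0)), n_steps, m)
    x1, x2 = state
    return ((power[0][0] * x1 + power[0][1] * x2) % m,
            (power[1][0] * x1 + power[1][1] * x2) % m)


# Give each of four hypothetical workers a start point 10**12 steps apart.
seed_state = (12345, 67890)
stream_starts = [mrg_jump(seed_state, k * 10**12) for k in range(4)]
print(stream_starts)
```

The jump-ahead result can be checked against stepping the recurrence one value at a time over a short distance, which is a useful sanity test before trusting long jumps.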
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube
Joslin, Ronald D.; Zubair, Mohammad
1993-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine exhibits less than ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and becomes a large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units
Energy Technology Data Exchange (ETDEWEB)
Mburu, Joe Mwangi; Hah, Chang Joo Hah [KEPCO International Nuclear Graduate School, Ulsan (Korea, Republic of)
2014-05-15
Within the past decade, research on GPU-based massive parallelization of core simulation has produced impressive results, but there has been little commercial application in the nuclear field, especially in reactor core simulation. The purpose of this paper is to introduce the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing with a simple Monte Carlo simulation on very minimal hardware. For a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain on each computing device. The heterogeneous platform used in this analysis is a slow notebook with only a 1 GHz processor. The end results are quite surprising: the speedups obtained are almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach to calculations of potentially high arithmetic intensity. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10, based on one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations are well suited to parallelization.
Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units
International Nuclear Information System (INIS)
Mburu, Joe Mwangi; Hah, Chang Joo Hah
2014-01-01
Within the past decade, research on GPU-based massive parallelization of core simulation has produced impressive results, but there has been little commercial application in the nuclear field, especially in reactor core simulation. The purpose of this paper is to introduce the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing with a simple Monte Carlo simulation on very minimal hardware. For a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain on each computing device. The heterogeneous platform used in this analysis is a slow notebook with only a 1 GHz processor. The end results are quite surprising: the speedups obtained are almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach to calculations of potentially high arithmetic intensity. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10, based on one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations are well suited to parallelization.
Simulation of a parallel processor on a serial processor: The neutron diffusion equation
International Nuclear Information System (INIS)
Honeck, H.C.
1981-01-01
Parallel processors could provide the nuclear industry with very high computing power at a very moderate cost. Will we be able to make effective use of this power? This paper explores the use of a very simple parallel processor for solving the neutron diffusion equation to predict power distributions in a nuclear reactor. We first describe a simple parallel processor and estimate its theoretical performance based on the current hardware technology. Next, we show how the parallel processor could be used to solve the neutron diffusion equation. We then present the results of some simulations of a parallel processor run on a serial processor and measure some of the expected inefficiencies. Finally, we extrapolate the results to estimate how actual design codes would perform. We find that the standard numerical methods for solving the neutron diffusion equation are still applicable when used on a parallel processor. However, some simple modifications to these methods will be necessary if we are to achieve the full power of these new computers. (orig.)
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems
International Nuclear Information System (INIS)
Baer, Thomas A.; Sackinger, Philip A.; Subia, Samuel R.
1999-01-01
Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a "pseudo-solid" mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to distribute computational work equally across an SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel computing environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three-dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speedups for fixed problem size, a class of problems of immediate practical importance.
A 3D gyrokinetic particle-in-cell simulation of fusion plasma microturbulence on parallel computers
Williams, T. J.
1992-12-01
One of the grand challenge problems now supported by HPCC is the Numerical Tokamak Project. A goal of this project is the study of low-frequency micro-instabilities in tokamak plasmas, which are believed to cause energy loss via turbulent thermal transport across the magnetic field lines. An important tool in this study is gyrokinetic particle-in-cell (PIC) simulation. Gyrokinetic, as opposed to fully-kinetic, methods are particularly well suited to the task because they are optimized to study the frequency and wavelength domain of the microinstabilities. Furthermore, many researchers now employ low-noise delta(f) methods to greatly reduce statistical noise by modelling only the perturbation of the gyrokinetic distribution function from a fixed background, not the entire distribution function. In spite of the increased efficiency of these improved algorithms over conventional PIC algorithms, gyrokinetic PIC simulations of tokamak micro-turbulence are still highly demanding of computer power--even fully-vectorized codes on vector supercomputers. For this reason, we have worked for several years to redevelop these codes on massively parallel computers. We have developed 3D gyrokinetic PIC simulation codes for SIMD and MIMD parallel processors, using control-parallel, data-parallel, and domain-decomposition message-passing (DDMP) programming paradigms. This poster summarizes our earlier work on codes for the Connection Machine and BBN TC2000 and our development of a generic DDMP code for distributed-memory parallel machines. We discuss the memory-access issues which are of key importance in writing parallel PIC codes, with special emphasis on issues peculiar to gyrokinetic PIC. We outline the domain decompositions in our new DDMP code and discuss the interplay of different domain decompositions suited for the particle-pushing and field-solution components of the PIC algorithm.
Simulated parallel annealing within a neighborhood for optimization of biomechanical systems.
Higginson, J S; Neptune, R R; Anderson, F C
2005-09-01
Optimization problems for biomechanical systems have become extremely complex. Simulated annealing (SA) algorithms have performed well in a variety of test problems and biomechanical applications; however, despite advances in computer speed, convergence to optimal solutions for systems of even moderate complexity has remained prohibitive. The objective of this study was to develop a portable parallel version of a SA algorithm for solving optimization problems in biomechanics. The algorithm for simulated parallel annealing within a neighborhood (SPAN) was designed to minimize interprocessor communication time and closely retain the heuristics of the serial SA algorithm. The computational speed of the SPAN algorithm scaled linearly with the number of processors on different computer platforms for a simple quadratic test problem and for a more complex forward dynamic simulation of human pedaling.
International Nuclear Information System (INIS)
Rios, Paulo R; Assis, Weslley L S; Ribeiro, Tatiana C S; Villa, Elena
2012-01-01
In a classical paper, Cahn derived expressions for the kinetics of transformations nucleated on random planes and lines. He used those as a model for nucleation on the boundaries, edges and vertices of a polycrystal consisting of equiaxed grains. In this paper it is demonstrated that Cahn's expression for random planes may be used in situations beyond the scope envisaged in Cahn's original paper. For instance, we derived an expression for the kinetics of transformations nucleated on random parallel planes that is identical to that formerly obtained by Cahn considering random planes. Computer simulation of transformations nucleated on random parallel planes is carried out. It is shown that there is excellent agreement between simulated results and analytical solutions. Such an agreement is to be expected if both the simulation and the analytical solution are correct. (paper)
Mesoscopic simulation of recrystallization and grain growth
International Nuclear Information System (INIS)
Rollett, A.D.
2000-01-01
A brief summary of simulation techniques for recrystallization and grain growth is given. The available methods include surface evolver, front tracking (including finite element methods and vertex methods), networks of curves, phase field, cellular automata, and Monte Carlo. Two of the models that use a regular lattice, the Potts model and the Cellular Automaton (CA) model, have proved to be very useful. Microstructure is represented on a discrete lattice where the value of the field at each point represents the local orientation of the material and boundaries exist between points of unlike orientation. Two issues are discussed: one is a hybrid approach to combining the standard Monte Carlo and cellular automata algorithms for recrystallization modeling. The second is adaptation of the MC method for modeling grain growth (and recrystallization) with physically based boundary properties. Both models have significant limitations in their standard forms. The CA model is very useful and efficient for simulating recrystallization with deterministic motion of the recrystallization fronts. It can be adapted to simulate curvature driven migration provided that multiple sub-lattices are used with a probabilistic switching rule. The Potts model is very successful in modeling curvature driven boundary migration and grain growth. It does not simulate the proportionality between boundary velocity and a stored energy driving force, however, unless rather restricted conditions of stored energy (in relation to the grain boundary energy) and lattice temperature are satisfied. A new approach based on a hybrid of the Potts model (MC) and the Cellular Automaton (CA) model has been developed to obtain the desired limiting behavior for both curvature-driven and stored energy-driven grain boundary migration. The combination of methods is achieved by interleaving the two different types of reorientation event in time. The results show that the hybrid algorithm models the Gibbs
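The lattice representation described in this summary, where each site stores an orientation and boundaries lie between unlike neighbours, can be made concrete with a small Potts-model sketch. This is a generic zero-temperature Monte Carlo variant written for illustration, not the hybrid MC/CA algorithm of the paper: the lattice size, the number of orientations, the rule of proposing a neighbour's orientation, and all function names are assumptions.

```python
import random

NEIGHBOUR_OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def unlike_neighbours(spins, n, i, j, value):
    """Count the four periodic neighbours of (i, j) whose orientation differs from value."""
    return sum(
        1 for di, dj in NEIGHBOUR_OFFSETS
        if spins[(i + di) % n][(j + dj) % n] != value
    )


def boundary_energy(spins, n):
    """Total number of unlike nearest-neighbour pairs (each bond counted once, n >= 3)."""
    energy = 0
    for i in range(n):
        for j in range(n):
            energy += spins[i][j] != spins[(i + 1) % n][j]
            energy += spins[i][j] != spins[i][(j + 1) % n]
    return energy


def potts_sweep(spins, n, rng):
    """One Monte Carlo sweep: n*n attempted reorientations at zero temperature."""
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        di, dj = rng.choice(NEIGHBOUR_OFFSETS)
        old = spins[i][j]
        new = spins[(i + di) % n][(j + dj) % n]  # propose a neighbour's orientation
        if new == old:
            continue
        delta = (unlike_neighbours(spins, n, i, j, new)
                 - unlike_neighbours(spins, n, i, j, old))
        if delta <= 0:  # accept only moves that do not raise boundary energy
            spins[i][j] = new


rng = random.Random(0)
size, n_orientations = 16, 8
lattice = [[rng.randrange(n_orientations) for _ in range(size)] for _ in range(size)]
energies = [boundary_energy(lattice, size)]
for _ in range(20):
    potts_sweep(lattice, size, rng)
    energies.append(boundary_energy(lattice, size))
print(energies[0], energies[-1])
```

Since every accepted move leaves the boundary energy unchanged or lower, the recorded energies never increase, which is the coarsening behaviour associated with curvature-driven grain growth in this model.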
Application of parallel computing to seismic damage process simulation of an arch dam
International Nuclear Information System (INIS)
Zhong Hong; Lin Gao; Li Jianbo
2010-01-01
Simulating the damage process of a high arch dam subjected to strong earthquake shocks is important for evaluating its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation demands substantial computational capacity. Conventional serial computing falls short of that, and parallel computing is a promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams, utilizing a damage model that accounts for the heterogeneity of concrete. Developed in Fortran, the code uses a master/slave programming model, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication and solvers from the AZTEC library for the solution of large-scale equations. A speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of a shaking table test, indicating that the proposed procedure and the parallel code PDPAD have good potential for simulating the seismic damage mode of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.
Parallel Monte Carlo simulations on an ARC-enabled computing grid
International Nuclear Information System (INIS)
Nilsen, Jon K; Samset, Bjørn H
2011-01-01
Grid computing opens new possibilities for running heavy Monte Carlo simulations of physical systems in parallel. The presentation gives an overview of GaMPI, a system for running an MPI-based random walker simulation on grid resources. Integrating the ARC middleware and the new storage system Chelonia with the Ganga grid job submission and control system, we show that MPI jobs can be run on a world-wide computing grid with good performance and promising scaling properties. Results for relatively communication-heavy Monte Carlo simulations run on multiple heterogeneous, ARC-enabled computing clusters in several countries are presented.
A method for data handling numerical results in parallel OpenFOAM simulations
International Nuclear Information System (INIS)
Anton, Alin (Faculty of Automatic Control and Computing, Politehnica University of Timişoara, 2nd Vasile Pârvan Ave., 300223, TM Timişoara, Romania, alin.anton@cs.upt.ro (Romania)); Muntean, Sebastian (Center for Advanced Research in Engineering Science, Romanian Academy – Timişoara Branch, 24th Mihai Viteazu Ave., 300221, TM Timişoara (Romania))
2015-01-01
Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM® toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.
A method for data handling numerical results in parallel OpenFOAM simulations
Energy Technology Data Exchange (ETDEWEB)
Anton, Alin [Faculty of Automatic Control and Computing, Politehnica University of Timişoara, 2nd Vasile Pârvan Ave., 300223, TM Timişoara, Romania, alin.anton@cs.upt.ro (Romania); Muntean, Sebastian [Center for Advanced Research in Engineering Science, Romanian Academy – Timişoara Branch, 24th Mihai Viteazu Ave., 300221, TM Timişoara (Romania)
2015-12-31
Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM® toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.
Application of parallel computing techniques to a large-scale reservoir simulation
International Nuclear Information System (INIS)
Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten
2001-01-01
Even with the continual advances made in both the computational algorithms and the computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from the intensive computational requirements of detailed modeling investigations of real-world reservoirs. This paper presents the application of a massively parallel version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is used to develop a three-dimensional numerical model simulating the flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance.
Design of a real-time wind turbine simulator using a custom parallel architecture
Hoffman, John A.; Gluck, R.; Sridhar, S.
1995-01-01
The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU allows several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors, which usually have a throughput limit because of rigid bus architecture.
Coupling methods for parallel running RELAPSim codes in nuclear power plant simulation
Energy Technology Data Exchange (ETDEWEB)
Li, Yankai; Lin, Meng, E-mail: linmeng@sjtu.edu.cn; Yang, Yanhua
2016-02-15
When a plant is modeled in detail for high precision, it is hard to achieve real-time calculation with a single RELAP5 instance in a large-scale simulation. To improve the speed while preserving the precision of the simulation, coupling methods for parallel running RELAPSim codes were proposed in this study. An explicit coupling method via coupling boundaries was realized based on a data-exchange and procedure-control environment. The synchronization frequency was chosen as a compromise between the precision of the simulation and the real-time requirement. The coupling methods were assessed using both single-phase flow models and two-phase flow models, and good agreement was obtained between the splitting–coupling models and the integrated model. The mitigation of SGTR was performed as an integral application of the coupling models. A large-scope NPP simulator was developed adopting six splitting–coupling models of RELAPSim and other simulation codes. The coupling models could improve the speed of simulation significantly and make real-time calculation possible. In this paper, the coupling of the models in the engineering simulator is taken as an example to expound the coupling methods, i.e., coupling between parallel running RELAPSim codes, and coupling between the RELAPSim code and other types of simulation codes. However, the coupling methods are also applicable to other simulators, for example, a simulator employing ATHLETE instead of RELAP5, or another logic code instead of SIMULINK. It is believed that the coupling method is generally applicable to NPP simulators regardless of the specific codes chosen in this paper.
Xyce Parallel Electronic Simulator Users' Guide Version 6.7.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2017-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2017 Sandia Corporation. All rights reserved. Trademarks: Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of
Directory of Open Access Journals (Sweden)
Pedro Luiz Guzzo
2004-06-01
In the present study, the morphology and the impurity distribution were investigated in growth sectors formed around the [0001] axis of synthetic quartz crystals. Plates containing cylindrical holes and cylindrical bars parallel to [0001] were prepared by ultrasonic machining and then used as seed crystals. The hydrothermal growth of synthetic quartz was carried out in a commercial autoclave in NaOH solution for 50 days. The morphologies of crystals grown from cylindrical seeds were characterized by X-ray diffraction topography. For both types of crystals, +X- and X- growth sectors were distinctly observed. Infrared spectroscopy and ionizing radiation were used to reveal the distribution of point defects related to Si-Al substitution and OH-species. The distribution of Al-related centers was found to differ from that in crystals grown from conventional Y-bar and Z-plate seeds.
Parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada
International Nuclear Information System (INIS)
Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.
2001-01-01
This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented in the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using a three-dimensional numerical model with more than one million cells, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce higher-resolution results and reveal some flow patterns that cannot be obtained using coarse-grid models.
Concurrent particle-in-cell plasma simulation on a multi-transputer parallel computer
International Nuclear Information System (INIS)
Khare, A.N.; Jethra, A.; Patel, Kartik
1992-01-01
This report describes the parallelization of a Particle-in-Cell (PIC) plasma simulation code on a multi-transputer parallel computer. The algorithm used in the parallelization of the PIC method is described. The decomposition schemes related to the distribution of the particles among the processors are discussed. The implementation of the algorithm on a transputer network connected as a torus is presented. The solutions to the problems related to global communication of data are presented in the form of a set of generalized communication functions. The performance of the program as a function of data size and the number of transputers shows that the implementation is scalable and represents an effective way of achieving high performance at acceptable cost. (author). 11 refs., 4 figs., 2 tabs., appendices
Applications of parallel computer architectures to the real-time simulation of nuclear power systems
International Nuclear Information System (INIS)
Doster, J.M.; Sills, E.D.
1988-01-01
In this paper the authors report on efforts to utilize parallel computer architectures for the thermal-hydraulic simulation of nuclear power systems and on current research toward the development of advanced reactor operator aids and control systems based on this new technology. Many aspects of reactor thermal-hydraulic calculations are inherently parallel, and the computationally intensive portions of these calculations can be effectively implemented on modern computers. Timing studies indicate that faster-than-real-time, high-fidelity physics models can be developed when the computational algorithms are designed to take advantage of the computer's architecture. These capabilities allow for the development of novel control systems and advanced reactor operator aids. Coupled with an integral real-time data acquisition system, evolving parallel computer architectures can provide operators and control room designers with improved control and protection capabilities. Research efforts are currently under way in this area.
Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel
2018-01-01
This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.
GROMACS 4.5: A high-throughput and highly parallel open source molecular simulation toolkit
Energy Technology Data Exchange (ETDEWEB)
Pronk, Sander [Science for Life Lab., Stockholm (Sweden); KTH Royal Institute of Technology, Stockholm (Sweden); Pall, Szilard [Science for Life Lab., Stockholm (Sweden); KTH Royal Institute of Technology, Stockholm (Sweden); Schulz, Roland [Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Larsson, Per [Univ. of Virginia, Charlottesville, VA (United States); Bjelkmar, Par [Science for Life Lab., Stockholm (Sweden); Stockholm Univ., Stockholm (Sweden); Apostolov, Rossen [Science for Life Lab., Stockholm (Sweden); KTH Royal Institute of Technology, Stockholm (Sweden); Shirts, Michael R. [Univ. of Virginia, Charlottesville, VA (United States); Smith, Jeremy C. [Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Kasson, Peter M. [Univ. of Virginia, Charlottesville, VA (United States); van der Spoel, David [Science for Life Lab., Stockholm (Sweden); Uppsala Univ., Uppsala (Sweden); Hess, Berk [Science for Life Lab., Stockholm (Sweden); KTH Royal Institute of Technology, Stockholm (Sweden); Lindahl, Erik [Science for Life Lab., Stockholm (Sweden); KTH Royal Institute of Technology, Stockholm (Sweden); Stockholm Univ., Stockholm (Sweden)
2013-02-13
Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations.
A conceptual design of multidisciplinary-integrated C.F.D. simulation on parallel computers
International Nuclear Information System (INIS)
Onishi, Ryoichi; Ohta, Takashi; Kimura, Toshiya.
1996-11-01
A parallel aeroelastic code for integrated aircraft simulation is designed. A method for integrating aerodynamics and structural dynamics software on parallel computers is devised using the Euler/Navier-Stokes equations coupled with wing-box finite element structures. The synthesis of a modern aircraft requires the optimization of aerodynamics, structures, controls, operability, and other design disciplines, and R and D efforts to implement Multidisciplinary Design Optimization environments on high performance computers are being made, especially in the U.S. aerospace industry. This report describes a Multiple Program Multiple Data (MPMD) parallelization of aerodynamics and structural dynamics codes with a dynamically deforming grid. A three-dimensional computation of a flowfield with dynamic deformation caused by structural deformation is performed, and the calculated pressure data are used to compute the structural deformation, which is in turn input to the fluid dynamics code. This process is repeated, exchanging the computed pressures and deformations between flowfield grids and structural elements. It makes it possible to simulate structural movements that take the interaction of fluid and structure into account. The conceptual design for achieving these functions is reported. Future extensions to incorporate control systems, which would enable the simulation of a realistic aircraft configuration and make the code a major tool for Aircraft Integrated Simulation, are also investigated. (author)
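The exchange loop described in the abstract above (pressures passed to the structure, deformations passed back to the flow) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the one-degree-of-freedom "flow" and "structure" formulas, the names `coupled_iteration`, `p0`, `c` and `k`, and the fixed-point stopping rule are hypothetical stand-ins, not the Euler/Navier-Stokes and finite element solvers of the report.

```python
def coupled_iteration(p0, c, k, tol=1e-10, max_iter=1000):
    """Alternate a toy 'flow' solve and a toy 'structure' solve until they agree.

    p0: pressure load on the undeformed shape (hypothetical)
    c:  how strongly deformation relieves the pressure (hypothetical)
    k:  structural stiffness (hypothetical); the loop converges when c / k < 1
    """
    d = 0.0  # initial deformation
    for it in range(1, max_iter + 1):
        p = p0 - c * d    # stand-in for the fluid code: pressure from deformation
        d_new = p / k     # stand-in for the structural code: deformation from pressure
        if abs(d_new - d) < tol:
            return d_new, it
        d = d_new
    raise RuntimeError("coupling iteration did not converge")
```

For this toy model the loop settles at the deformation `p0 / (k + c)`, which is the value that satisfies both stand-in equations at once. In an MPMD setting the two assignments inside the loop would be separate programs exchanging data, which this serial sketch does not attempt to show.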
Recent progress in 3D EM/EM-PIC simulation with ARGUS and parallel ARGUS
International Nuclear Information System (INIS)
Mankofsky, A.; Petillo, J.; Krueger, W.; Mondelli, A.; McNamara, B.; Philp, R.
1994-01-01
ARGUS is an integrated, 3-D, volumetric simulation model for systems involving electric and magnetic fields and charged particles, including materials embedded in the simulation region. The code offers the capability to carry out time domain and frequency domain electromagnetic simulations of complex physical systems. ARGUS offers a boolean solid model structure input capability that can include essentially arbitrary structures on the computational domain, and a modular architecture that allows multiple physics packages to access the same data structure and to share common code utilities. Physics modules are in place to compute electrostatic and electromagnetic fields, the normal modes of RF structures, and self-consistent particle-in-cell (PIC) simulation in either a time dependent mode or a steady state mode. The PIC modules include multiple particle species, the Lorentz equations of motion, and algorithms for the creation of particles by emission from material surfaces, injection onto the grid, and ionization. In this paper, we present an updated overview of ARGUS, with particular emphasis on recent algorithmic and computational advances. These include a completely rewritten frequency domain solver which efficiently treats lossy materials and periodic structures, a parallel version of ARGUS with support for both shared memory parallel vector (i.e. CRAY) machines and distributed memory massively parallel MIMD systems, and numerous new applications of the code.
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit.
Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R; Smith, Jeremy C; Kasson, Peter M; van der Spoel, David; Hess, Berk; Lindahl, Erik
2013-04-01
Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. GROMACS is open source and free software available from http://www.gromacs.org. Supplementary data are available at Bioinformatics online.
Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing
Directory of Open Access Journals (Sweden)
Qiang Liu
2018-05-01
Computing speed is a significant issue in large-scale flood simulations for real-time response in disaster prevention and mitigation. Even today, most large-scale flood simulations are run on supercomputers because of the massive amounts of data and computation required. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To realize a fast simulation of large-scale floods on a personal computer, a Graphics Processing Unit (GPU)-based, high-performance computing method using OpenACC was adopted to parallelize the shallow water model. An unstructured data management method was presented to control the data transportation between the GPU and the CPU (Central Processing Unit) with minimum overhead, and then both computation and data were offloaded from the CPU to the GPU, which exploited the computational capability of the GPU as much as possible. The parallel model was validated using various benchmarks and real-world case studies. The results demonstrate that speed-ups of up to one order of magnitude can be achieved in comparison with the serial model. The proposed parallel model provides a fast and reliable tool with which to quickly assess flood hazards in large-scale areas and thus has promising applications in dynamic inundation risk identification and disaster assessment.
An efficient parallel stochastic simulation method for analysis of nonviral gene delivery systems
Kuwahara, Hiroyuki
2011-01-01
Gene therapy has great potential to become an effective treatment for a wide variety of diseases. One of the main challenges in making gene therapy practical in clinical settings is the development of efficient and safe mechanisms to deliver foreign DNA molecules into the nucleus of target cells. Several computational and experimental studies have shown that the design process of synthetic gene transfer vectors can be greatly enhanced by computational modeling and simulation. This paper proposes a novel, effective parallelization of the stochastic simulation algorithm (SSA) for pharmacokinetic models that characterize the rate-limiting, multi-step processes of intracellular gene delivery. While efficient parallelization of the SSA is still an open problem in a general setting, the proposed parallel simulation method is able to substantially accelerate the next reaction selection scheme and the reaction update scheme in the SSA by exploiting and decomposing the structures of stochastic gene delivery models. This makes computationally intensive analyses, such as parameter optimization and gene dosage control for specific cell types, gene vectors, and transgene expression stability, substantially more practical than they would otherwise be with the standard SSA. Here, we translated the nonviral gene delivery model based on mass-action kinetics by Varga et al. [Molecular Therapy, 4(5), 2001] into a more realistic model that captures intracellular fluctuations based on stochastic chemical kinetics, and as a case study we applied our parallel simulation to this stochastic model. Our results show that our simulation method is able to increase the efficiency of statistical analysis by at least 50% in various settings. © 2011 ACM.
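For readers unfamiliar with the standard SSA that the abstract above takes as its baseline, the following is a minimal serial sketch of Gillespie's direct method, with the "next reaction selection" and "reaction update" steps marked in comments. It is not the parallel scheme of the paper. The two-reaction toy model (`toy_propensities`, `TOY_STOICH`), its rate constants, and the function name `ssa_direct` are hypothetical and are not the gene delivery model of Varga et al.

```python
import bisect
import random
from itertools import accumulate

def ssa_direct(x0, stoich, propensities, t_end, rng):
    """Serial Gillespie direct method; returns (event times, states)."""
    t, x = 0.0, list(x0)
    times, states = [t], [tuple(x)]
    while True:
        cum = list(accumulate(propensities(x)))  # running sums of propensities
        a0 = cum[-1]                             # total propensity
        if a0 <= 0.0:
            break                                # no reaction can fire
        tau = rng.expovariate(a0)                # waiting time to the next event
        if t + tau > t_end:
            break
        # Next reaction selection: reaction j is chosen with probability a_j / a0.
        j = bisect.bisect_right(cum, rng.random() * a0)
        if j == len(cum):
            continue                             # rare rounding edge case: redraw
        # Reaction update: apply the stoichiometry of reaction j.
        t += tau
        x = [xi + d for xi, d in zip(x, stoich[j])]
        times.append(t)
        states.append(tuple(x))
    return times, states

def toy_propensities(x, k_in=0.5, k_deg=0.1):
    """Hypothetical model: uptake (outside -> inside), then degradation."""
    outside, inside = x
    return [k_in * outside, k_deg * inside]

TOY_STOICH = [(-1, +1), (0, -1)]  # state change caused by each reaction
```

A call such as `ssa_direct([50, 0], TOY_STOICH, toy_propensities, 1000.0, random.Random(1))` yields one stochastic trajectory. Statistical analyses of the kind mentioned in the abstract need many such trajectories, which is why the cost of the selection and update steps matters.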
Long-time atomistic simulations with the Parallel Replica Dynamics method
Perez, Danny
Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities however come at a steep computational price, hence limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to access the physical regime of interest and discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
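The claim above that the parallel speedup can approach the number of replicas can be illustrated with a small numerical sketch. It assumes, as an idealization, that the escape time from a state is exactly exponential with a known rate, and it leaves out the dephasing and decorrelation stages of the actual method. The names `parrep_escape` and `mean_times` are hypothetical. Under this assumption, the first escape among N independent replicas occurs after a wall-clock time that is exponential with rate N times the single-replica rate, and N times that wall-clock time has the same distribution as a single-replica escape time.

```python
import random

def parrep_escape(rate, n_replicas, rng):
    """One idealized parallel-replica escape from a single state.

    Returns (wall_clock_time, accumulated_simulation_time).
    """
    # Each replica independently draws its own first-escape time.
    escapes = [rng.expovariate(rate) for _ in range(n_replicas)]
    wall = min(escapes)            # every replica stops at the first escape
    simulated = n_replicas * wall  # simulation time summed over all replicas
    return wall, simulated

def mean_times(rate, n_replicas, n_samples, seed=0):
    """Average wall-clock and accumulated times over many escapes."""
    rng = random.Random(seed)
    wall_sum = sim_sum = 0.0
    for _ in range(n_samples):
        wall, simulated = parrep_escape(rate, n_replicas, rng)
        wall_sum += wall
        sim_sum += simulated
    return wall_sum / n_samples, sim_sum / n_samples
```

With `rate = 2.0` and 8 replicas, the mean accumulated time should be close to the single-replica mean escape time `1 / rate`, while the mean wall-clock time is 8 times smaller. Real speedups are lower because of the overhead stages this sketch omits.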
Visual Data-Analytics of Large-Scale Parallel Discrete-Event Simulations
Energy Technology Data Exchange (ETDEWEB)
Ross, Caitlin; Carothers, Christopher D.; Mubarak, Misbah; Carns, Philip; Ross, Robert; Li, Jianping Kelvin; Ma, Kwan-Liu
2016-11-13
Parallel discrete-event simulation (PDES) is an important tool in the codesign of extreme-scale systems because PDES provides a cost-effective way to evaluate designs of high-performance computing systems. Optimistic synchronization algorithms for PDES, such as Time Warp, allow events to be processed without global synchronization among the processing elements. A rollback mechanism is provided when events are processed out of timestamp order. Although optimistic synchronization protocols enable the scalability of large-scale PDES, the performance of the simulations must be tuned to reduce the number of rollbacks and provide an improved simulation runtime. To enable efficient large-scale optimistic simulations, one has to gain insight into the factors that affect the rollback behavior and simulation performance. We developed a tool for ROSS model developers that gives them detailed metrics on the performance of their large-scale optimistic simulations at varying levels of simulation granularity. Model developers can use this information for parameter tuning of optimistic simulations in order to achieve better runtime and fewer rollbacks. In this work, we instrument the ROSS optimistic PDES framework to gather detailed statistics about the simulation engine. We have also developed an interactive visualization interface that uses the data collected by the ROSS instrumentation to understand the underlying behavior of the simulation engine. The interface connects real time to virtual time in the simulation and provides the ability to view simulation data at different granularities. We demonstrate the usefulness of our framework by performing a visual analysis of the dragonfly network topology model provided by the CODES simulation framework built on top of ROSS. The instrumentation needs to minimize overhead in order to accurately collect data about the simulation performance. To ensure that the instrumentation does not introduce unnecessary overhead, we perform a
Parallel-vector algorithms for particle simulations on shared-memory multiprocessors
International Nuclear Information System (INIS)
Nishiura, Daisuke; Sakaguchi, Hide
2011-01-01
Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles freely move within a given space, and so on a distributed-memory system, load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. Shared-memory systems, by contrast, achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) pairing of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by a cell label in the domain to which the particles belong. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, scalar supercomputer, vector supercomputer, and graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
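The pre-conditioning step described in this abstract (sort particle labels by cell label, then pair the sorted labels) can be illustrated with a small serial sketch. This is not the authors' code: the function name `contact_candidates`, the 2D setting, and the requirement that `cell_size` be at least `cutoff` are assumptions made for illustration, and the force-summation table is not shown.

```python
import numpy as np

def contact_candidates(pos, cell_size, cutoff):
    """Return sorted (i, j) pairs with i < j that are closer than cutoff.

    Hypothetical helper; assumes 2D positions and cell_size >= cutoff,
    so every contact lies in the same or an adjacent cell.
    """
    # Integer cell coordinates of each particle, shifted to start at 0.
    cells = np.floor(pos / cell_size).astype(int)
    cells -= cells.min(axis=0)
    nx, ny = cells.max(axis=0) + 1
    cell_label = cells[:, 0] * ny + cells[:, 1]
    # Pre-conditioning: sort particle labels by their cell label.
    order = np.argsort(cell_label, kind="stable")
    sorted_labels = cell_label[order]
    # order[start[c]:start[c + 1]] holds the particles of cell c.
    start = np.searchsorted(sorted_labels, np.arange(nx * ny + 1))
    pairs = []
    for cx in range(nx):
        for cy in range(ny):
            c = cx * ny + cy
            members = order[start[c]:start[c + 1]]
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    ox, oy = cx + dx, cy + dy
                    if not (0 <= ox < nx and 0 <= oy < ny):
                        continue
                    o = ox * ny + oy
                    others = order[start[o]:start[o + 1]]
                    for i in members:
                        for j in others:
                            # i < j keeps each pair exactly once.
                            if i < j and np.linalg.norm(pos[i] - pos[j]) < cutoff:
                                pairs.append((int(i), int(j)))
    return sorted(pairs)
```

Because the particles of one cell are contiguous after the sort, each cell can be processed independently, which is the property a shared-memory implementation would exploit.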
3-D electromagnetic plasma particle simulations on the Intel Delta parallel computer
International Nuclear Information System (INIS)
Wang, J.; Liewer, P.C.
1994-01-01
A three-dimensional electromagnetic PIC code has been developed on the 512 node Intel Touchstone Delta MIMD parallel computer. This code is based on the General Concurrent PIC algorithm which uses a domain decomposition to divide the computation among the processors. The 3D simulation domain can be partitioned into 1-, 2-, or 3-dimensional sub-domains. Particles must be exchanged between processors as they move among the subdomains. The Intel Delta allows one to use this code for very-large-scale simulations (i.e. over 10^8 particles and 10^6 grid cells). The parallel efficiency of this code is measured, and the overall code performance on the Delta is compared with that on Cray supercomputers. It is shown that their code runs with a high parallel efficiency of ≥ 95% for large size problems. The particle push time achieved is 115 nsecs/particle/time step for 162 million particles on 512 nodes. Comparing with the performance on a single processor Cray C90, this represents a factor of 58 speedup. The code uses a finite-difference leap frog method for the field solve which is significantly more efficient than fast Fourier transforms on parallel computers. The performance of this code on the 128 node Cray T3D will also be discussed.
A multi-transputer system for parallel Monte Carlo simulations of extensive air showers
International Nuclear Information System (INIS)
Gils, H.J.; Heck, D.; Oehlschlaeger, J.; Schatz, G.; Thouw, T.
1989-01-01
A multiprocessor computer system has been brought into operation at the Kernforschungszentrum Karlsruhe. It is dedicated to Monte Carlo simulations of extensive air showers induced by ultra-high energy cosmic rays. The architecture consists of two independently working VMEbus systems each with a 68020 microprocessor as host computer and twelve T800 transputers for parallel processing. The two systems are linked via Ethernet for data exchange. The T800 transputers are equipped with 4 Mbyte RAM each, sufficient to run rather large codes. The host computers are operated under UNIX 5.3. On the transputers compilers for PARALLEL FORTRAN, C, and PASCAL are available. The simple modular architecture of this parallel computer reflects the single purpose for which it is intended. The hardware of the multiprocessor computer is described as well as how the user software is handled and distributed to the 24 working processors. The performance of the parallel computer is demonstrated by well-known benchmarks and by realistic Monte Carlo simulations of air showers. Comparisons with other types of microprocessors and with large universal computers are made. It is demonstrated that a cost reduction by more than a factor of 20 is achieved by this system as compared to a universal computer. (orig.)
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows
Bui, Trong T.
1999-01-01
A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
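As a rough illustration of the second-order MacCormack time marching named in this abstract, the sketch below applies the predictor-corrector idea to a much simpler model problem: 1D linear advection on a periodic grid. The function name `maccormack_step` and the model equation are assumptions for illustration; the paper's algorithm is for compressible turbulent flow and is not reproduced here.

```python
import numpy as np

def maccormack_step(u, c):
    """One MacCormack step for u_t + a u_x = 0 on a periodic grid.

    c = a * dt / dx is the Courant number (assumed 0 < c <= 1).
    """
    # Predictor: forward difference in space.
    u_star = u - c * (np.roll(u, -1) - u)
    # Corrector: backward difference on the predicted values, then average.
    return 0.5 * (u + u_star - c * (u_star - np.roll(u_star, 1)))
```

For this linear problem the scheme coincides with Lax-Wendroff, and at c = 1 it shifts the profile by exactly one cell per step, which makes a convenient sanity check.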
Parallel simulation of wormhole propagation with the Darcy-Brinkman-Forchheimer framework
Wu, Yuanqing
2015-07-09
The acid treatment of carbonate reservoirs is a widely practiced oil and gas well stimulation technique. The injected acid dissolves the material near the wellbore and creates flow channels that establish a good connectivity between the reservoir and the well. Such flow channels are called wormholes. Different from the traditional simulation technology relying on Darcy framework, the new Darcy-Brinkman-Forchheimer (DBF) framework is introduced to simulate the wormhole forming procedure. The DBF framework considers both large and small porosity conditions and should output better simulation results than the Darcy framework. To process the huge quantity of cells in the simulation grid and shorten the long simulation time of the traditional serial code, a parallel code with FORTRAN 90 and MPI was developed. The experimenting field approach to set coefficients in the model equations was also introduced. Moreover, a procedure to fill in the coefficient matrix in the linear system in the solver was described. After this, 2D dissolution experiments were carried out. In the experiments, different configurations of wormholes and a series of properties simulated by both frameworks were compared. We conclude that the numerical results of the DBF framework are more like wormholes and more stable than the Darcy framework, which is a demonstration of the advantages of the DBF framework. Finally, the scalability of the parallel code was evaluated, and we conclude that superlinear scalability can be achieved. © 2015 Elsevier Ltd.
Large scale simulations of lattice QCD thermodynamics on Columbia Parallel Supercomputers
International Nuclear Information System (INIS)
Ohta, Shigemi
1989-01-01
The Columbia Parallel Supercomputer project aims at the construction of a parallel processing, multi-gigaflop computer optimized for numerical simulations of lattice QCD. The project has three stages: a 16-node, 1/4GF machine completed in April 1985, a 64-node, 1GF machine completed in August 1987, and a 256-node, 16GF machine now under construction. The machines all share a common architecture: a two-dimensional torus formed from a rectangular array of N_1 x N_2 independent and identical processors. A processor is capable of operating in a multi-instruction multi-data mode, except for periods of synchronous interprocessor communication with its four nearest neighbors. Here the thermodynamics simulations on the two working machines are reported. (orig./HSI)
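The torus connectivity described here, where each processor talks to four nearest neighbors on an N_1 x N_2 array with wrap-around, can be sketched as follows. The row-major rank numbering and the name `torus_neighbors` are assumptions for illustration; the abstract does not say how the processors were actually numbered or what the array dimensions were.

```python
def torus_neighbors(rank, n1, n2):
    """Four nearest neighbors of `rank` on an n1 x n2 torus.

    Assumes row-major numbering: rank = row * n2 + col.
    """
    row, col = divmod(rank, n2)
    return {
        "north": ((row - 1) % n1) * n2 + col,
        "south": ((row + 1) % n1) * n2 + col,
        "west": row * n2 + (col - 1) % n2,
        "east": row * n2 + (col + 1) % n2,
    }
```

The modulo arithmetic is what closes the rectangular array into a torus: a processor on the edge wraps around to the opposite edge instead of having a missing neighbor.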
International Nuclear Information System (INIS)
Paćko, P; Bielak, T; Staszewski, W J; Uhl, T; Spencer, A B; Worden, K
2012-01-01
This paper demonstrates new parallel computation technology and an implementation for Lamb wave propagation modelling in complex structures. A graphical processing unit (GPU) and computer unified device architecture (CUDA), available in low-cost graphical cards in standard PCs, are used for Lamb wave propagation numerical simulations. The local interaction simulation approach (LISA) wave propagation algorithm has been implemented as an example. Other algorithms suitable for parallel discretization can also be used in practice. The method is illustrated using examples related to damage detection. The results demonstrate good accuracy and effective computational performance of very large models. The wave propagation modelling presented in the paper can be used in many practical applications of science and engineering. (paper)
Energy Technology Data Exchange (ETDEWEB)
Hepburn, I.; De Schutter, E., E-mail: erik@oist.jp [Computational Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904 0495 (Japan); Theoretical Neurobiology & Neuroengineering, University of Antwerp, Antwerp 2610 (Belgium); Chen, W. [Computational Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904 0495 (Japan)
2016-08-07
Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators
International Nuclear Information System (INIS)
Qiang, J.; Ryne, R.D.; Habib, S.; Decky, V.
1999-01-01
In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message passing programming paradigm along with a dynamic load balancing. Implementing object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure based code. This also helps to encapsulate the details of communications syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Some important features of this code also include employing symplectic integration with linear maps of external focusing elements and using z as the independent variable, typical in accelerators. A successful application was done to simulate beam transport through three superconducting sections in the APT linac design
Yang, Sheng-Chun; Lu, Zhong-Yuan; Qian, Hu-Jun; Wang, Yong-Lei; Han, Jie-Ping
2017-11-01
In this work, we upgraded the electrostatic interaction method CU-ENUF (Yang et al., 2016), which first applied CUNFFT (nonequispaced Fourier transforms based on CUDA) to the reciprocal-space electrostatic computation and performed the electrostatic interaction computation entirely on the GPU. The upgraded version of CU-ENUF runs in a hybrid parallel fashion: the computation is parallelized first across multiple computer nodes and then further on the GPU installed in each node. With this parallel strategy, the size of the simulation system is no longer restricted by the throughput of a single CPU or GPU. The most critical technical problem is how to parallelize CUNFFT within this strategy, which is solved effectively through a close study of its basic principles and some algorithmic techniques. Furthermore, the upgraded method is capable of computing electrostatic interactions for both atomistic molecular dynamics (MD) and dissipative particle dynamics (DPD). Finally, the benchmarks conducted for validation and performance indicate that the upgraded method not only achieves good precision when suitable parameters are set, but also provides an efficient way to compute electrostatic interactions for huge simulation systems. Program Files doi:http://dx.doi.org/10.17632/zncf24fhpv.1 Licensing provisions: GNU General Public License 3 (GPL) Programming language: C, C++, and CUDA C Supplementary material: The program is designed for effective electrostatic interactions of large-scale simulation systems, which runs on particular computers equipped with NVIDIA GPUs. It has been tested on (a) single computer node with Intel(R) Core(TM) i7-3770@ 3.40 GHz (CPU) and GTX 980 Ti (GPU), and (b) MPI parallel computer nodes with the same configurations. Nature of problem: For molecular dynamics simulation, the electrostatic interaction is the most time-consuming computation because of its long-range feature and slow convergence in simulation space
An efficient numerical scheme for the simulation of parallel-plate active magnetic regenerators
DEFF Research Database (Denmark)
Torregrosa-Jaime, Bárbara; Corberán, José M.; Payá, Jorge
2015-01-01
A one-dimensional model of a parallel-plate active magnetic regenerator (AMR) is presented in this work. The model is based on an efficient numerical scheme which has been developed after analysing the heat transfer mechanisms in the regenerator bed. The new finite difference scheme optimally com...... to the fully implicit scheme, the proposed scheme achieves more accurate results, prevents numerical errors and requires less computational effort. In AMR simulations the new scheme can reduce the computational time by 88%....
Steinman, Jeffrey S. (Inventor)
1998-01-01
The present invention is embodied in a method of performing object-oriented simulation and a system having inter-connected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from the respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.
Directory of Open Access Journals (Sweden)
Zhang Bin Loo
2017-01-01
Full Text Available Current network simulators abstract out wireless propagation models due to the high computation requirements for realistic modeling. As such, there is still a large gap between the results obtained from simulators and real-world scenarios. In this paper, we present a framework for improved path loss simulation built on top of an existing network simulation software, NS-3. Different from the conventional disk model, the proposed simulation also considers the diffraction loss computed using Epstein and Peterson’s model through the use of actual terrain elevation data to give an accurate estimate of path loss between a transmitter and a receiver. The drawback of high computation requirements is relaxed by offloading the computationally intensive components onto an inexpensive off-the-shelf parallel coprocessor, which is an NVIDIA GPU. Experiments are performed using actual terrain elevation data provided by the United States Geological Survey. As compared to the conventional CPU architecture, the experimental results show that a speedup of 20x to 42x is achieved by exploiting the parallel processing of the GPU to compute the path loss between two nodes using terrain elevation data. The results show that the path losses between two nodes are greatly affected by the terrain profile between these two nodes. Besides this, the results also suggest that the common strategy of placing the transmitter in the highest position may not always work.
A parallel code named NEPTUNE for 3D fully electromagnetic and pic simulations
International Nuclear Information System (INIS)
Dong Ye; Yang Wenyuan; Chen Jun; Zhao Qiang; Xia Fang; Ma Yan; Xiao Li; Sun Huifang; Chen Hong; Zhou Haijing; Mao Zeyao; Dong Zhiwei
2010-01-01
A parallel code named NEPTUNE for 3D fully electromagnetic and particle-in-cell (PIC) simulations is introduced, which can run on Linux systems with hundreds to thousands of CPUs. NEPTUNE is suitable for simulating entire 3D HPM devices; many HPM devices have been simulated and designed with it. In the NEPTUNE code, the electromagnetic fields are updated using the finite-difference time-domain (FDTD) method for solving Maxwell's equations, and the particles are moved using the Buneman-Boris advance method for solving the relativistic Newton-Lorentz equation. Electromagnetic fields and particles are coupled using the linear weighting interpolation PIC method, and the electric field components are corrected using the Boris method of solving the Poisson equation in order to ensure charge conservation. The NEPTUNE code can construct many complicated geometric structures, such as arbitrary axially symmetric structures, plane transforming structures, slow-wave structures, coupling holes, foils, and so on. The boundary conditions used in the NEPTUNE code are introduced in brief, including the perfect electric conductor boundary, external wave boundary, and particle boundary. Finally, some typical HPM devices are simulated and tested using the NEPTUNE code, including MILO, RBWO, VCO, and RKA. The simulation results give correct and credible physical images, and the parallel efficiencies are also given. (authors)
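The particle advance named in this abstract can be illustrated with the textbook relativistic Boris rotation for a single particle. This is a generic sketch, not NEPTUNE's implementation: the function name `boris_push`, the SI units, and the use of u = gamma * v as the state variable are assumptions for illustration.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def boris_push(u, E, B, q_over_m, dt):
    """Advance u = gamma * v by one time step dt in fields E and B."""
    # First half of the electric acceleration.
    u_minus = u + 0.5 * q_over_m * dt * E
    gamma = np.sqrt(1.0 + np.dot(u_minus, u_minus) / C**2)
    # Magnetic rotation, which leaves |u| unchanged.
    t = 0.5 * q_over_m * dt * B / gamma
    s = 2.0 * t / (1.0 + np.dot(t, t))
    u_prime = u_minus + np.cross(u_minus, t)
    u_plus = u_minus + np.cross(u_prime, s)
    # Second half of the electric acceleration.
    return u_plus + 0.5 * q_over_m * dt * E
```

Splitting the electric kick into two halves around a pure rotation is what makes the scheme time-centered; with E = 0 the particle's energy is conserved to rounding error regardless of the step size.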
Directory of Open Access Journals (Sweden)
Xiaoliang Yin
2015-03-01
Full Text Available A complex electromechanical system is usually composed of multiple components from different domains, including mechanical, electronic, hydraulic, control, and so on. Modeling and simulation of electromechanical systems on a unified platform is currently one of the research hotspots in systems engineering. It is also the development trend in the design of complex electromechanical systems. The unified modeling techniques and tools based on the Modelica language provide a satisfactory solution. To meet the requirements of collaborative modeling, simulation, and parallel computing for complex electromechanical systems based on Modelica, a general web-based modeling and simulation prototype environment, namely WebMWorks, is designed and implemented. Based on rich Internet application technologies, an interactive graphical user interface for modeling and post-processing in the web browser was implemented; with the collaborative design module, the environment supports top-down, concurrent modeling and team cooperation; additionally, a service-oriented architecture was applied to supply compiling and solving services which run on cloud-like servers, so the environment can manage and dispatch large-scale simulation tasks in parallel on multiple computing servers simultaneously. An engineering application involving a pure electric vehicle is tested on WebMWorks. The results of simulation and parametric experiments demonstrate that the tested web-based environment can effectively shorten the design cycle of complex electromechanical systems.
Schnek: A C++ library for the development of parallel simulation codes on regular grids
Schmitz, Holger
2018-05-01
A large number of algorithms across the field of computational physics are formulated on grids with a regular topology. We present Schnek, a library that enables fast development of parallel simulations on regular grids. Schnek contains a number of easy-to-use modules that greatly reduce the amount of administrative code for large-scale simulation codes. The library provides an interface for reading simulation setup files with a hierarchical structure. The structure of the setup file is translated into a hierarchy of simulation modules that the developer can specify. The reader parses and evaluates mathematical expressions and initialises variables or grid data. This enables developers to write modular and flexible simulation codes with minimal effort. Regular grids of arbitrary dimension are defined as well as mechanisms for defining physical domain sizes, grid staggering, and ghost cells on these grids. Ghost cells can be exchanged between neighbouring processes using MPI with a simple interface. The grid data can easily be written into HDF5 files using serial or parallel I/O.
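The ghost-cell exchange mentioned in this abstract can be illustrated without MPI by treating a list of arrays as the per-process subdomains. The sketch below is not Schnek's API (Schnek is a C++ library); the helper names `split_with_ghosts` and `exchange_ghosts`, the 1D periodic domain, and the in-memory "processes" are all assumptions for illustration.

```python
import numpy as np

def split_with_ghosts(field, nparts, ng=1):
    """Split a 1D field into nparts blocks, each padded with ng ghost cells."""
    blocks = []
    for part in np.split(field, nparts):
        block = np.zeros(part.size + 2 * ng, dtype=field.dtype)
        block[ng:-ng] = part  # interior cells
        blocks.append(block)
    return blocks

def exchange_ghosts(blocks, ng=1):
    """Fill each block's ghost cells from its periodic neighbours' interiors."""
    n = len(blocks)
    for p, block in enumerate(blocks):
        left = blocks[(p - 1) % n]
        right = blocks[(p + 1) % n]
        block[:ng] = left[-2 * ng:-ng]   # last interior cells of left neighbour
        block[-ng:] = right[ng:2 * ng]   # first interior cells of right neighbour
```

In a real distributed code the two slice assignments would become a pair of send and receive operations per neighbour; the index arithmetic stays the same.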
International Nuclear Information System (INIS)
Pic, Marc Michel
1995-01-01
Parallel programming covers task-parallelism and data-parallelism. Many problems need both parallelisms. Multi-SIMD computers allow a hierarchical approach to these parallelisms. The T++ language, based on C++, is dedicated to exploiting Multi-SIMD computers using a programming paradigm which is an extension of array-programming to task management. Our language introduces arrays of independent tasks to be executed separately (MIMD) on subsets of processors with identical behaviour (SIMD), in order to translate the hierarchical inclusion of data-parallelism in task-parallelism. To manipulate tasks and data in a symmetrical way we propose meta-operations which have the same behaviour on task arrays and on data arrays. We explain how to implement this language on our parallel computer SYMPHONIE in order to benefit from the locally-shared memory, the hardware virtualization, and the multiplicity of communication networks. We simultaneously analyse a typical application of such an architecture. Finite element schemes for fluid mechanics need powerful parallel computers and require substantial floating-point capability. Lattice gases are an alternative to such simulations. Boolean lattice gases are simple, stable, and modular and need no floating-point computation, but they include numerical noise. Boltzmann lattice gases offer high computational precision, but need floating-point arithmetic and are only locally stable. We propose a new scheme, called multi-bit, which keeps the advantages of each boolean model to which it is applied, with high numerical precision and reduced noise. Experiments on viscosity, physical behaviour, noise reduction and spurious invariants are shown and implementation techniques for parallel Multi-SIMD computers detailed. (author) [fr]
Real-time simulation of MHD/steam power plants by digital parallel processors
International Nuclear Information System (INIS)
Johnson, R.M.; Rudberg, D.A.
1981-01-01
Attention is given to a large FORTRAN coded program which simulates the dynamic response of the MHD/steam plant on either a SEL 32/55 or VAX 11/780 computer. The code realizes a detailed first-principle model of the plant. Quite recently, in addition to the VAX 11/780, an AD-10 has been installed for usage as a real-time simulation facility. The parallel processor AD-10 is capable of simulating the MHD/steam plant at several times real-time rates. This is desirable in order to develop rapidly a large data base of varied plant operating conditions. The combined-cycle MHD/steam plant model is discussed, taking into account a number of disadvantages. The disadvantages can be overcome with the aid of an array processor used as an adjunct to the unit processor. The conversion of some computations for real-time simulation is considered
Directory of Open Access Journals (Sweden)
Mondry Adrian
2004-08-01
Full Text Available Abstract Background Many arrhythmias are triggered by abnormal electrical activity at the ionic channel and cell level, and then evolve spatio-temporally within the heart. To understand arrhythmias better and to diagnose them more precisely by their ECG waveforms, a whole-heart model is required to explore the association between the massively parallel activities at the channel/cell level and the integrative electrophysiological phenomena at organ level. Methods We have developed a method to build large-scale electrophysiological models by using extended cellular automata, and to run such models on a cluster of shared memory machines. We describe here the method, including the extension of a language-based cellular automaton to implement quantitative computing, the building of a whole-heart model with Visible Human Project data, the parallelization of the model on a cluster of shared memory computers with OpenMP and MPI hybrid programming, and a simulation algorithm that links cellular activity with the ECG. Results We demonstrate that electrical activities at channel, cell, and organ levels can be traced and captured conveniently in our extended cellular automaton system. Examples of some ECG waveforms simulated with a 2-D slice are given to support the ECG simulation algorithm. A performance evaluation of the 3-D model on a four-node cluster is also given. Conclusions Quantitative multicellular modeling with extended cellular automata is a highly efficient and widely applicable method to weave experimental data at different levels into computational models. This process can be used to investigate complex and collective biological activities that can be described neither by their governing differential equations nor by discrete parallel computation. Transparent cluster computing is a convenient and effective method to make time-consuming simulation feasible. Arrhythmias, as a typical case, can be effectively simulated with the methods
A parallel process growth model of avoidant personality disorder symptoms and personality traits.
Wright, Aidan G C; Pincus, Aaron L; Lenzenweger, Mark F
2013-07-01
Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, M. F., 2006, The longitudinal study of personality disorders: History, design considerations, and initial findings. Journal of Personality Disorders, 20, 645-670. doi:10.1521/pedi.2006.20.6.645), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general.
A Parallel Process Growth Model of Avoidant Personality Disorder Symptoms and Personality Traits
Wright, Aidan G. C.; Pincus, Aaron L.; Lenzenweger, Mark F.
2012-01-01
Background Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Methods Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, 2006), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. Results AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. Conclusions These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general. PMID:22506627
Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.
Energy Technology Data Exchange (ETDEWEB)
Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.
2005-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability
Massively parallel Monte Carlo for many-particle simulations on GPUs
Energy Technology Data Exchange (ETDEWEB)
Anderson, Joshua A.; Jankowski, Eric [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Grubb, Thomas L. [Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Engel, Michael [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Glotzer, Sharon C., E-mail: sglotzer@umich.edu [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)
2013-12-01
Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power in the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared by their performance on benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the
Use of Parallel Micro-Platform for the Simulation the Space Exploration
Velasco Herrera, Victor Manuel; Velasco Herrera, Graciela; Rosano, Felipe Lara; Rodriguez Lozano, Salvador; Lucero Roldan Serrato, Karen
The purpose of this work is to create a parallel micro-platform that simulates the virtual movements of a space exploration in 3D. One of the innovations presented in this design is the application of a lever mechanism for the transmission of movement. The development of such a robot is a challenging task, very different from that of industrial manipulators because the requirements of the target system are entirely different. This work presents the computer-aided study and simulation of the movement of this parallel manipulator. The model was developed using the computer-aided design platform Unigraphics, in which we carried out the geometric modeling of each component and of the final assembly (CAD), the generation of files for the computer-aided manufacture (CAM) of each piece, and the kinematic simulation of the system under different driving schemes. We used the MATLAB aerospace toolbox and created an adaptive control module to simulate the system.
H5Part A Portable High Performance Parallel Data Interface for Particle Simulations
Adelmann, Andreas; Shalf, John M; Siegerist, Cristina
2005-01-01
The largest parallel particle simulations in six-dimensional phase space generate vast amounts of data. It is also desirable to share data and data analysis tools such as ParViT (Particle Visualization Toolkit) among other groups who are working on particle-based accelerator simulations. We define a very simple file schema built on top of HDF5 (Hierarchical Data Format version 5) as well as an API that simplifies the reading/writing of the data to the HDF5 file format. HDF5 offers a self-describing machine-independent binary file format that supports scalable parallel I/O performance for MPI codes on a variety of supercomputing systems and works equally well on laptop computers. The API is available for C, C++, and Fortran codes. The file format will enable disparate research groups with very different simulation implementations to share data transparently and share data analysis tools. For instance, the common file format will enable groups that depend on completely different simulation implementations to share c...
STOCHSIMGPU: parallel stochastic simulation for the Systems Biology Toolbox 2 for MATLAB.
Klingbeil, Guido; Erban, Radek; Giles, Mike; Maini, Philip K
2011-04-15
The importance of stochasticity in biological systems is becoming increasingly recognized and the computational cost of biologically realistic stochastic simulations urgently requires development of efficient software. We present a new software tool STOCHSIMGPU that exploits graphics processing units (GPUs) for parallel stochastic simulations of biological/chemical reaction systems and show that significant gains in efficiency can be made. It is integrated into MATLAB and works with the Systems Biology Toolbox 2 (SBTOOLBOX2) for MATLAB. The GPU-based parallel implementation of the Gillespie stochastic simulation algorithm (SSA), the logarithmic direct method (LDM) and the next reaction method (NRM) is approximately 85 times faster than the sequential implementation of the NRM on a central processing unit (CPU). Using our software does not require any changes to the user's models, since it acts as a direct replacement of the stochastic simulation software of the SBTOOLBOX2. The software is open source under the GPL v3 and available at http://www.maths.ox.ac.uk/cmb/STOCHSIMGPU. The web site also contains supplementary information. Contact: klingbeil@maths.ox.ac.uk. Supplementary data are available at Bioinformatics online.
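For readers unfamiliar with the Gillespie SSA named in the abstract above, a minimal serial sketch of the direct method may help. This is not STOCHSIMGPU or SBTOOLBOX2 code: the function name `gillespie_direct`, the single degradation reaction, and the rate constant `k` are illustrative assumptions.

```python
import math
import random


def gillespie_direct(x0, stoich, propensities, t_end, rng):
    """Simulate one trajectory with Gillespie's direct method (serial sketch).

    x0           -- list of initial molecule counts
    stoich       -- list of state-change vectors, one per reaction
    propensities -- function mapping a state to a list of reaction propensities
    t_end        -- simulation stops before the first event later than t_end
    rng          -- a random.Random instance
    """
    t, x = 0.0, list(x0)
    times, states = [t], [tuple(x)]
    while True:
        a = propensities(x)
        a0 = sum(a)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate a0.
        tau = -math.log(1.0 - rng.random()) / a0
        if t + tau > t_end:
            break
        t += tau
        # Pick reaction j with probability a[j] / a0.
        r = rng.random() * a0
        acc = 0.0
        j = len(a) - 1  # fallback guards against floating-point round-off
        for i, ai in enumerate(a):
            acc += ai
            if r < acc:
                j = i
                break
        x = [xi + d for xi, d in zip(x, stoich[j])]
        times.append(t)
        states.append(tuple(x))
    return times, states


# Hypothetical example system: first-order degradation A -> 0, rate constant k.
k = 0.5
times, states = gillespie_direct(
    x0=[100],
    stoich=[[-1]],
    propensities=lambda x: [k * x[0]],
    t_end=5.0,
    rng=random.Random(42),
)
```

Each trajectory depends only on its own random stream, so many independent runs can in principle execute side by side. The abstract does not say how STOCHSIMGPU organises its GPU parallelism, so the sketch should not be read as a description of it.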
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Shen, Yanfeng; Cesnik, Carlos E. S.
2016-04-01
This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables highly parallelized supercomputing on powerful graphics cards. Both the explicit contact formulation and the parallel feature facilitate LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method are introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
International Nuclear Information System (INIS)
Adelmann, Andreas; Gsell, Achim; Oswald, Benedikt; Schietinger, Thomas; Bethel, Wes; Shalf, John; Siegerist, Cristina; Stockinger, Kurt
2007-01-01
Significant problems facing all experimental and computational sciences arise from growing data size and complexity. Common to all these problems is the need to perform efficient data I/O on diverse computer architectures. In our scientific application, the largest parallel particle simulations generate vast quantities of six-dimensional data. Such a simulation run produces data for an aggregate data size up to several TB per run. Motivated by the need to address data I/O and access challenges, we have implemented H5Part, an open source data I/O API that simplifies the use of the Hierarchical Data Format v5 library (HDF5). HDF5 is an industry standard for high performance, cross-platform data storage and retrieval that runs on all contemporary architectures from large parallel supercomputers to laptops. H5Part, which is oriented to the needs of the particle physics and cosmology communities, provides support for parallel storage and retrieval of particles, structured and in the future unstructured meshes. In this paper, we describe recent work focusing on I/O support for particles and structured meshes and provide data showing performance on modern supercomputer architectures like the IBM POWER 5
International Nuclear Information System (INIS)
Trinitis, C; Schulz, M
2006-01-01
In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture, system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in parallel algorithms, simulation techniques, and software integration from multiple disciplines. ParSim brings together researchers from both application disciplines and computer science and aims at fostering closer cooperation between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. This offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, eleven papers from authors in nine countries were submitted to ParSim, and we selected five of them. They cover a wide range of different application fields including gas flow simulations, thermo-mechanical processes in nuclear waste storage, and cosmological simulations. At the same time, the selected contributions also address the computer science side of their codes and discuss different parallelization strategies, programming models and languages, as well as the use of nonblocking collective operations in MPI. We are confident that this provides an attractive program and that ParSim will be an informal setting for lively discussions and for fostering new
International Nuclear Information System (INIS)
Pang, Kar Mun; Ng, Hoon Kiat; Gan, Suyin
2012-01-01
Highlights: ► A performance benchmarking exercise is conducted for diesel combustion simulations. ► The reduced chemical mechanism shows its advantages over base and skeletal models. ► High efficiency and great reduction of CPU runtime are achieved through 4-node solver. ► Increasing ISAT memory from 0.1 to 2 GB reduces the CPU runtime by almost 35%. ► Combustion and soot processes are predicted well with minimal computational cost. - Abstract: In the present study, in-cylinder diesel combustion simulation was performed with parallel processing on an Intel Xeon Quad-Core platform to allow both fluid dynamics and chemical kinetics of the surrogate diesel fuel model to be solved simultaneously on multiple processors. Here, Cartesian Z-Coordinate was selected as the most appropriate partitioning algorithm since it computationally bisects the domain such that the dynamic load associated with fuel particle tracking was evenly distributed during parallel computations. Other variables examined included number of compute nodes, chemistry sizes and in situ adaptive tabulation (ISAT) parameters. Based on the performance benchmarking test conducted, parallel configuration of 4-compute node was found to reduce the computational runtime most efficiently whereby a parallel efficiency of up to 75.4% was achieved. The simulation results also indicated that accuracy level was insensitive to the number of partitions or the partitioning algorithms. The effect of reducing the number of species on computational runtime was observed to be more significant than reducing the number of reactions. Besides, the study showed that an increase in the ISAT maximum storage of up to 2 GB reduced the computational runtime by 50%. Also, the ISAT error tolerance of 10⁻³ was chosen to strike a balance between results accuracy and computational runtime. The optimised parameters in parallel processing and ISAT, as well as the use of the in-house reduced chemistry model allowed accurate
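The parallel efficiency quoted above can be turned into an implied speedup with a short calculation. The sketch below assumes the conventional definition, efficiency = speedup / number of nodes; the abstract does not state which definition the authors used, and the helper names are illustrative.

```python
def speedup_from_efficiency(efficiency, n_nodes):
    """Speedup implied by a parallel efficiency E on n nodes, assuming E = S / n."""
    return efficiency * n_nodes


def efficiency_from_runtimes(t_serial, t_parallel, n_nodes):
    """Parallel efficiency from measured runtimes, assuming E = T_1 / (n * T_n)."""
    return t_serial / (t_parallel * n_nodes)


# Figures taken from the abstract: 75.4% efficiency on 4 compute nodes.
implied_speedup = speedup_from_efficiency(0.754, 4)
print(f"implied speedup on 4 nodes: {implied_speedup:.2f}x")
```

Under that assumption, the 4-node configuration would run roughly three times faster than a single node.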
De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers
Energy Technology Data Exchange (ETDEWEB)
Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H
2006-09-04
We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. We have also achieved an automated execution of hierarchical QM
OPEN SOURCE APPROACH TO URBAN GROWTH SIMULATION
Directory of Open Access Journals (Sweden)
A. Petrasova
2016-06-01
Full Text Available Spatial patterns of land use change due to urbanization and its impact on the landscape are the subject of ongoing research. Urban growth scenario simulation is a powerful tool for exploring these impacts and empowering planners to make informed decisions. We present FUTURES (FUTure Urban-Regional Environment Simulation), a patch-based, stochastic, multi-level land change modeling framework, as a case showing how what was once a closed and inaccessible model benefited from integration with open source GIS. We will describe our motivation for releasing this project as open source and the advantages of integrating it with GRASS GIS, a free, libre and open source GIS and research platform for the geospatial domain. GRASS GIS provides efficient libraries for FUTURES model development as well as standard GIS tools and a graphical user interface for model users. Releasing FUTURES as a GRASS GIS add-on simplifies the distribution of FUTURES across all main operating systems and ensures the maintainability of our project in the future. We will describe FUTURES integration into GRASS GIS and demonstrate its usage on a case study in Asheville, North Carolina. The developed dataset and tutorial for this case study enable researchers to experiment with the model, explore its potential or even modify the model for their applications.
Directory of Open Access Journals (Sweden)
Mark James Abraham
2015-09-01
Full Text Available GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.
Xyce parallel electronic simulator design : mathematical formulation, version 2.0.
Energy Technology Data Exchange (ETDEWEB)
Hoekstra, Robert John; Waters, Lon J.; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.
2004-06-01
This document is intended to contain a detailed description of the mathematical formulation of Xyce, a massively parallel SPICE-style circuit simulator developed at Sandia National Laboratories. The target audience of this document is people in the role of 'service provider'. An example of such a person would be a linear solver expert who is spending a small fraction of his time developing solver algorithms for Xyce. Such a person probably is not an expert in circuit simulation, and would benefit from a description of the equations solved by Xyce. In this document, modified nodal analysis (MNA) is described in detail, with a number of examples. Issues that are unique to circuit simulation, such as voltage limiting, are also described in detail.
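As a rough illustration of what modified nodal analysis produces, the sketch below assembles and solves the MNA system for a resistive voltage divider. It is not taken from the Xyce document: the circuit, the component values, the sign convention for the source current, and the helper `solve_linear` are all assumptions made for this example.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("singular matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


# Hypothetical circuit: source Vs from node 1 to ground, R1 between nodes 1
# and 2, R2 from node 2 to ground. Unknowns: v1, v2 and the source branch
# current i_src (taken as flowing from node 1 into the source).
Vs, R1, R2 = 10.0, 1000.0, 1000.0
G1, G2 = 1.0 / R1, 1.0 / R2

A = [
    [G1, -G1, 1.0],       # KCL at node 1: G1*(v1 - v2) + i_src = 0
    [-G1, G1 + G2, 0.0],  # KCL at node 2: G1*(v2 - v1) + G2*v2 = 0
    [1.0, 0.0, 0.0],      # source constraint: v1 = Vs
]
b = [0.0, 0.0, Vs]

v1, v2, i_src = solve_linear(A, b)
print(v1, v2, i_src)
```

The extra row and column for the voltage source are what distinguish MNA from plain nodal analysis: the source current becomes an unknown alongside the node voltages.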
Xyce Parallel Electronic Simulator Users' Guide Version 6.6.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2016-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved. Acknowledgements The BSIM Group at the University of California, Berkeley developed the BSIM3, BSIM4, BSIM6, BSIM-CMG and BSIM-SOI models. The BSIM3 is Copyright © 1999, Regents of the University of California. The BSIM4 is Copyright © 2006, Regents of the University of California. The BSIM6 is Copyright © 2015, Regents of the University of California. The BSIM-CMG is Copyright ©
International Nuclear Information System (INIS)
Yeh, M.; Kim, J.; Khan, F.S.
1995-01-01
We present a parallel decomposition of the tight-binding fictitious Lagrangian algorithm for the Intel iPSC/860 and the Intel Paragon parallel computers. We show that it is possible to perform long simulations, of the order of 10 000 time steps, on semiconducting clusters consisting of as many as 512 atoms, on a time scale of the order of 20 h or less. We have made a very careful timing analysis of all parts of our code, and have identified the bottlenecks. We have also derived formulas which can predict the timing of our code, based on the number of processors, message passing bandwidth, floating point performance of each node, and the set up time for message passing, appropriate to the machine being used. The time of the simulation scales as the square of the number of particles, if the number of processors is made to scale linearly with the number of particles. We show that for a system as large as 512 atoms, the main bottleneck of the computation is the orthogonalization of the wave functions, which consumes about 90% of the total time of the simulation
An efficient parallel simulation of unsteady blood flows in patient-specific pulmonary artery.
Kong, Fande; Kheyfets, Vitaly; Finol, Ender; Cai, Xiao-Chuan
2018-04-01
Simulation of blood flows in the pulmonary artery provides some insight into certain diseases by examining the relationship between some continuum metrics, eg, the wall shear stress acting on the vascular endothelium, which responds to flow-induced mechanical forces by releasing vasodilators/constrictors. V. Kheyfets, in his previous work, studies numerically a patient-specific pulmonary circulation to show that decreasing wall shear stress is correlated with increasing pulmonary vascular impedance. In this paper, we develop a scalable parallel algorithm based on domain decomposition methods to investigate an unsteady model with patient-specific pulsatile waveforms as the inlet boundary condition. The unsteady model offers tremendously more information about the dynamic behavior of the flow field, but computationally speaking, the simulation is a lot more expensive since a problem which is similar to the steady-state problem has to be solved many times, and therefore, the traditional sequential approach is not suitable anymore. We show computationally that simulations using the proposed parallel approach with up to 10 000 processor cores can be obtained with much reduced compute time. This makes the technology potentially usable for the routine study of the dynamic behavior of blood flows in the pulmonary artery, in particular, the changes of the blood flows and the wall shear stress in the spatial and temporal dimensions. Copyright © 2017 John Wiley & Sons, Ltd.
A general parallelization strategy for random path based geostatistical simulation methods
Mariethoz, Grégoire
2010-07-01
The size of simulation grids used for numerical models has increased by many orders of magnitude in the past years, and this trend is likely to continue. Efficient pixel-based geostatistical simulation algorithms have been developed, but for very large grids and complex spatial models, the computational burden remains heavy. As cluster computers become widely available, using parallel strategies is a natural step for increasing the usable grid size and the complexity of the models. These strategies must profit from the possibilities offered by machines with a large number of processors. On such machines, the bottleneck is often the communication time between processors. We present a strategy distributing grid nodes among all available processors while minimizing communication and latency times. It consists in centralizing the simulation on a master processor that calls other slave processors as if they were functions simulating one node every time. The key is to decouple the sending and the receiving operations to avoid synchronization. Centralization allows having a conflict management system ensuring that nodes being simulated simultaneously do not interfere in terms of neighborhood. The strategy is computationally efficient and is versatile enough to be applicable to all random path based simulation methods.
3-D Hybrid Simulation of Quasi-Parallel Bow Shock and Its Effects on the Magnetosphere
International Nuclear Information System (INIS)
Lin, Y.; Wang, X.Y.
2005-01-01
A three-dimensional (3-D) global-scale hybrid simulation is carried out for the structure of the quasi-parallel bow shock, in particular the foreshock waves and pressure pulses. The wave evolution and interaction with the dayside magnetosphere are discussed. It is shown that diamagnetic cavities are generated in the turbulent foreshock due to the ion beam plasma interaction, and these compressional pulses lead to strong surface perturbations at the magnetopause and Alfven waves/field line resonance in the magnetosphere
Forced-convection boiling tests performed in parallel simulated LMR fuel assemblies
International Nuclear Information System (INIS)
Rose, S.D.; Carbajo, J.J.; Levin, A.E.; Lloyd, D.B.; Montgomery, B.H.; Wantland, J.L.
1985-01-01
Forced-convection tests have been carried out using parallel simulated Liquid Metal Reactor fuel assemblies in an engineering-scale sodium loop, the Thermal-Hydraulic Out-of-Reactor Safety facility. The tests, performed under single- and two-phase conditions, have shown that for low forced-convection flow there is significant flow augmentation by thermal convection, an important phenomenon under degraded shutdown heat removal conditions in an LMR. The power and flows required for boiling and dryout to occur are much higher than decay heat levels. The experimental evidence supports analytical results that heat removal from an LMR is possible with a degraded shutdown heat removal system
Analysis of IDR(s) Family of Solvers for Reservoir Simulations on Different Parallel Architectures
Directory of Open Access Journals (Sweden)
Seignole Vincent
2016-09-01
This contribution provides a detailed analysis of several realizations of the IDR(s) family of solvers from several angles: robustness, performance, and implementation on different parallel environments, compared with a sequential IDR(s) implementation and tested on several industrial, geologically and structurally coherent 3D field-case reservoir models. This work is the result of continuous efforts to improve the response time of Storengy’s three-dimensional reservoir simulator, named Multi, which is dedicated to gas-storage applications.
Using a Linux Cluster for Parallel Simulations of an Active Magnetic Regenerator Refrigerator
DEFF Research Database (Denmark)
Petersen, T.F.; Pryds, N.; Smith, A.
2006-01-01
This paper describes the implementation of a Comsol Multiphysics model on a Linux computer Cluster. The Magnetic Refrigerator (MR) is a special type of refrigerator with potential to reduce the energy consumption of household refrigeration by a factor of two or more. To conduct numerical analysis....... The coupled set of equations and the transient convergence towards the final steady state means that the model has an excessive solution time. To make parametric studies practical, the developed model was implemented on a Cluster to allow parallel simulations, which has decreased the solution time...
International Nuclear Information System (INIS)
Spencer, VN
2001-01-01
An investigation has been conducted regarding the ability of clustered personal computers to improve the performance of executing software simulations for solving engineering problems. The power and utility of personal computers continues to grow exponentially through advances in computing capabilities such as newer microprocessors, advances in microchip technologies, electronic packaging, and cost effective gigabyte-size hard drive capacity. Many engineering problems require significant computing power. Therefore, the computation has to be done by high-performance computer systems that cost millions of dollars and need gigabytes of memory to complete the task. Alternately, it is feasible to provide adequate computing in the form of clustered personal computers. This method cuts the cost and size by linking (clustering) personal computers together across a network. Clusters also have the advantage that they can be used as stand-alone computers when they are not operating as a parallel computer. Parallel computing software to exploit clusters is available for computer operating systems like Unix, Windows NT, or Linux. This project concentrates on the use of Windows NT, and the Parallel Virtual Machine (PVM) system to solve an engineering dynamics problem in Fortran
Directory of Open Access Journals (Sweden)
Helio Yochihiro Fuchigami
2014-08-01
This article addresses the problem of minimizing makespan on two parallel flow shops with proportional processing and setup times. The setup times are separated and sequence-independent. The parallel flow shop scheduling problem is a specific case of the well-known hybrid flow shop, characterized by a multistage production system with more than one machine working in parallel at each stage. This situation is very common in various kinds of companies, such as those in the chemical, electronics, automotive, pharmaceutical and food industries. This work proposes six Simulated Annealing algorithms, their perturbation schemes and an algorithm for initial sequence generation. The study can be classified as “applied research” in nature, “exploratory” in its objectives and “experimental” in its procedures, with a “quantitative” approach. The proposed algorithms were effective regarding the solution and computationally efficient. Results of Analysis of Variance (ANOVA) revealed no significant difference between the schemes in terms of makespan. The PS4 scheme, which moves a subsequence of jobs, is suggested because it provides the best percentage of success. It was also found that there is a significant difference between the results of the algorithms for each value of the proportionality factor of the processing and setup times of the flow shops.
Directory of Open Access Journals (Sweden)
Lorenzo L. Pesce
2013-01-01
Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.
Co-simulation of Six DOF Wire Driven Parallel Mechanism Based on ADAMS and Matlab
Directory of Open Access Journals (Sweden)
Tang Aofei
2015-01-01
The dynamic model of the 6 DOF Wire Driven Parallel Mechanism (WDPM) system is introduced. The inverse dynamic model is simulated in MATLAB, and the simulation result indicates that the mechanical model of the WDPM system is reasonable. The dynamic model of the virtual prototype is then verified by simulation analysis in ADAMS. The combined control model based on ADAMS/Simulink is derived, and the WDPM control system is designed with MATLAB/Simulink. The torque control method is selected for the outer ring and the PD control method for the inner ring. Combining the ADAMS control model with the control law design, the interactive simulation analysis of the WDPM system is completed. According to the simulation results for spatial circle tracking and line tracking at the end of the moving platform, the tracking error can be reduced by the designed control algorithm. The minimum tracking error is 0.2 mm to 0.3 mm. This establishes a theoretical foundation for designing the hardware of the WDPM control system.
Pesce, Lorenzo L; Lee, Hyong C; Hereld, Mark; Visser, Sid; Stevens, Rick L; Wildeman, Albert; van Drongelen, Wim
2013-01-01
Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.
A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS
Energy Technology Data Exchange (ETDEWEB)
Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A., E-mail: bharath@u.northwestern.edu [Center for Interdisciplinary Exploration and Research in Astrophysics, Northwestern University, Evanston, IL (United States)
2013-02-15
We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ≈ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within ≲ 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10^5, 128 for N = 10^6 and 256 for N = 10^7. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.
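As a rough worked check of the reported scaling, parallel efficiency can be taken as the speedup divided by the processor count. The short sketch below applies that definition to the approximate speedups quoted in the abstract; the helper name is hypothetical, and the resulting figures are only as precise as the rounded speedups they come from.

```python
def parallel_efficiency(speedup, n_procs):
    """Parallel efficiency: achieved speedup divided by processor count."""
    return speedup / n_procs


# (approximate maximum speedup, processor count) per problem size N,
# as quoted in the abstract above.
reported = {"1e5": (60, 64), "1e6": (100, 128), "1e7": (220, 256)}

if __name__ == "__main__":
    for n, (speedup, procs) in reported.items():
        eff = parallel_efficiency(speedup, procs)
        print(f"N = {n}: efficiency is roughly {eff:.2f}")
```

On these figures the efficiency is roughly 0.94, 0.78 and 0.86 for the three problem sizes.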
Parallel 3-D numerical simulation of dielectric barrier discharge plasma actuators
Houba, Tomas
Dielectric barrier discharge plasma actuators have shown promise in a range of applications including flow control, sterilization and ozone generation. Developing numerical models of plasma actuators is of great importance, because a high-fidelity parallel numerical model allows new design configurations to be tested rapidly. Additionally, it provides a better understanding of the plasma actuator physics which is useful for further innovation. The physics of plasma actuators is studied numerically. A loosely coupled approach is utilized for the coupling of the plasma to the neutral fluid. The state of the art in numerical plasma modeling is advanced by the development of a parallel, three-dimensional, first-principles model with detailed air chemistry. The model incorporates 7 charged species and 18 reactions, along with a solution of the electron energy equation. To the author's knowledge, a parallel three-dimensional model of a gas discharge with a detailed air chemistry model and the solution of electron energy is unique. Three representative geometries are studied using the gas discharge model. The discharge of gas between two parallel electrodes is used to validate the air chemistry model developed for the gas discharge code. The gas discharge model is then applied to the discharge produced by placing a dc powered wire and grounded plate electrodes in a channel. Finally, a three-dimensional simulation of gas discharge produced by electrodes placed inside a riblet is carried out. The body force calculated with the gas discharge model is loosely coupled with a fluid model to predict the induced flow inside the riblet.
A Pipelined and Parallel Architecture for Quantum Monte Carlo Simulations on FPGAs
Directory of Open Access Journals (Sweden)
Akila Gothandaraman
2010-01-01
Recent advances in Field-Programmable Gate Array (FPGA) technology make reconfigurable computing using FPGAs an attractive platform for accelerating scientific applications. We develop a deeply pipelined and parallel architecture for Quantum Monte Carlo simulations using FPGAs. Quantum Monte Carlo simulations enable us to obtain the structural and energetic properties of atomic clusters. We experiment with different pipeline structures for each component of the design and develop a deeply pipelined architecture that provides the best performance in terms of achievable clock rate, while at the same time making modest use of the FPGA resources. We discuss the details of the pipelined and generic architecture that is used to obtain the potential energy and wave function of a cluster of atoms.
Directory of Open Access Journals (Sweden)
Ravil’ Kudermetov
2018-02-01
Multi-core processors are now installed in almost every modern workstation, but how to use these computational resources effectively remains an open question. In this paper the four-point block one-step integration method is considered, a parallel algorithm for this method is proposed, and a Java implementation of the algorithm is discussed. The effectiveness of the proposed algorithm is demonstrated by simulating spacecraft attitude motion. The results of this work can be used for practical simulation of dynamic systems that are described by ordinary differential equations. The results are also applicable to the development and debugging of computer programs that integrate the dynamic and kinematic equations of the angular motion of a rigid body.
An Optimized Parallel FDTD Topology for Challenging Electromagnetic Simulations on Supercomputers
Directory of Open Access Journals (Sweden)
Shugang Jiang
2015-01-01
It may not be a challenge to run a Finite-Difference Time-Domain (FDTD) code for electromagnetic simulations on a supercomputer with more than ten thousand CPU cores; however, making the FDTD code work with the highest efficiency is a challenge. In this paper, the performance of parallel FDTD is optimized through the MPI (message passing interface) virtual topology, based on which a communication model is established. The general rules of optimal topology are presented according to the model. The performance of the method is tested and analyzed on three high performance computing platforms with different architectures in China. Simulations including an airplane with a 700-wavelength wingspan and a complex microstrip antenna array with nearly 2000 elements are performed very efficiently using a maximum of 10240 CPU cores.
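The abstract does not give the communication model itself, so the following is only a sketch of the general idea of choosing a process topology, under a simple assumed cost: the number of grid cells on the faces that one subdomain exchanges with its neighbors per time step. The function names and the cost function are illustrative assumptions, not the paper's model.

```python
def factor_triples(p):
    """All ordered process grids (px, py, pz) with px * py * pz == p."""
    return [(a, b, p // (a * b))
            for a in range(1, p + 1) if p % a == 0
            for b in range(1, p // a + 1) if (p // a) % b == 0]


def halo_cells(grid, topo):
    """Assumed cost: face cells one interior subdomain exchanges per step.

    A dimension that is not split (topo[d] == 1) has no neighbors and
    therefore contributes no exchange.
    """
    nx, ny, nz = (g / t for g, t in zip(grid, topo))
    face_area = (ny * nz, nx * nz, nx * ny)  # faces normal to x, y, z
    return sum(2 * face_area[d] for d in range(3) if topo[d] > 1)


def best_topology(grid, p):
    """Process grid with the smallest assumed halo-exchange cost."""
    return min(factor_triples(p), key=lambda t: halo_cells(grid, t))


if __name__ == "__main__":
    print(best_topology((512, 512, 512), 64))   # cubic domain
    print(best_topology((1024, 64, 64), 4))     # elongated domain
```

Under this assumed cost a cubic domain favors a cubic process grid, while an elongated domain is best cut across its long axis. A production code would typically pass the chosen dimensions to an MPI Cartesian communicator and would also weigh latency and network layout, which this sketch ignores.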
Parallel Beam-Beam Simulation Incorporating Multiple Bunches and Multiple Interaction Regions
Jones, F W; Pieloni, T
2007-01-01
The simulation code COMBI has been developed to enable the study of coherent beam-beam effects in the full collision scenario of the LHC, with multiple bunches interacting at multiple crossing points over many turns. The program structure and input are conceived in a general way which allows arbitrary numbers and placements of bunches and interaction points (IP's), together with procedural options for head-on and parasitic collisions (in the strong-strong sense), beam transport, statistics gathering, harmonic analysis, and periodic output of simulation data. The scale of this problem, once we go beyond the simplest case of a pair of bunches interacting once per turn, quickly escalates into the parallel computing arena, and herein we will describe the construction of an MPI-based version of COMBI able to utilize arbitrary numbers of processors to support efficient calculation of multi-bunch multi-IP interactions and transport. Implementing the parallel version did not require extensive disruption of the basic ...
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite
Directory of Open Access Journals (Sweden)
Oleksiy Kononenko
2017-10-01
Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ACE3P have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we present the new capabilities in ACE3P for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. The simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.
Directory of Open Access Journals (Sweden)
Jiang Lei
2015-01-01
Direct numerical simulation (DNS) of a round jet in crossflow based on the lattice Boltzmann method (LBM) is carried out on a multi-GPU cluster. The data-parallel SIMT (single instruction multiple thread) characteristic of the GPU matches the parallelism of LBM well, which leads to the high efficiency of the GPU on the LBM solver. With the present GPU settings (6 Nvidia Tesla K20M), the DNS can be completed in several hours. A grid system of 1.5 × 10^8 is adopted and the largest jet Reynolds number reaches 3000. The jet-to-free-stream velocity ratio is set to 3.3. The jet is orthogonal to the mainstream flow direction. The validated code shows good agreement with experiments. Vortical structures, namely the CRVP, shear-layer vortices and horseshoe vortices, are presented and analyzed based on velocity fields and vorticity distributions. Turbulent statistical quantities of Reynolds stress are also displayed. Coherent structures are revealed at a very fine resolution based on the second invariant of the velocity gradients.
Parallel Simulation of HGMS of Weakly Magnetic Nanoparticles in Irrotational Flow of Inviscid Fluid
Directory of Open Access Journals (Sweden)
Kanok Hournkumnuard
2014-01-01
The process of high gradient magnetic separation (HGMS) using a microferromagnetic wire for capturing weakly magnetic nanoparticles in the irrotational flow of inviscid fluid is simulated using a parallel algorithm developed with OpenMP. The two-dimensional problem of particle transport under the influences of magnetic force and fluid flow is considered in an annular domain surrounding the wire, with inner radius equal to that of the wire and outer radius equal to various multiples of the wire radius. The differential equations governing particle transport are solved numerically as an initial-boundary value problem using the finite-difference method. The concentration distribution of the particles around the wire is investigated and compared with some previously reported results, with which it shows good agreement. The results show the feasibility of accumulating weakly magnetic nanoparticles in specific regions on the wire surface, which is useful for biomedical and environmental applications. The speedup of the parallel simulation ranges from 1.8 to 21 depending on the number of threads, the problem domain size and the number of iterations. Given the nature of the computation in this application and current multicore technology, 4–8 threads are observed to be sufficient to obtain the optimized speedup.
International Nuclear Information System (INIS)
Zhang, B.; Li, G.; Wang, W.; Shangguan, D.; Deng, L.
2015-01-01
This paper introduces the multilevel hybrid parallelism strategy of the JCOGIN infrastructure for Monte Carlo particle transport in large-scale full-core pin-by-pin simulations. Particle parallelism, domain decomposition parallelism and MPI/OpenMP parallelism are designed and implemented. In testing, JMCT demonstrates the parallel scalability of JCOGIN, reaching a parallel efficiency of 80% on 120,000 cores for the pin-by-pin computation of the BEAVRS benchmark. (author)
Parallel 3D Simulation of Seismic Wave Propagation in the Structure of Nobi Plain, Central Japan
Kotani, A.; Furumura, T.; Hirahara, K.
2003-12-01
We performed large-scale parallel simulations of seismic wave propagation to understand the complex wave behavior in the 3D basin structure of the Nobi Plain, which is one of the highly populated areas in central Japan. In this area, many large earthquakes occurred in the past, such as the 1891 Nobi earthquake (M8.0), the 1944 Tonankai earthquake (M7.9) and the 1945 Mikawa earthquake (M6.8). In order to mitigate the potential disasters from future earthquakes, the 3D subsurface structure of the Nobi Plain has recently been investigated by local governments. We referred to this model together with Bouguer anomaly data to construct a detailed 3D basin structure model for the Nobi Plain, and conducted computer simulations of ground motions. We first evaluated the ground motions for two small earthquakes (M4~5); one occurred just beneath the western basin edge, and the other occurred to the south. The ground motions from these earthquakes were well recorded by the strong motion networks: K-net, Kik-net, and seismic intensity instruments operated by local governments. We compare the observed seismograms with simulations to validate the 3D model. For the 3D simulation we sliced the 3D model into a number of layers and assigned them to many processors for concurrent computing. The equations of motion are solved using a high-order (32nd) staggered-grid FDM in the horizontal directions and a conventional (4th-order) FDM in the vertical direction, with MPI inter-processor communication between neighboring regions. The simulation model is 128 km by 128 km by 43 km, which is discretized with a variable grid size of 62.5-125 m in the horizontal directions and 31.25-62.5 m in the vertical direction. We assigned a minimum shear wave velocity of Vs = 0.4 km/s at the top of the sedimentary basin. The seismic sources for the small events are approximated by double-couple point sources, and we simulate the seismic wave propagation at a maximum frequency of 2 Hz. We used the Earth Simulator (JAMSTEC, Yokohama Inst) to conduct such
International Nuclear Information System (INIS)
Kiviniemi, T.
2001-01-01
One of the principal problems en route to a fusion reactor is that of insufficient plasma confinement, which has led to both theoretical and experimental research into transport processes in the parameter range relevant for fusion energy production. The neoclassical theory of tokamak transport is well established, unlike the theory of turbulence-driven anomalous transport, in which extensive progress has been made during the last few years. So far, anomalous transport has been dominant in experiments, but transport may be reduced to the neoclassical level in advanced tokamak scenarios. This thesis reports a numerical study of neoclassical fluxes, parallel viscosity, and neoclassical radial current balance in tokamaks. Neoclassical parallel viscosity and particle fluxes are simulated over a wide range of collisionalities, using the fully kinetic five-dimensional neoclassical orbit-following Monte Carlo code ASCOT. The qualitative behavior of parallel viscosity derived in earlier analytic models is shown to be incorrect for high poloidal Mach numbers. This is because the poloidal dependence of density was neglected. However, in the high Mach number regime, it is the convection and compression terms, rather than the parallel viscosity term, that are shown to dominate the momentum balance. For fluxes, a reasonable agreement between numerical and analytical results is found in the collisional parameter regime. Neoclassical particle fluxes are additionally studied in the banana regime using the three-dimensional Fokker-Planck code DEPORA, which solves the drift-kinetic equation with finite differencing. Limitations of the small inverse aspect ratio approximation adopted in the analytic theory are addressed. Assuming that the anomalous transport is ambipolar, the radial electric field and its shear at the tokamak plasma edge can be solved from the neoclassical radial current balance. This is performed both for JET and ASDEX Upgrade tokamaks using the ASCOT code. It is shown that
Directory of Open Access Journals (Sweden)
Dawen Xia
2018-01-01
Frequent pattern mining is an effective approach for spatiotemporal association analysis of mobile trajectory big data in data-driven intelligent transportation systems. While existing parallel algorithms have been successfully applied to frequent pattern mining of large-scale trajectory data, two major challenges are how to overcome the inherent defects of Hadoop to cope with taxi trajectory big data including massive small files and how to discover the implicitly spatiotemporal frequent patterns with MapReduce. To conquer these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm to analyze the spatiotemporal characteristics of taxi operating using large-scale taxi trajectories with massive small file processing strategies on a Hadoop platform. More specifically, we first implement three methods, that is, Hadoop Archives (HAR), CombineFileInputFormat (CFIF), and Sequence Files (SF), to overcome the existing defects of Hadoop and then propose two strategies based on their performance evaluations. Next, we incorporate SF into the Frequent Pattern growth (FP-growth) algorithm and then implement the optimized FP-growth algorithm on a MapReduce framework. Finally, we analyze the characteristics of taxi operating in both spatial and temporal dimensions by MR-PFP in parallel. The results demonstrate that MR-PFP is superior to the existing Parallel FP-growth (PFP) algorithm in efficiency and scalability.
Efficient graph-based dynamic load-balancing for parallel large-scale agent-based traffic simulation
Xu, Y.; Cai, W.; Aydt, H.; Lees, M.; Tolk, A.; Diallo, S.Y.; Ryzhov, I.O.; Yilmaz, L.; Buckley, S.; Miller, J.A.
2014-01-01
One of the issues of parallelizing large-scale agent-based traffic simulations is partitioning and load-balancing. Traffic simulations are dynamic applications where the distribution of workload in the spatial domain constantly changes. Dynamic load-balancing at run-time has shown better efficiency
Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment
Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.
2013-12-01
Dust storms have serious negative impacts on the environment, human health, and assets. Continuing global climate change has increased the frequency and intensity of dust storms in the past decades. To better understand and predict the distribution, intensity and structure of dust storms, a series of dust storm models have been developed, such as the Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and the Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The development and application of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data- and computing-intensive process. Normally, a simulation for a single dust storm event may take several hours or days to run, which seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in parallel, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node needs to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalanced task loads and unnecessary communication among computing nodes. The task allocation method is therefore a key factor that may determine the feasibility of the parallelization. The allocation algorithm needs to carefully balance the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with an evenly distributed allocation method. Specifically, 1) In order to get optimized solutions, a
International Nuclear Information System (INIS)
Satake, Shinsuke; Okamoto, Masao; Nakajima, Noriyoshi; Takamaru, Hisanori
2005-11-01
A neoclassical transport simulation code (FORTEC-3D) applicable to three-dimensional configurations has been developed using High Performance Fortran (HPF). Adoption of computing techniques for parallelization and a hybrid simulation model to the δf Monte-Carlo method transport simulation, including non-local transport effects in three-dimensional configurations, makes it possible to simulate the dynamism of global, non-local transport phenomena with a self-consistent radial electric field within a reasonable computation time. In this paper, development of the transport code using HPF is reported. Optimization techniques in order to achieve both high vectorization and parallelization efficiency, adoption of a parallel random number generator, and also benchmark results, are shown. (author)
Romano, Paul Kollath
Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O(√N) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes; in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with
A package of Linux scripts for the parallelization of Monte Carlo simulations
Badal, Andreu; Sempau, Josep
2006-09-01
Although fast computers are nowadays available at low cost, there are many situations where obtaining a reasonably low statistical uncertainty in a Monte Carlo (MC) simulation takes a prohibitively long time. This limitation can be overcome by resorting to parallel computing. Most tools designed to facilitate this approach require modification of the source code and the installation of additional software, which may be inconvenient for some users. We present a set of tools, named clonEasy, that implement a parallelization scheme for an MC simulation that is free from these drawbacks. In clonEasy, which is designed to run under Linux, a set of "clone" CPUs is governed by a "master" computer by taking advantage of the capabilities of the Secure Shell (ssh) protocol. Any Linux computer on the Internet that can be ssh-accessed by the user can be used as a clone. A key ingredient for the parallel calculation to be reliable is the availability of an independent string of random numbers for each CPU. Many generators, such as RANLUX, RANECU or the Mersenne Twister, can readily produce these strings by initializing them appropriately and, hence, they are suitable to be used with clonEasy. This work was primarily motivated by the need to find a straightforward way to parallelize PENELOPE, a code for MC simulation of radiation transport that (in its current 2005 version) employs the generator RANECU, which uses a combination of two multiplicative linear congruential generators (MLCGs). Thus, this paper is focused on this class of generators and, in particular, we briefly present an extension of RANECU that increases its period up to ~5×10 and we introduce seedsMLCG, a tool that provides the information necessary to initialize disjoint sequences of an MLCG to feed different CPUs. This program, in combination with clonEasy, allows PENELOPE to be run in parallel easily, without requiring specific libraries or significant alterations of the
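The idea of initializing disjoint sequences of an MLCG can be illustrated with a short Python sketch. It is not the seedsMLCG tool itself: the function names are hypothetical, and the multiplier and modulus are the values commonly quoted for the first component of L'Ecuyer's combined generator, used here only as an assumed example. The sketch relies on the identity s_n = (a^n mod m) * s_0 mod m, so a starting state far along the sequence can be computed without stepping through every intermediate value.

```python
# Sketch: splitting one MLCG stream into disjoint blocks by jump-ahead.
# A1 and M1 are assumed example constants (commonly quoted for the first
# component of L'Ecuyer's combined generator); they are illustrative only.
A1, M1 = 40014, 2147483563


def mlcg_next(state, a=A1, m=M1):
    """Advance a multiplicative LCG by one step: s -> a*s mod m."""
    return (a * state) % m


def mlcg_jump(state, n, a=A1, m=M1):
    """Advance the same generator by n steps at once.

    Uses s_n = (a**n mod m) * s_0 mod m with modular exponentiation,
    so the cost grows with log(n) rather than n.
    """
    return (pow(a, n, m) * state) % m


def disjoint_seeds(seed, n_streams, block, a=A1, m=M1):
    """Return one starting state per CPU, spaced `block` steps apart."""
    return [mlcg_jump(seed, k * block, a, m) for k in range(n_streams)]
```

Each CPU would then draw fewer than `block` numbers from its own starting state, so that the sub-sequences never overlap; choosing `block` is left to the user in this sketch.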
Hybrid parallel strategy for the simulation of fast transient accidental situations at reactor scale
International Nuclear Information System (INIS)
Faucher, V.; Galon, P.; Beccantini, A.; Crouzet, F.; Debaud, F.; Gautier, T.
2015-01-01
Highlights: • Reference accidental situations for current and future reactors are considered. • They require the modeling of complex fluid–structure systems at full reactor scale. • EPX software computes the non-linear transient solution with explicit time stepping. • Focus on the parallel hybrid solver specific to the proposed coupled equations. - Abstract: This contribution is dedicated to the latest methodological developments implemented in the fast transient dynamics software EUROPLEXUS (EPX) to simulate the mechanical response of fully coupled fluid–structure systems to accidental situations to be considered at reactor scale, among which the Loss of Coolant Accident, the Core Disruptive Accident and the Hydrogen Explosion. Time integration is explicit and the search for reference solutions within the safety framework prevents any simplification and approximations in the coupled algorithm: for instance, all kinematic constraints are dealt with using Lagrange Multipliers, yielding a complex flow chart when non-permanent constraints such as unilateral contact or immersed fluid–structure boundaries are considered. The parallel acceleration of the solution process is then achieved through a hybrid approach, based on a weighted domain decomposition for distributed memory computing and the use of the KAAPI library for self-balanced shared memory processing inside subdomains
Wang, Ting; Plecháč, Petr
2017-12-01
Stochastic reaction networks that exhibit bistable behavior are common in systems biology, materials science, and catalysis. Sampling of stationary distributions is crucial for understanding and characterizing the long-time dynamics of bistable stochastic dynamical systems. However, simulations are often hindered by the insufficient sampling of rare transitions between the two metastable regions. In this paper, we apply the parallel replica method for a continuous time Markov chain in order to improve sampling of the stationary distribution in bistable stochastic reaction networks. The proposed method uses parallel computing to accelerate the sampling of rare transitions. Furthermore, it can be combined with the path-space information bounds for parametric sensitivity analysis. With the proposed methodology, we study three bistable biological networks: the Schlögl model, the genetic switch network, and the enzymatic futile cycle network. We demonstrate the algorithmic speedup achieved in these numerical benchmarks. More significant acceleration is expected when multi-core or graphics processing unit computer architectures and programming tools such as CUDA are employed.
Hardware and software and machine-tool simulation with parallel structures mechanisms
Directory of Open Access Journals (Sweden)
Keba P.V.
2016-12-01
Full Text Available The range of applications for parallel-structure mechanisms keeps widening. The mechanisms of machine tools and manipulators are becoming more complicated, so their program-controlled modules need to be improved. Closed-chain mechanisms are most common in robotic complexes, where a manipulator performs complicated spatial movements along a given trajectory. Typical applications include sorting, welding and assembly. However, designing the operating programs remains a problem, because existing post-processors were developed for the equipment available today, while new machine-tool designs that also need to be controlled keep appearing. The problems associated with using hardware and software for parallel-structure mechanisms in computer-aided simulation are considered. A program for solving the inverse kinematics problem is designed, and a new method of designing control programs is presented. Options for kinematic analysis methods and calculated data obtained with computer mathematics systems are shown, with the «Tools Glide» software taken as an example.
Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics
Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak
2016-01-01
This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta, and spatially fifth-order accurate WENO-5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.
Experiences with serial and parallel algorithms for channel routing using simulated annealing
Brouwer, Randall Jay
1988-01-01
Two algorithms for channel routing using simulated annealing are presented. Simulated annealing is an optimization methodology which allows the solution process to back up out of local minima that may be encountered by inappropriate selections. By properly controlling the annealing process, it is very likely that the optimal solution to an NP-complete problem such as channel routing may be found. The algorithm presented proposes very relaxed restrictions on the types of allowable transformations, including overlapping nets. By freeing that restriction and controlling overlap situations with an appropriate cost function, the algorithm becomes very flexible and can be applied to many extensions of channel routing. The selection of the transformation utilizes a number of heuristics, still retaining the pseudorandom nature of simulated annealing. The algorithm was implemented as a serial program for a workstation, and a parallel program designed for a hypercube computer. The details of the serial implementation are presented, including many of the heuristics used and some of the resulting solutions.
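The property described above, that simulated annealing can back out of local minima, comes from occasionally accepting moves that increase the cost. The Python sketch below shows a generic Metropolis-style acceptance rule with geometric cooling. It is not the channel-routing algorithm of the thesis: the function names, the cooling parameters and the cost and neighbour callables are all assumptions made for illustration.

```python
import math
import random


def accept(delta, temperature, rng):
    """Metropolis rule: always take improvements, sometimes take uphill moves."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def anneal(state, cost, neighbour, t_start=10.0, t_end=1e-3, alpha=0.95,
           moves_per_temp=100, seed=0):
    """Generic annealing loop; returns the best state seen and its cost.

    `cost(state)` and `neighbour(state, rng)` are supplied by the caller.
    The geometric cooling schedule (t *= alpha) is an assumed choice.
    """
    rng = random.Random(seed)
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            candidate = neighbour(current, rng)
            delta = cost(candidate) - current_cost
            if accept(delta, t, rng):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # lower the temperature so uphill moves become rarer
    return best, best_cost
```

In a routing setting, a penalty for overlapping nets would enter through `cost`, which is how relaxed move restrictions can be combined with a cost function that discourages invalid layouts.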
Yasuda, Shugo; Yamamoto, Ryoichi
2015-11-01
The Synchronized Molecular-Dynamics (SMD) simulation, which was recently proposed by the authors, is applied to the analysis of polymer lubrication between parallel plates. In the SMD method, the MD simulations are assigned to small fluid elements to calculate the local stresses and temperatures and are synchronized at certain time intervals to satisfy the macroscopic heat- and momentum-transport equations. The rheological properties and conformation of the polymer chains coupled with local viscous heating are investigated with a non-dimensional parameter, the Nahme-Griffith number, which is defined as the ratio of the viscous heating to the thermal conduction at the characteristic temperature required to sufficiently change the viscosity. The present simulation demonstrates that strong shear thinning and a transitional behavior of the conformation of the polymer chains are exhibited with a rapid temperature rise when the Nahme-Griffith number exceeds unity. The results also clarify that the reentrant transition of the linear stress-optical relation occurs for large shear stresses due to the coupling of the conformation of polymer chains with heat generation under shear flows. This study was financially supported by JSPS KAKENHI Grant Nos. 26790080 and 26247069.
International Nuclear Information System (INIS)
Sakane, S; Takaki, T; Ohno, M; Shimokawabe, T; Aoki, T
2015-01-01
The phase-field method has emerged as the most powerful numerical scheme for simulating dendrite growth. However, most phase-field simulations of dendrite growth performed so far are limited to two dimensions or to a single dendrite in three dimensions because of the large computational cost involved. To express actual solidification microstructures, multiple dendrites with different preferred growth directions should be computed at the same time. In this study, in order to enable large-scale phase-field dendrite growth simulations, we developed a phase-field code using multiple graphics processing units, in which a quantitative phase-field method for binary alloy solidification and a moving-frame algorithm for directional solidification were employed. First, we performed strong and weak scaling tests for the developed parallel code. Then, competitive dendrite growth simulations in a three-dimensional binary alloy bicrystal were performed and the dendrite interactions in three-dimensional space were investigated. (paper)
Baumgärtel, M.; Ghanem, K.; Kiani, A.; Koch, E.; Pavarini, E.; Sims, H.; Zhang, G.
2017-07-01
We discuss the efficient implementation of general impurity solvers for dynamical mean-field theory. We show that both Lanczos and quantum Monte Carlo in different flavors (Hirsch-Fye, continuous-time hybridization- and interaction-expansion) exhibit excellent scaling on massively parallel supercomputers. We apply these algorithms to simulate realistic model Hamiltonians including the full Coulomb vertex, crystal-field splitting, and spin-orbit interaction. We discuss how to remove the sign problem in the presence of non-diagonal crystal-field and hybridization matrices. We show how to extract the physically observable quantities from imaginary time data, in particular correlation functions and susceptibilities. Finally, we present benchmarks and applications for representative correlated systems.
Parallel CFD simulation of flow in a 3D model of vibrating human vocal folds
Czech Academy of Sciences Publication Activity Database
Šidlof, Petr; Horáček, Jaromír; Řidký, V.
2013-01-01
Vol. 80, No. 1 (2013), pp. 290-300 ISSN 0045-7930 R&D Projects: GA ČR(CZ) GAP101/11/0207 Institutional research plan: CEZ:AV0Z20760514 Keywords: numerical simulation * vocal folds * glottal airflow * finite volume method * parallel CFD Subject RIV: BI - Acoustics Impact factor: 1.532, year: 2013 http://www.sciencedirect.com/science?_ob=ArticleListURL&_method=list&_ArticleListID=-268060849&_sort=r&_st=13&view=c&_acct=C000034318&_version=1&_urlVersion=0&_userid=640952&md5=7c5b5539857ee9a02af5e690585b3126&searchtype=a
Hybrid parallel strategy for the simulation of fast transient accidental situations at reactor scale
International Nuclear Information System (INIS)
Faucher, V.; Galon, P.; Beccantini, A.; Crouzet, F.; Debaud, F.; Gautier, T.
2013-01-01
This contribution is dedicated to the latest methodological developments implemented in the fast transient dynamics software EUROPLEXUS (EPX) to simulate the mechanical response of fully coupled fluid-structure systems to accidental situations to be considered at reactor scale, among which the Loss of Coolant Accident, the Core Disruptive Accident and the Hydrogen Explosion. Time integration is explicit and the search for reference solutions within the safety framework prevents any simplification and approximations in the coupled algorithm: for instance, all kinematic constraints are dealt with using Lagrange Multipliers, yielding a complex flow chart when non-permanent constraints such as unilateral contact or immersed fluid-structure boundaries are considered. The parallel acceleration of the solution process is then achieved through a hybrid approach, based on a weighted domain decomposition for distributed memory computing and the use of the KAAPI library for self-balanced shared memory processing inside sub-domains. (authors)
Directory of Open Access Journals (Sweden)
Řidký Václav
2014-03-01
Full Text Available The work is devoted to 3D and 2D parallel numerical computation of pressure and velocity fields around an elastically supported airfoil self-oscillating due to interaction with the airflow. The numerical solution is computed in OpenFOAM, an open-source software package based on the finite volume method. The movement of the airfoil is described by translation and rotation, identified from experimental data. A new boundary condition for the 2DOF motion of the airfoil was implemented. The results of the numerical simulations (velocity) are compared with data measured in a wind tunnel, where a physical model of a NACA0015 airfoil was mounted and tuned to exhibit the flutter instability. The experimental results were obtained previously at the Institute of Thermomechanics by interferographic measurements in a subsonic wind tunnel in Nový Knín.
Tirapu Azpiroz, Jaione; Burr, Geoffrey W.; Rosenbluth, Alan E.; Hibbs, Michael
2008-03-01
In the Hyper-NA immersion lithography regime, the electromagnetic response of the reticle is known to deviate in a complicated manner from the idealized Thin-Mask-like behavior. Already, this is driving certain RET choices, such as the use of polarized illumination and the customization of reticle film stacks. Unfortunately, full 3-D electromagnetic mask simulations are computationally intensive. And while OPC-compatible mask electromagnetic field (EMF) models can offer a reasonable tradeoff between speed and accuracy for full-chip OPC applications, full understanding of these complex physical effects demands higher accuracy. Our paper describes recent advances in leveraging High Performance Computing as a critical step towards lithographic modeling of the full manufacturing process. In this paper, highly accurate full 3-D electromagnetic simulations of very large mask layouts are conducted in parallel with reasonable turnaround time, using a BlueGene/L supercomputer and a Finite-Difference Time-Domain (FDTD) code developed internally within IBM. A 3-D simulation of a large 2-D layout spanning 5μm×5μm at the wafer plane (and thus 20μm×20μm×0.5μm at the mask) results in a simulation with roughly 12.5GB of memory (grid size of 10nm at the mask, single-precision computation, about 30 bytes/grid point). FDTD is flexible and easily parallelizable, enabling full simulations of such large layouts in approximately an hour using one BlueGene/L "midplane" containing 512 dual-processor nodes with 256MB of memory per processor. Our scaling studies on BlueGene/L demonstrate that simulations up to 100μm × 100μm at the mask can be computed in a few hours. Finally, we will show that the use of a subcell technique permits accurate simulation of features smaller than the grid discretization, thus improving on the tradeoff between computational complexity and simulation accuracy. We demonstrate the correlation of the real and quadrature components that comprise the
Is Climate Simulation in Growth Chambers Necessary?
Z.M. Wang; K.H. Johnsen; M.J. Lechowicz
1999-01-01
In the expression of their genetic potential as phenotypes, trees respond to environmental cues such as photoperiod, temperature and soil and atmospheric water. However, growth chamber experiments often utilize simple and standard environmental conditions that might not provide these important environmental signals. We conducted a study to compare seedling growth in...
Directory of Open Access Journals (Sweden)
Gianni Castelli
2010-01-01
Full Text Available This paper presents results on the modelling, simulation and experimental tests of a cable-based parallel manipulator to be used as an aiding or guiding system for people with motion disabilities. There is a high level of motivation for people with a motion disability or the elderly to perform basic daily-living activities independently. Therefore, it is of great interest to design and implement safe and reliable motion assisting and guiding devices that are able to help end-users. In general, a robot for a medical application should be able to interact with a patient in safety conditions, i.e. it must not damage people or surroundings; it must be designed to guarantee high accuracy and low acceleration during the operation. Furthermore, it should not be too bulky and it should exert limited wrenches after close interaction with people. It can be advisable to have a portable system which can be easily brought into and assembled in a hospital or a domestic environment. Cable-based robotic structures can fulfil those requirements because of their main characteristics that make them light and intrinsically safe. In this paper, a reconfigurable four-cable-based parallel manipulator has been proposed as a motion assisting and guiding device to help people to accomplish a number of tasks, such as an aiding or guiding system to move the upper and lower limbs or the whole body. Modelling and simulation are presented in the ADAMS environment. Moreover, experimental tests are reported as based on an available laboratory prototype.
Unterkircher, A
2005-01-01
We propose methods for parallel assembling and iterative equation solving based on graph algorithms. The assembling technique is independent of dimension, element type and model shape. As a parallel solving technique we construct a multiplicative symmetric Schwarz preconditioner for the conjugate gradient method. Both methods have been incorporated into a non-linear FE code to simulate 3D metal extrusion processes. We illustrate the efficiency of these methods on shared memory computers by realistic examples.
Modeling and simulation of Si crystal growth from melt
Energy Technology Data Exchange (ETDEWEB)
Liu, Lijun; Liu, Xin; Li, Zaoyang [National Engineering Research Center for Fluid Machinery and Compressors, School of Energy and Power Engineering, Xi' an Jiaotong University, Xi' an, Shaanxi 710049 (China); Miyazawa, Hiroaki; Nakano, Satoshi; Kakimoto, Koichi [Research Institute for Applied Mechanics, Kyushu University, Kasuga 816-8580 (Japan)
2009-07-01
A numerical simulator was developed with a global model of heat transfer for any crystal growth taking place at high temperature. Convective, conductive and radiative heat transfer in the furnace are solved together in a conjugated way by a finite volume method. A three-dimensional (3D) global model was developed specifically for simulating heat transfer in any crystal growth with 3D features. The model enables 3D global simulation to be conducted with moderate computer resources. The application of this numerical simulator to CZ growth and to a directional solidification process for Si crystals, the two major production methods for crystalline Si for solar cells, is introduced. Some typical results are presented, showing the importance and effectiveness of numerical simulation in analyzing and improving these kinds of Si crystal growth processes from melt. (copyright 2009 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim) (orig.)
Competitive grain growth in directional solidification investigated by phase field simulation
International Nuclear Information System (INIS)
Li Junjie; Wang Zhijun; Wang Jincheng; Yang Yujuan
2012-01-01
During directional solidification, the competitive dendritic growth between variously oriented grains is a key factor in obtaining the desired texture. In order to understand the mechanism of competitive dendritic growth, the phase field method was adopted to simulate the microstructure evolution of bicrystal samples. The simulation reproduced the whole competitive growth process well for both diverging and converging dendrites. In the converging case, besides the blocking of the unfavorably oriented dendrite by the favorably oriented one, the unfavorably oriented dendrite is also able to overgrow the favorable one at relatively low pulling velocity. This unusual overgrowth is dictated by the solute interaction of the converging dendrite tips. In the diverging case, it was found that the grain boundary can be either inclined or parallel to the favorably oriented grain, depending on the disposition of the two grains.
International Nuclear Information System (INIS)
Hwang, F-N; Wei, Z-H; Huang, T-M; Wang Weichung
2010-01-01
We develop a parallel Jacobi-Davidson approach for finding a partial set of eigenpairs of large sparse polynomial eigenvalue problems with application in quantum dot simulation. A Jacobi-Davidson eigenvalue solver is implemented based on the Portable, Extensible Toolkit for Scientific Computation (PETSc). The eigensolver thus inherits PETSc's efficient and various parallel operations, linear solvers, preconditioning schemes, and easy usages. The parallel eigenvalue solver is then used to solve higher degree polynomial eigenvalue problems arising in numerical simulations of three dimensional quantum dots governed by Schroedinger's equations. We find that the parallel restricted additive Schwarz preconditioner in conjunction with a parallel Krylov subspace method (e.g. GMRES) can solve the correction equations, the most costly step in the Jacobi-Davidson algorithm, very efficiently in parallel. Besides, the overall performance is quite satisfactory. We have observed near perfect superlinear speedup by using up to 320 processors. The parallel eigensolver can find all target interior eigenpairs of a quintic polynomial eigenvalue problem with more than 32 million variables within 12 minutes by using 272 Intel 3.0 GHz processors.
Monte Carlo simulation of continuous-space crystal growth
International Nuclear Information System (INIS)
Dodson, B.W.; Taylor, P.A.
1986-01-01
We describe a method, based on Monte Carlo techniques, of simulating the atomic growth of crystals without the discrete lattice space assumed by conventional Monte Carlo growth simulations. Since no lattice space is assumed, problems involving epitaxial growth, heteroepitaxy, phonon-driven mechanisms, surface reconstruction, and many other phenomena incompatible with the lattice-space approximation can be studied. Also, use of the Monte Carlo method circumvents to some extent the extreme limitations on simulated timescale inherent in crystal-growth techniques which might be proposed using molecular dynamics. The implementation of the new method is illustrated by studying the growth of strained-layer superlattice (SLS) interfaces in two-dimensional Lennard-Jones atomic systems. Despite the extreme simplicity of such systems, the qualitative features of SLS growth seen here are similar to those observed experimentally in real semiconductor systems
Chang, Ouliang
The objective of this dissertation is to study the physics of whistler turbulence evolution and its role in energy transport and dissipation in the solar wind plasmas through computational and theoretical investigations. This dissertation presents the first fully three-dimensional (3D) particle-in-cell (PIC) simulations of whistler turbulence forward cascade in a homogeneous, collisionless plasma with a uniform background magnetic field $B_o$, and the first 3D PIC simulation of whistler turbulence with both forward and inverse cascades. Such computationally demanding research is made possible through the use of massively parallel, high performance electromagnetic PIC simulations on state-of-the-art supercomputers. Simulations are carried out to study characteristic properties of whistler turbulence under variable solar wind fluctuation amplitude ($\epsilon_e$) and electron beta ($\beta_e$), relative contributions to energy dissipation and electron heating in whistler turbulence from the quasilinear scenario and the intermittency scenario, and whistler turbulence preferential cascading direction and wavevector anisotropy. The 3D simulations of whistler turbulence exhibit a forward cascade of fluctuations into broadband, anisotropic, turbulent spectrum at shorter wavelengths with wavevectors preferentially quasi-perpendicular to $B_o$. The overall electron heating yields $T_\parallel > T_\perp$ for all $\epsilon_e$ and $\beta_e$ values, indicating the primary linear wave-particle interaction is Landau damping. But linear wave-particle interactions play a minor role in shaping the wavevector spectrum, whereas nonlinear wave-wave interactions are overall stronger and faster processes, and ultimately determine the wavevector anisotropy. Simulated magnetic energy spectra as function of wavenumber show a spectral break to steeper slopes, which scales as $k_\perp \lambda_e \simeq 1$ independent of $\beta_e$ values, where $\lambda_e$ is electron inertial length, qualitatively similar to solar wind observations. Specific
Energy Technology Data Exchange (ETDEWEB)
Peyroux, J
2005-11-15
This project aims to make the solution of Vlasov codes even more powerful through various parallelization tools (MPI, OpenMP...). A simplified test case served as the basis for building the parallel codes and obtaining a software skeleton that could then be reused for increasingly complex models (more than four phase-space variables). This will make it possible to treat more realistic situations linked, for example, to the injection of ultra-short and ultra-intense pulses in inertial fusion plasmas, or to the study of the trapped-ion instability now believed to be responsible for generating turbulence in tokamak plasmas. (author)
Byun, Hye Suk; El-Naggar, Mohamed Y.; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya
2017-10-01
Kinetic Monte Carlo (KMC) simulations are used to study long-time dynamics of a wide variety of systems. Unfortunately, the conventional KMC algorithm is not scalable to larger systems, since its time scale is inversely proportional to the simulated system size. A promising approach to resolving this issue is the synchronous parallel KMC (SPKMC) algorithm, which makes the time scale size-independent. This paper introduces a formal derivation of the SPKMC algorithm based on local transition-state and time-dependent Hartree approximations, as well as its scalable parallel implementation based on a dual linked-list cell method. The resulting algorithm has achieved a weak-scaling parallel efficiency of 0.935 on 1024 Intel Xeon processors for simulating biological electron transfer dynamics in a 4.2 billion-heme system, as well as decent strong-scaling parallel efficiency. The parallel code has been used to simulate a lattice of cytochrome complexes on a bacterial-membrane nanowire, and it is broadly applicable to other problems such as computational synthesis of new materials.
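The scaling problem of conventional KMC mentioned above can be seen in a few lines of Python. The sketch below is a generic rejection-free (residence-time) KMC step, not the SPKMC algorithm of the paper; the function names and the uniform per-site rate are assumptions for illustration. Because the time increment is drawn with mean 1 / (sum of all rates), doubling the number of sites halves the simulated time advanced per event.

```python
import math
import random


def kmc_step(rates, rng):
    """One rejection-free KMC step.

    Picks event i with probability rates[i] / sum(rates) and returns
    (i, dt), where dt is exponentially distributed with mean 1 / sum(rates).
    """
    total = sum(rates)
    target = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1  # fallback guards against floating-point round-off
    for i, r in enumerate(rates):
        cumulative += r
        if target < cumulative:
            chosen = i
            break
    # 1 - u lies in (0, 1], so the logarithm is always defined.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt


def mean_time_step(n_sites, rate_per_site=1.0, samples=20000, seed=1):
    """Average time increment per event for n_sites identical sites."""
    rng = random.Random(seed)
    rates = [rate_per_site] * n_sites
    return sum(kmc_step(rates, rng)[1] for _ in range(samples)) / samples
```

Comparing `mean_time_step(10)` with `mean_time_step(20)` gives a ratio close to 2, which is the inverse proportionality between time scale and system size that a synchronous parallel scheme is meant to avoid.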
Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis
2013-01-01
We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331
A scalable fully implicit framework for reservoir simulation on parallel computers
Yang, Haijian
2017-11-10
The modeling of multiphase fluid flow in porous medium is of interest in the field of reservoir simulation. The promising numerical methods in the literature are mostly based on the explicit or semi-implicit approach, which both have certain stability restrictions on the time step size. In this work, we introduce and study a scalable fully implicit solver for the simulation of two-phase flow in a porous medium with capillarity, gravity and compressibility, which is free from the limitations of the conventional methods. In the fully implicit framework, a mixed finite element method is applied to discretize the model equations for the spatial terms, and the implicit Backward Euler scheme with adaptive time stepping is used for the temporal integration. The resultant nonlinear system arising at each time step is solved in a monolithic way by using a Newton–Krylov type method. The corresponding linear system from the Newton iteration is large sparse, nonsymmetric and ill-conditioned, consequently posing a significant challenge to the fully implicit solver. To address this issue, the family of additive Schwarz preconditioners is taken into account to accelerate the convergence of the linear system, and thereby improves the robustness of the outer Newton method. Several test cases in one, two and three dimensions are used to validate the correctness of the scheme and examine the performance of the newly developed algorithm on parallel computers.
Massively parallel Monte Carlo. Experiences running nuclear simulations on a large condor cluster
International Nuclear Information System (INIS)
Tickner, James; O'Dwyer, Joel; Roach, Greg; Uher, Josef; Hitchen, Greg
2010-01-01
The trivially-parallel nature of Monte Carlo (MC) simulations makes them ideally suited for running on a distributed, heterogeneous computing environment. We report on the setup and operation of a large, cycle-harvesting Condor computer cluster, used to run MC simulations of nuclear instruments ('jobs') on approximately 4,500 desktop PCs. Successful operation must balance the competing goals of maximizing the availability of machines for running jobs whilst minimizing the impact on users' PC performance. This requires classification of jobs according to anticipated run-time and priority and careful optimization of the parameters used to control job allocation to host machines. To maximize use of a large Condor cluster, we have created a powerful suite of tools to handle job submission and analysis, as the manual creation, submission and evaluation of large numbers (hundreds to thousands) of jobs would be too arduous. We describe some of the key aspects of this suite, which has been interfaced to the well-known MCNP and EGSnrc nuclear codes and our in-house PHOTON optical MC code. We report on our practical experiences of operating our Condor cluster and present examples of several large-scale instrument design problems that have been solved using this tool. (author)
Behrens, Jörg; Hanke, Moritz; Jahns, Thomas
2014-05-01
In this talk we present a way to facilitate efficient use of MPI communication for developers of climate models. Exploitation of the performance potential of today's highly parallel supercomputers with real world simulations is a complex task. This is partly caused by the low level nature of the MPI communication library which is the dominant communication tool at least for inter-node communication. In order to manage the complexity of the task, climate simulations with non-trivial communication patterns often use an internal abstraction layer above MPI without exploiting the benefits of communication aggregation or MPI-datatypes. The solution for the complexity and performance problem we propose is the communication library YAXT. This library is built on top of MPI and takes high level descriptions of arbitrary domain decompositions and automatically derives an efficient collective data exchange. Several exchanges can be aggregated in order to reduce latency costs. Examples are given which demonstrate the simplicity and the performance gains for selected climate applications.
Two-phase flow steam generator simulations on parallel computers using domain decomposition method
International Nuclear Information System (INIS)
Belliard, M.
2003-01-01
Within the framework of the Domain Decomposition Method (DDM), we present industrial steady state two-phase flow simulations of PWR Steam Generators (SG) using iteration-by-sub-domain methods: standard and Adaptive Dirichlet/Neumann methods (ADN). The averaged mixture balance equations are solved by a Fractional-Step algorithm, jointly with the Crank-Nicolson scheme and the Finite Element Method. The algorithm works with overlapping or non-overlapping sub-domains and with conforming or nonconforming meshing. Computations are run on PC networks or on massively parallel mainframe computers. A CEA code-linker and the PVM package are used (master-slave context). SG mock-up simulations, involving up to 32 sub-domains, highlight the efficiency (speed-up, scalability) and the robustness of the chosen approach. With the DDM, the computational problem size is easily increased to about 1,000,000 cells and the CPU time is significantly reduced. The difficulties related to industrial use are also discussed. (author)
A scalable fully implicit framework for reservoir simulation on parallel computers
Yang, Haijian; Sun, Shuyu; Li, Yiteng; Yang, Chao
2017-01-01
The modeling of multiphase fluid flow in porous medium is of interest in the field of reservoir simulation. The promising numerical methods in the literature are mostly based on the explicit or semi-implicit approach, which both have certain stability restrictions on the time step size. In this work, we introduce and study a scalable fully implicit solver for the simulation of two-phase flow in a porous medium with capillarity, gravity and compressibility, which is free from the limitations of the conventional methods. In the fully implicit framework, a mixed finite element method is applied to discretize the model equations for the spatial terms, and the implicit Backward Euler scheme with adaptive time stepping is used for the temporal integration. The resultant nonlinear system arising at each time step is solved in a monolithic way by using a Newton–Krylov type method. The corresponding linear system from the Newton iteration is large sparse, nonsymmetric and ill-conditioned, consequently posing a significant challenge to the fully implicit solver. To address this issue, the family of additive Schwarz preconditioners is taken into account to accelerate the convergence of the linear system, and thereby improves the robustness of the outer Newton method. Several test cases in one, two and three dimensions are used to validate the correctness of the scheme and examine the performance of the newly developed algorithm on parallel computers.
Automated integration of genomic physical mapping data via parallel simulated annealing
Energy Technology Data Exchange (ETDEWEB)
Slezak, T.
1994-06-01
The Human Genome Center at the Lawrence Livermore National Laboratory (LLNL) is nearing closure on a high-resolution physical map of human chromosome 19. We have built automated tools to assemble 15,000 fingerprinted cosmid clones into 800 contigs with minimal spanning paths identified. These islands are being ordered, oriented, and spanned by a variety of other techniques including: Fluorescence In Situ Hybridization (FISH) at 3 levels of resolution, ECO restriction fragment mapping across all contigs, and a multitude of different hybridization and PCR techniques to link cosmid, YAC, AC, PAC, and P1 clones. The FISH data provide us with partial order and distance data as well as orientation. We made the observation that map builders need a much rougher presentation of data than do map readers; the former wish to see raw data since these can expose errors or interesting biology. We further noted that by ignoring our length and distance data we could simplify our problem into one that could be readily attacked with optimization techniques. The data integration problem could then be seen as an M x N ordering of our N cosmid clones which "intersect" M larger objects by defining "intersection" to mean either contig/map membership or hybridization results. Clearly, the goal of making an integrated map is now to rearrange the N cosmid clone "columns" such that the number of gaps on the object "rows" is minimized. Our FISH partially-ordered cosmid clones provide us with a set of constraints that cannot be violated by the rearrangement process. We solved the optimization problem via simulated annealing performed on a network of 40+ Unix machines in parallel, using a server/client model built on explicit socket calls. For current maps we can create a map in about 4 hours on the parallel net versus 4+ days on a single workstation. Our biologists are now using this software on a daily basis to guide their efforts toward final closure.
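The column-ordering formulation in this abstract can be sketched as a small serial simulated-annealing loop. The sketch below is an illustration under stated assumptions, not the LLNL software: the names `count_gaps` and `anneal_order`, the geometric cooling schedule, and the random column-swap move are all hypothetical choices, and the FISH ordering constraints and the parallel server/client layer are omitted.

```python
import math
import random


def count_gaps(matrix, order):
    """Total gaps over all rows: each run of 1s beyond the first in a row is one gap."""
    gaps = 0
    for row in matrix:
        runs = 0
        prev = 0
        for col in order:
            if row[col] and not prev:
                runs += 1
            prev = row[col]
        gaps += max(runs - 1, 0)
    return gaps


def anneal_order(matrix, n_steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Search for a column order with few row gaps (needs at least two columns)."""
    rng = random.Random(seed)
    n = len(matrix[0])
    order = list(range(n))
    cost = count_gaps(matrix, order)
    best_order, best_cost = order[:], cost
    for step in range(n_steps):
        # Geometric cooling from t_start down towards t_end (assumed schedule).
        t = t_start * (t_end / t_start) ** (step / n_steps)
        i, j = rng.sample(range(n), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = count_gaps(matrix, order)
        # Metropolis rule: always accept improvements, sometimes accept worse orders.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            if cost < best_cost:
                best_order, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
    return best_order, best_cost


if __name__ == "__main__":
    # Rows are the M larger objects, columns are the N clones (toy data).
    toy = [[1, 0, 1, 0], [0, 1, 0, 1]]
    print(anneal_order(toy, n_steps=2000))
```

A constrained version would reject any swap that violates a known partial order before evaluating its cost.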
3D streamers simulation in a pin to plane configuration using massively parallel computing
Plewa, J.-M.; Eichwald, O.; Ducasse, O.; Dessante, P.; Jacobs, C.; Renon, N.; Yousfi, M.
2018-03-01
This paper concerns the 3D simulation of corona discharge using high performance computing (HPC) managed with the message passing interface (MPI) library. In the field of finite volume methods applied on non-adaptive mesh grids and in the case of a specific 3D dynamic benchmark test devoted to streamer studies, the great efficiency of the iterative R&B SOR and BiCGSTAB methods versus the direct MUMPS method was clearly demonstrated in solving the Poisson equation using HPC resources. The optimization of the parallelization and the resulting scalability was undertaken as a function of the HPC architecture for a number of mesh cells ranging from 8 to 512 million and a number of cores ranging from 20 to 1600. The R&B SOR method remains at least about four times faster than the BiCGSTAB method and requires significantly less memory for all tested situations. The R&B SOR method was then implemented in a 3D MPI parallelized code that solves the classical first order model of an atmospheric pressure corona discharge in air. The 3D code capabilities were tested by following the development of one, two and four coplanar streamers generated by initial plasma spots for 6 ns. The preliminary results obtained allowed us to follow in detail the formation of the tree structure of a corona discharge and the effects of the mutual interactions between the streamers in terms of streamer velocity, trajectory and diameter. The computing time for 64 million mesh cells distributed over 1000 cores using the MPI procedures is about 30 min per ns of simulated time, regardless of the number of streamers.
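To make the red-and-black SOR idea concrete, here is a minimal serial sketch for the 2D Poisson equation, ∇²φ = −ρ, with φ = 0 on the boundary. It is a reduced illustration, not the paper's 3D MPI code: the function name `red_black_sor`, the 2D grid, the units (permittivity set to 1), and the default relaxation factor are assumptions. The point of the two-colour ordering is that all cells of one colour depend only on cells of the other colour, so each half-sweep can be updated simultaneously.

```python
import numpy as np


def red_black_sor(rho, h, omega=1.8, tol=1e-8, max_iter=10000):
    """Solve laplacian(phi) = -rho on a square grid with phi = 0 on the boundary.

    Returns (phi, number of iterations used).
    """
    phi = np.zeros_like(rho, dtype=float)
    ii, jj = np.indices(rho.shape)
    interior = np.zeros(rho.shape, dtype=bool)
    interior[1:-1, 1:-1] = True
    # Checkerboard colouring: "red" cells have even i + j, "black" cells odd.
    masks = [interior & ((ii + jj) % 2 == colour) for colour in (0, 1)]
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for mask in masks:
            # Neighbour sums are recomputed per colour, so the second
            # half-sweep sees the values updated by the first.
            neighbours = np.zeros_like(phi)
            neighbours[1:-1, 1:-1] = (phi[2:, 1:-1] + phi[:-2, 1:-1]
                                      + phi[1:-1, 2:] + phi[1:-1, :-2])
            gauss_seidel = 0.25 * (neighbours + h * h * rho)
            delta = omega * (gauss_seidel - phi)
            phi[mask] += delta[mask]
            max_change = max(max_change, float(np.max(np.abs(delta[mask]))))
        if max_change < tol:
            return phi, iteration
    return phi, max_iter


if __name__ == "__main__":
    n = 33
    h = 1.0 / (n - 1)
    x = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
    phi, iters = red_black_sor(2.0 * np.pi**2 * exact, h)
    print(iters, np.max(np.abs(phi - exact)))
```

In a distributed version each process would own a block of the grid and exchange boundary layers between the two half-sweeps.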
Revealing the Physics of Galactic Winds Through Massively-Parallel Hydrodynamics Simulations
Schneider, Evan Elizabeth
This thesis documents the hydrodynamics code Cholla and a numerical study of multiphase galactic winds. Cholla is a massively-parallel, GPU-based code designed for astrophysical simulations that is freely available to the astrophysics community. A static-mesh Eulerian code, Cholla is ideally suited to carrying out massive simulations (> 2048^3 cells) that require very high resolution. The code incorporates state-of-the-art hydrodynamics algorithms including third-order spatial reconstruction, exact and linearized Riemann solvers, and unsplit integration algorithms that account for transverse fluxes on multidimensional grids. Operator-split radiative cooling and a dual-energy formalism for high Mach number flows are also included. An extensive test suite demonstrates Cholla's superior ability to model shocks and discontinuities, while the GPU-native design makes the code extremely computationally efficient - speeds of 5-10 million cell updates per GPU-second are typical on current hardware for 3D simulations with all of the aforementioned physics. The latter half of this work comprises a comprehensive study of the mixing between a hot, supernova-driven wind and cooler clouds representative of those observed in multiphase galactic winds. Both adiabatic and radiatively-cooling clouds are investigated. The analytic theory of cloud-crushing is applied to the problem, and adiabatic turbulent clouds are found to be mixed with the hot wind on similar timescales as the classic spherical case (4-5 t_cc) with an appropriate rescaling of the cloud-crushing time. Radiatively cooling clouds survive considerably longer, and the differences in evolution between turbulent and spherical clouds cannot be reconciled with a simple rescaling. The rapid incorporation of low-density material into the hot wind implies efficient mass-loading of hot phases of galactic winds. At the same time, the extreme compression of high-density cloud material leads to long-lived but slow-moving clumps
Simulating urban growth in the George town conurbation | Samat ...
African Journals Online (AJOL)
Journal of Fundamental and Applied Sciences ... Therefore, this paper aims to develop an urban growth simulation model using GIS-based CA-Markov approach, incorporated with driving forces of urban growth in the Malaysian context. ... Keywords: CA-Markov; Geographic Information Sciences (GIS); Land use changes;
Zaidi, H; Morel, Christian
1998-01-01
This paper describes the implementation of the Eidolon Monte Carlo program designed to simulate fully three-dimensional (3D) cylindrical positron tomographs on a MIMD parallel architecture. The original code was written in Objective-C and developed under the NeXTSTEP development environment. Different steps involved in porting the software to a parallel architecture based on PowerPC 604 processors running under AIX 4.1 are presented. Basic aspects and strategies of running Monte Carlo calculations on parallel computers are described. A linear decrease of the computing time was achieved with the number of computing nodes. The improved time performance resulting from parallelisation of the Monte Carlo calculations makes it an attractive tool for modelling photon transport in 3D positron tomography. The parallelisation paradigm used in this work is independent of the chosen parallel architecture.
Overview of urban Growth Simulation: With examples from different cities
CSIR Research Space (South Africa)
Waldeck, L
2013-08-01
Full Text Available This presentation provides an overview of Urban Growth Simulation as a risk free means of assessing the future outcome of major policy and investment decisions with some examples of scenarios that were simulated in different South African cities....
Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2
Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad
1995-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the computational cost is dominated by the FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32 node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.
International Nuclear Information System (INIS)
Liu, H.
1996-01-01
Computer simulations using the multi-particle code PARMELA with a three-dimensional point-by-point space charge algorithm have turned out to be very helpful in supporting injector commissioning and operations at Thomas Jefferson National Accelerator Facility (Jefferson Lab, formerly called CEBAF). However, this algorithm, which defines a typical N^2 problem in CPU time scaling, is very time-consuming when N, the number of macro-particles, is large. Therefore, it is attractive to use massively parallel processors (MPPs) to speed up the simulations. Motivated by this, the authors modified the space charge subroutine for using the MPPs of the Cray T3D. The techniques used to parallelize and optimize the code on the T3D are discussed in this paper. The performance of the code on the T3D is examined in comparison with a Parallel Vector Processing supercomputer of the Cray C90 and an HP 735/15 high-end workstation.
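The N^2 cost of a point-by-point space charge calculation comes from summing the Coulomb field of every macro-particle at the position of every other one. The sketch below shows that direct sum in NumPy. It is not the PARMELA subroutine: the function name `pairwise_coulomb_field`, the softening length, and the unit system (Coulomb constant set to 1, non-relativistic, electrostatic only) are assumptions chosen for illustration.

```python
import numpy as np


def pairwise_coulomb_field(positions, charges, softening=1e-9):
    """Electric field at each particle due to all the others (direct O(N^2) sum).

    positions: array of shape (N, 3); charges: array of shape (N,).
    Units are chosen so that the Coulomb constant equals 1.
    """
    # diff[i, j] = r_i - r_j for every ordered pair, hence N * N entries.
    diff = positions[:, None, :] - positions[None, :, :]
    dist2 = np.sum(diff**2, axis=-1) + softening**2
    inv_r3 = dist2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)  # a particle exerts no field on itself
    # E_i = sum_j q_j (r_i - r_j) / |r_i - r_j|^3
    return np.einsum("j,ijk,ij->ik", charges, diff, inv_r3)


if __name__ == "__main__":
    pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    q = np.array([1.0, 1.0])
    print(pairwise_coulomb_field(pos, q))
```

Because every row of the pair matrix is independent, the outer index is a natural unit of work to distribute across processors.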
Directory of Open Access Journals (Sweden)
Denis Becker
2016-05-01
Full Text Available Transaction level models of systems-on-chip in SystemC are commonly used in the industry to provide an early simulation environment. The SystemC standard imposes coroutine semantics for the scheduling of simulated processes, to ensure determinism and reproducibility of simulations. However, because of this, sequential implementations have, for a long time, been the only option available, and still now the reference implementation is sequential. With the increasing size and complexity of models, and the multiplication of computation cores on recent machines, the parallelization of SystemC simulations is a major research concern. There have been several proposals for SystemC parallelization, but most of them are limited to cycle-accurate models. In this paper we focus on loosely timed models, which are commonly used in the industry. We present an industrial context and show that, unfortunately, most of the existing approaches for SystemC parallelization can fundamentally not apply in this context. We support this claim with a set of measurements performed on a platform used in production at STMicroelectronics. This paper surveys existing techniques, presents a visualization and profiling tool and identifies unsolved challenges in the parallelization of SystemC models at transaction level.
Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong
2010-10-01
Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
A Parallel 2D Numerical Simulation of Tumor Cells Necrosis by Local Hyperthermia
International Nuclear Information System (INIS)
Reis, R F; Loureiro, F S; Lobosco, M
2014-01-01
Hyperthermia has been widely used in cancer treatment to destroy tumors. The main idea of hyperthermia is to heat a specific region like a tumor so that above a threshold temperature the tumor cells are destroyed. This can be accomplished by many heat supply techniques, and the use of magnetic nanoparticles that generate heat when an alternating magnetic field is applied has emerged as a promising technique. In the present paper, the Pennes bioheat transfer equation is adopted to model the thermal tumor ablation in the context of magnetic nanoparticles. Numerical simulations are carried out considering different injection sites for the nanoparticles in an attempt to achieve better hyperthermia conditions. An explicit finite difference method is employed to solve the equations. However, a large amount of computation is required for this purpose. Therefore, this work also presents an initial attempt to improve performance using OpenMP, a parallel programming API. Experimental results were quite encouraging: speedups around 35 were obtained on a 64-core machine.
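As a rough illustration of an explicit finite difference update for the Pennes bioheat equation, rho c dT/dt = k laplacian(T) + w_b rho_b c_b (T_a - T) + Q_m + Q_r, the following sketch advances a 2D temperature field by one forward-Euler step. It is not the authors' code: the function name `pennes_step`, the fixed-temperature boundary treatment, and all default parameter values are placeholder assumptions, not values taken from the paper.

```python
import numpy as np


def pennes_step(T, dt, h, k=0.5, rho=1000.0, c=4200.0,
                w_b=5e-4, rho_b=1000.0, c_b=4200.0, T_a=37.0,
                Q_m=0.0, Q_r=None):
    """One forward-Euler step of the 2D Pennes bioheat equation.

    T: temperature field (deg C); h: grid spacing (m); dt: time step (s).
    Q_r: external heat source field (W/m^3), e.g. from nanoparticles.
    All default material values are illustrative placeholders.
    """
    if Q_r is None:
        Q_r = np.zeros_like(T)
    # Five-point Laplacian on interior cells.
    lap = np.zeros_like(T)
    lap[1:-1, 1:-1] = (T[2:, 1:-1] + T[:-2, 1:-1] + T[1:-1, 2:]
                       + T[1:-1, :-2] - 4.0 * T[1:-1, 1:-1]) / h**2
    rhs = k * lap + w_b * rho_b * c_b * (T_a - T) + Q_m + Q_r
    T_new = T + dt * rhs / (rho * c)
    # Boundary cells are held at their previous values (assumed Dirichlet condition).
    T_new[0, :], T_new[-1, :] = T[0, :], T[-1, :]
    T_new[:, 0], T_new[:, -1] = T[:, 0], T[:, -1]
    return T_new


if __name__ == "__main__":
    T = np.full((11, 11), 37.0)
    Q = np.zeros_like(T)
    Q[5, 5] = 1e6  # hypothetical point-like heat source
    for _ in range(10):
        T = pennes_step(T, dt=0.5, h=1e-3, Q_r=Q)
    print(T[5, 5])
```

The explicit scheme is only stable when dt is at most about h^2 rho c / (4 k) in 2D, which is one reason such simulations need many small steps. Each cell update reads only the previous field, so the loop over cells parallelizes directly.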
A parallel direct-forcing fictitious domain method for simulating microswimmers
Gao, Tong; Lin, Zhaowu
2017-11-01
We present a 3D parallel direct-forcing fictitious domain method for simulating swimming micro-organisms at small Reynolds numbers. We treat the motile micro-swimmers as spherical rigid particles using the "Squirmer" model. The particle dynamics are solved on the moving Lagrangian meshes that overlay upon a fixed Eulerian mesh for solving the fluid motion, and the momentum exchange between the two phases is resolved by distributing pseudo body-forces over the particle interior regions which constrain the background fictitious fluids to follow the particle movement. While the solid and fluid subproblems are solved separately, no inner-iterations are required to enforce numerical convergence. We demonstrate the accuracy and robustness of the method by comparing our results with the existing analytical and numerical studies for various cases of single particle dynamics and particle-particle interactions. We also perform a series of numerical explorations to obtain statistical and rheological measurements to characterize the dynamics and structures of Squirmer suspensions. NSF DMS 1619960.
Simulation of dendritic growth of magnesium alloys with fluid flow
Directory of Open Access Journals (Sweden)
Meng-wu Wu
2017-11-01
Full Text Available Fluid flow has a significant impact on the microstructure evolution of alloys during solidification. Based on the previous work relating simulation of the dendritic growth of magnesium alloys with hcp (hexagonal close-packed) structure, an extension was made to the formerly established CA (cellular automaton) model with the purpose of studying the effect of fluid flow on the dendritic growth of magnesium alloys. The modified projection method was used to solve the transport equations of flow field. By coupling the flow field with the solute field, simulation results of equiaxed and columnar dendritic growth of magnesium alloys with fluid flow were achieved. The simulated results were quantitatively compared with those without fluid flow. Moreover, a comparison was also made between the present work and previous works conducted by others. It can be concluded that a deep understanding of the dendritic growth of magnesium alloys with fluid flow can be obtained by applying the present numerical model.
Numerical simulation of avascular tumor growth
Energy Technology Data Exchange (ETDEWEB)
Slezak, D Fernandez; Suarez, C; Soba, A; Risk, M; Marshall, G [Laboratorio de Sistemas Complejos, Departamento de Computacion, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires (C1428EGA) Buenos Aires (Argentina)
2007-11-15
A mathematical and numerical model for the description of different aspects of microtumor development is presented. The model is based on the solution of a system of partial differential equations describing an avascular tumor growth. A detailed second-order numeric algorithm for solving this system is described. Parameters are swept to cover a range of feasible physiological values. While previous published works used a single set of parameter values, here we present a wide range of feasible solutions for tumor growth, covering a more realistic scenario. The model is validated by experimental data obtained with a multicellular spheroid model, a specific type of in vitro biological model which is at present considered to be optimum for the study of complex aspects of avascular microtumor physiology. Moreover, a dynamical analysis and local behaviour of the system is presented, showing chaotic situations for particular sets of parameter values at some fixed points. Further biological experiments related to those specific points may give potentially interesting results.
Simulation studies of emittance growth in RMS mismatched beams
International Nuclear Information System (INIS)
Cucchetti, A.; Wangler, T.; Reiser, M.
1991-01-01
As shown in a separate paper, a charged-particle beam, whose rms size is not matched when injected into a transport channel or accelerator, has excess energy compared with that of a matched beam. If nonlinear space-charge forces are present and the mismatched beam transforms to a matched equilibrium state, rms-emittance growth will occur. The theory yields formulas for the possible rms-emittance growth, but not for the time it takes to achieve this growth. In this paper we present the results of systematic simulation studies for a mismatched 2-D round beam in an ideal transport channel with continuous linear focusing. Emittance growth rates obtained from the simulations for different amounts of mismatch and initial charge will be presented and the emittance growth will be compared with the theory. 6 refs., 7 figs
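For readers unfamiliar with the quantity being tracked, the rms emittance in one transverse plane is commonly defined as sqrt(<x^2><x'^2> - <x x'>^2), with the averages taken about the beam centroid. The sketch below computes it from particle coordinates. The function name `rms_emittance` is a hypothetical helper, and the unnormalized definition is an assumption, since the abstract does not state which convention the authors use.

```python
import numpy as np


def rms_emittance(x, xp):
    """Unnormalized rms emittance from positions x and divergences x'.

    Uses sqrt(<x^2><x'^2> - <x x'>^2) with centroid-subtracted moments.
    """
    x = np.asarray(x, dtype=float)
    xp = np.asarray(xp, dtype=float)
    x = x - x.mean()
    xp = xp - xp.mean()
    det = np.mean(x * x) * np.mean(xp * xp) - np.mean(x * xp) ** 2
    # Guard against tiny negative values caused by rounding.
    return float(np.sqrt(max(det, 0.0)))


if __name__ == "__main__":
    print(rms_emittance([1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]))
```

A distribution in which x' is exactly proportional to x has zero rms emittance under this definition, so growth of this quantity signals that nonlinear forces have distorted the phase-space distribution.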
Energy Technology Data Exchange (ETDEWEB)
Bylaska, Eric J., E-mail: Eric.Bylaska@pnnl.gov [Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, P.O. Box 999, Richland, Washington 99352 (United States); Weare, Jonathan Q., E-mail: weare@uchicago.edu [Department of Mathematics, University of Chicago, Chicago, Illinois 60637 (United States); Weare, John H., E-mail: jweare@ucsd.edu [Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093 (United States)
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, $f$ (e.g., Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of $F(X)$. The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H$_2$O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up
International Nuclear Information System (INIS)
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-01-01
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, $f$ (e.g., Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of $F(X)$. The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H$_2$O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a
Simulated Thin-Film Growth and Imaging
Schillaci, Michael
2001-06-01
Thin-films have become the cornerstone of the electronics, telecommunications, and broadband markets. A list of potential products includes: computer boards and chips, satellites, cell phones, fuel cells, superconductors, flat panel displays, optical waveguides, building and automotive windows, food and beverage plastic containers, metal foils, pipe plating, vision ware, manufacturing equipment and turbine engines. For all of these reasons a basic understanding of the physical processes involved in both growing and imaging thin-films can provide a wonderful research project for advanced undergraduate and first-year graduate students. After producing rudimentary two- and three-dimensional thin-film models incorporating ballistic deposition and nearest neighbor Coulomb-type interactions, the QM tunneling equations are used to produce simulated scanning tunneling microscope (SSTM) images of the films. A discussion of computational platforms, languages, and software packages that may be used to accomplish similar results is also given.
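A rudimentary ballistic deposition model of the kind mentioned here can be sketched in a few lines. The version below is the standard textbook rule on a one-dimensional substrate with periodic boundaries, offered as an assumption about what such a student model might look like: the function name `ballistic_deposition` is hypothetical, and the nearest neighbor Coulomb-type interactions and the tunneling image calculation described in the abstract are not included.

```python
import random


def ballistic_deposition(width, n_particles, seed=0):
    """Grow a film on a 1D substrate of `width` columns with periodic edges.

    Each particle falls down a random column and sticks at the first point
    of contact: on top of its own column or level with a taller neighbour.
    Returns the list of column heights.
    """
    rng = random.Random(seed)
    heights = [0] * width
    for _ in range(n_particles):
        i = rng.randrange(width)
        left = heights[(i - 1) % width]
        right = heights[(i + 1) % width]
        # Sticking to a taller neighbour leaves a void below the particle.
        heights[i] = max(left, heights[i] + 1, right)
    return heights


if __name__ == "__main__":
    film = ballistic_deposition(width=50, n_particles=1000)
    print(min(film), max(film))
```

Because sideways sticking creates overhangs, the summed column heights exceed the number of deposited particles, which is how the model produces porous films.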
MaMiCo: Software design for parallel molecular-continuum flow simulations
Neumann, Philipp
2015-11-19
The macro-micro-coupling tool (MaMiCo) was developed to ease the development of and modularize molecular-continuum simulations, retaining sequential and parallel performance. We demonstrate the functionality and performance of MaMiCo by coupling the spatially adaptive Lattice Boltzmann framework waLBerla with four molecular dynamics (MD) codes: the light-weight Lennard-Jones-based implementation SimpleMD, the node-level optimized software ls1 mardyn, and the community codes ESPResSo and LAMMPS. We detail interface implementations to connect each solver with MaMiCo. The coupling for each waLBerla-MD setup is validated in three-dimensional channel flow simulations which are solved by means of a state-based coupling method. We provide sequential and strong scaling measurements for the four molecular-continuum simulations. The overhead of MaMiCo is found to come at 10%-20% of the total (MD) runtime. The measurements further show that scalability of the hybrid simulations is reached on up to 500 Intel SandyBridge, and more than 1000 AMD Bulldozer compute cores. Program summary: Program title: MaMiCo. Catalogue identifier: AEYW_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEYW_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: BSD License. No. of lines in distributed program, including test data, etc.: 67905. No. of bytes in distributed program, including test data, etc.: 1757334. Distribution format: tar.gz. Programming language: C, C++11. Computer: Standard PCs, compute clusters. Operating system: Unix/Linux. RAM: Test cases consume ca. 30-50 MB. Classification: 7.7. External routines: Scons (http://www.scons.org), ESPResSo, LAMMPS, ls1 mardyn, waLBerla. Nature of problem: Coupled molecular-continuum simulation for multi-resolution fluid dynamics: parts of the domain are resolved by molecular dynamics whereas large parts are covered by a CFD solver, e.g. a lattice Boltzmann automaton
Growth Kinetics of the Homogeneously Nucleated Water Droplets: Simulation Results
International Nuclear Information System (INIS)
Mokshin, Anatolii V; Galimzyanov, Bulat N
2012-01-01
The growth of homogeneously nucleated droplets in water vapor at the fixed temperatures T = 273, 283, 293, 303, 313, 323, 333, 343, 353, 363 and 373 K (pressure p = 1 atm) is investigated on the basis of coarse-grained molecular dynamics simulation data with the mW-model. The simulation results are treated by means of a statistical method within the mean-first-passage-time approach, where the reaction coordinate is associated with the largest droplet size. It is found that the water droplet growth is characterized by the following features: (i) the rescaled growth law is unified at all the considered temperatures and (ii) the droplet growth accelerates and follows a power law.
Bylaska, Eric J; Weare, Jonathan Q; Weare, John H
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, $f$ (e.g., Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup, (serial execution time)/(parallel execution time), obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a
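The root-finding view of a trajectory described in the abstract above can be illustrated with a toy problem. The sketch below is not the authors' quasi-Newton scheme: it assumes a 1D harmonic oscillator, a velocity-Verlet step as the integrator f, and a plain fixed-point sweep X <- X - F(X) as the simplest possible root finder. The names verlet_step, residual and solve_fixed_point are hypothetical. Within one sweep, every entry of F(X) depends only on the previous iterate, so the M integrator calls could be distributed over M processors.

```python
import numpy as np

def verlet_step(x, dt=0.1, omega=1.0):
    """One velocity-Verlet step for a 1D harmonic oscillator; x = (r, v)."""
    r, v = x
    a = -omega**2 * r
    r_new = r + v * dt + 0.5 * a * dt**2
    a_new = -omega**2 * r_new
    v_new = v + 0.5 * (a + a_new) * dt
    return np.array([r_new, v_new])

def residual(X, x0):
    """F(X)_i = x_i - f(x_{i-1}) for i = 1..M, with x_0 held fixed."""
    prev = np.vstack([x0, X[:-1]])
    # Each row is independent of the others: this loop is the parallel part.
    return X - np.array([verlet_step(p) for p in prev])

def solve_fixed_point(x0, M, tol=1e-12):
    """Drive F(X) to zero with sweeps X <- X - F(X) (assumed, simplest solver)."""
    X = np.tile(x0, (M, 1))  # initial guess: every time slice equals x_0
    for sweep in range(1, M + 1):
        F = residual(X, x0)
        if np.max(np.abs(F)) < tol:
            return X, sweep - 1
        X = X - F
    return X, M

x0 = np.array([1.0, 0.0])
X, sweeps = solve_fixed_point(x0, M=20)
```

This plain sweep makes one more time slice exact per iteration, so it needs up to M sweeps and gives no speedup by itself; it only shows the structure of F(X) that a faster-converging root finder would exploit.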
International Nuclear Information System (INIS)
Epelboin, Y.
1996-01-01
This paper presents a new algorithm for the integration of Takagi-Taupin equations taking into account the fact that X-ray diffraction is a parallel phenomenon. The diffraction equations show that the propagation of the waves is independent in each incidence plane. It is thus possible to compute in parallel the propagation of the waves in different planes. Two algorithms are presented: the first one for multiprocessor machines where the processors share a common memory, the second one for massively parallel computers. The program is written to achieve a high vectorization ratio and to make it as efficient as possible with modern superscalar and array processors. The simulation of the image of a defect has been divided into two independent parts. In the first one, one computes the derivatives of the deformation inside the crystal; in the second one, these results are used to simulate the image. This allows one rapidly to change the model for a defect, something that was not feasible in all previously written simulation programs since the computation of the deformation was part of the simulation. The study of stroboscopic images of the propagation of acoustic waves in piezoelectric devices is given as an example of the possibilities of this new program. (orig.)
Parallel real-time visualization system for large-scale simulation. Application to WSPEEDI
International Nuclear Information System (INIS)
Muramatsu, Kazuhiro; Otani, Takayuki; Kitabata, Hideyuki; Matsumoto, Hideki; Takei, Toshifumi; Doi, Shun
2000-01-01
The real-time visualization system PATRAS (PArallel TRAcking Steering system) has been developed on parallel computing servers. The system performs almost all of the visualization tasks on a parallel computing server, and uses an image data compression technique for efficient communication between the server and the client terminal. The system therefore achieves high-performance concurrent visualization in an internet computing environment. The experience in applying PATRAS to WSPEEDI (Worldwide version of System for Prediction Environmental Emergency Dose Information) is reported. The application of PATRAS to WSPEEDI enables users to understand the behaviour of radioactive tracers from different release points easily and quickly. (author)
Epitaxial growth of Cu on Cu(001): Experiments and simulations
International Nuclear Information System (INIS)
Furman, Itay; Biham, Ofer; Zuo, Jiang-Kai; Swan, Anna K.; Wendelken, John
2000-01-01
A quantitative comparison between experimental and Monte Carlo simulation results for the epitaxial growth of Cu/Cu(001) in the submonolayer regime is presented. The simulations take into account a complete set of hopping processes whose activation energies are derived from semiempirical calculations using the embedded-atom method. The island separation is measured as a function of the incoming flux and the temperature. A good quantitative agreement between the experiment and simulation is found for the island separation, the activation energies for the dominant processes, and the exponents that characterize the growth. The simulation results are then analyzed at lower coverages, which are not accessible experimentally, providing good agreement with theoretical predictions as well
Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji
2015-07-01
GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.
International Nuclear Information System (INIS)
Veerasingam, R.
1990-01-01
In fusion plasmas, impurities such as carbon, oxygen or nickel can contaminate the plasma and degrade the performance of a fusion device through radiation. However, impurities can also be used as diagnostics to obtain information about a plasma through spectroscopic experiments, which can then be used in plasma modeling and simulations. In the past, serial algorithms have been described for either the time-dependent or the steady-state problem. In this paper, we describe a parallel procedure adopted to solve the time-dependent problem. It can be shown that for the steady-state problem a parallel procedure would not be a useful application of parallelization, because a few seconds of Central Processing Unit time on a CRAY-XMP or IBM 3090/600S would suffice to obtain the solution, while this is not the case for the time-dependent problem. In order to study the effects of low Z and high Z impurities on the final state of a plasma, time-dependent solutions are necessary. For purposes of diagnostics and comparisons with experiments, a fast turnaround time of the simulations would be advantageous. We have implemented a parallel algorithm on an IBM 3090/600S and tested its performance for a typical set of fusion plasma parameters. 4 refs., 1 tab
Parallel Reservoir Simulations with Sparse Grid Techniques and Applications to Wormhole Propagation
Wu, Yuanqing
2015-01-01
the traditional simulation technique relying on the Darcy framework, we propose a new framework called Darcy-Brinkman-Forchheimer framework to simulate wormhole propagation. Furthermore, to process the large quantity of cells in the simulation grid and shorten
International Nuclear Information System (INIS)
Takei, Toshifumi; Doi, Shun; Matsumoto, Hideki; Muramatsu, Kazuhiro
2000-01-01
We have developed a concurrent visualization system RVSLIB (Real-time Visual Simulation Library). This paper shows the effectiveness of the system when it is applied to large-scale unsteady simulations, for which the conventional post-processing approach may no longer work, on high-performance parallel vector supercomputers. The system performs almost all of the visualization tasks on a computation server and uses compressed visualized image data for efficient communication between the server and the user terminal. We have introduced several techniques, including vectorization and parallelization, into the system to minimize the computational costs of the visualization tools. The performance of RVSLIB was evaluated by using an actual CFD code on an NEC SX-4. The computational time increase due to the concurrent visualization was at most 3% for a smaller (1.6 million) grid and less than 1% for a larger (6.2 million) one. (author)
High performance shallow water kernels for parallel overland flow simulations based on FullSWOF2D
Wittmann, Roland
2017-01-25
We describe code optimization and parallelization procedures applied to the sequential overland flow solver FullSWOF2D. Major difficulties when simulating overland flows comprise dealing with high resolution datasets of large scale areas which cannot be computed on a single node, either due to a limited amount of memory or due to too many (time step) iterations resulting from the CFL condition. We address these issues in terms of two major contributions. First, we demonstrate a generic step-by-step transformation of the second order finite volume scheme in FullSWOF2D towards MPI parallelization. Second, the computational kernels are optimized by the use of templates and a portable vectorization approach. We discuss the load imbalance of the flux computation due to dry and wet cells and propose a solution using an efficient cell counting approach. Finally, scalability results are shown for different test scenarios along with a flood simulation benchmark using the Shaheen II supercomputer.
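The link between grid resolution and the number of time-step iterations mentioned above follows from the CFL condition. The sketch below uses the generic textbook form for the shallow water equations, dt <= C * dx / (|u| + sqrt(g h)); it is an assumption for illustration, not code or a formula taken from FullSWOF2D, and the function name cfl_time_step and its parameters are hypothetical.

```python
import math

def cfl_time_step(h, u, v, dx, dy, cfl=0.5, g=9.81):
    """Largest time step allowed by a generic shallow-water CFL condition.

    h, u, v are per-cell water depth and velocity components; cells with
    h <= 0 are treated as dry and skipped (assumed convention).
    """
    dt = math.inf
    for hi, ui, vi in zip(h, u, v):
        if hi <= 0.0:
            continue  # dry cell: no wave speed, no constraint
        c = math.sqrt(g * hi)  # gravity wave speed
        dt = min(dt, cfl * dx / (abs(ui) + c), cfl * dy / (abs(vi) + c))
    return dt

# One wet cell at rest and one dry cell on a 1 m grid.
dt_coarse = cfl_time_step([1.0, 0.0], [0.0, 5.0], [0.0, 0.0], dx=1.0, dy=1.0)
# Halving the cell size halves the admissible time step.
dt_fine = cfl_time_step([1.0, 0.0], [0.0, 5.0], [0.0, 0.0], dx=0.5, dy=0.5)
```

Because the admissible step shrinks in proportion to the cell size, refining a 2D grid by a factor of two multiplies the cell count by four and the number of steps by two, which is why high resolution runs quickly exceed what a single node can handle.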
Growth of nitrogen-doped graphene on copper: Multiscale simulations
Gaillard, P.; Schoenhalz, A. L.; Moskovkin, P.; Lucas, S.; Henrard, L.
2016-02-01
We used multiscale simulations to model the growth of nitrogen-doped graphene on a copper substrate by chemical vapour deposition (CVD). Our simulations are based on ab-initio calculations of energy barriers for surface diffusion, which are complemented by larger scale Kinetic Monte Carlo (KMC) simulations. Our results indicate that the shape of grown doped graphene flakes depends on the temperature and deposition flux to which they are subjected during the process, but we found no significant effect of nitrogen doping on this shape. However, we show that nitrogen atoms have a preference for pyridine-like sites compared to graphite-like sites, as observed experimentally.
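How energy barriers feed a Kinetic Monte Carlo simulation can be sketched generically. The block below assumes Arrhenius rates with a fixed attempt frequency and the standard rejection-free (residence-time) event selection; the barrier values, the prefactor and the function names are hypothetical illustrations, not data or code from the study above.

```python
import math
import random

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_rate(barrier_ev, temperature_k, prefactor=1e13):
    """Hop rate in 1/s from an energy barrier (assumed attempt frequency)."""
    return prefactor * math.exp(-barrier_ev / (K_B * temperature_k))

def kmc_step(rates, rng):
    """Pick one event with probability rate/total and draw the time increment."""
    total = sum(rates)
    target = rng.random() * total
    cumulative = 0.0
    for index, rate in enumerate(rates):
        cumulative += rate
        if target < cumulative:
            break
    # Exponentially distributed waiting time with mean 1/total.
    dt = -math.log(1.0 - rng.random()) / total
    return index, dt

rng = random.Random(0)
barriers = [0.4, 0.5, 0.9]  # hypothetical barriers in eV
rates = [arrhenius_rate(e, 300.0) for e in barriers]
counts = [0, 0, 0]
elapsed = 0.0
for _ in range(2000):
    event, dt = kmc_step(rates, rng)
    counts[event] += 1
    elapsed += dt
```

In a full simulation the rate list would be rebuilt after every event from the local atomic environment; here it is kept fixed to isolate the selection step.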
Sugar maple and yellow birch seedling growth after simulated browsing.
Frederick T. Metzger
1977-01-01
Simulating natural damage to leaders of forest-grown seedlings of yellow birch and sugar maple resulted in no loss of vigor but a loss in net height growth. Leader elongation depended upon seedling, shoot, and bud characteristics rather than on the extent of damage.
Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik
2015-06-09
Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers, particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on a conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
Phase-field crystal simulation facet and branch crystal growth
Chen, Zhi; Wang, Zhaoyang; Gu, Xinrui; Chen, Yufei; Hao, Limei; de Wit, Jos; Jin, Kexin
2018-05-01
A one-mode phase-field crystal model is introduced to describe morphological transitions. The relationship between growth morphology and smooth density distribution was investigated. The results indicate that the pattern selection of dendrite growth is caused by the competition between interface energy anisotropy and interface kinetic anisotropy, based on the 2D phase diagram. As the calculation time increases, the crystal grows into a secondary dendrite at a dimensionless undercooling of -0.4. Moreover, when noise is introduced into the growth process, the symmetry of the growth mode is broken and an irregular fractal-like growth morphology appears. Furthermore, the single crystal shape develops into a polycrystalline one when the noise amplitude is large enough. When the dimensionless undercooling is less than -0.3, the noise has a significant effect on the growth shape. In addition, the growth velocity of a crystal near the liquid phase line is slow, while the shape far away from the liquid adapts to fast growth. Based on the simulation results, the method was shown to be effective, and it can easily produce different crystal shapes by choosing different points in the 2D phase diagram.
Directory of Open Access Journals (Sweden)
Julián A García-Grajales
With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite, explicit and implicit, were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon
International Nuclear Information System (INIS)
Marzouk, Youssef M.; Ghoniem, Ahmed F.
2005-01-01
A number of complex physical problems can be approached through N-body simulation, from fluid flow at high Reynolds number to gravitational astrophysics and molecular dynamics. In all these applications, direct summation is prohibitively expensive for large N and thus hierarchical methods are employed for fast summation. This work introduces new algorithms, based on k-means clustering, for partitioning parallel hierarchical N-body interactions. We demonstrate that the number of particle-cluster interactions and the order at which they are performed are directly affected by partition geometry. Weighted k-means partitions minimize the sum of clusters' second moments and create well-localized domains, and thus reduce the computational cost of N-body approximations by enabling the use of lower-order approximations and fewer cells. We also introduce compatible techniques for dynamic load balancing, including adaptive scaling of cluster volumes and adaptive redistribution of cluster centroids. We demonstrate the performance of these algorithms by constructing a parallel treecode for vortex particle simulations, based on the serial variable-order Cartesian code developed by Lindsay and Krasny [Journal of Computational Physics 172 (2) (2001) 879-907]. The method is applied to vortex simulations of a transverse jet. Results show outstanding parallel efficiencies even at high concurrencies, with velocity evaluation errors maintained at or below their serial values; on a realistic distribution of 1.2 million vortex particles, we observe a parallel efficiency of 98% on 1024 processors. Excellent load balance is achieved even in the face of several obstacles, such as an irregular, time-evolving particle distribution containing a range of length scales and the continual introduction of new vortex particles throughout the domain. Moreover, results suggest that k-means yields a more efficient partition of the domain than a global oct-tree
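The partitioning idea above can be illustrated with a weighted variant of Lloyd's k-means iteration, in which each centroid is the weight-averaged position of its particles. This is a minimal sketch under assumptions: the weights stand in for a per-particle cost, the function name weighted_kmeans is hypothetical, and none of the load-balancing refinements described in the abstract are included.

```python
import numpy as np

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Partition points into k clusters with weighted Lloyd iterations."""
    rng = np.random.default_rng(seed)
    # Start from k distinct particles chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():  # leave an empty cluster's centroid unchanged
                centroids[j] = np.average(points[mask], axis=0,
                                          weights=weights[mask])
    # Final assignment against the last centroid positions.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids

gen = np.random.default_rng(1)
cloud = np.vstack([gen.normal(0.0, 0.1, (50, 2)),
                   gen.normal(10.0, 0.1, (50, 2))])
cost = np.ones(len(cloud))  # uniform cost per particle (assumed)
labels, centroids = weighted_kmeans(cloud, cost, k=2)
```

Each resulting cluster would be assigned to one processor; compact clusters keep most interactions local to the processor that owns them.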
International Nuclear Information System (INIS)
Lichters, R.; Pfund, R.E.W.; Meyer-ter-Vehn, J.
1997-08-01
The code LPIC++ presented here is based on a one-dimensional, electromagnetic, relativistic PIC code that was originally developed by one of the authors during a PhD thesis at the Max-Planck-Institut fuer Quantenoptik for kinetic simulations of high harmonic generation from overdense plasma surfaces. The code essentially uses the algorithm of Birdsall and Langdon and of Villasenor and Buneman. It is written in C++ in order to be easily extendable and has been parallelized so that its power can grow linearly with the size of the accessible hardware, e.g. massively parallel machines like the Cray T3E. The parallel LPIC++ version uses PVM for communication between processors. PVM is public domain software and can be downloaded from the world wide web. A particular strength of LPIC++ lies in its clear program and data structure, which uses chained lists for the organization of grid cells and enables dynamic adjustment of spatial domain sizes in a very convenient way, and therefore easy balancing of processor loads. Particles belonging to one cell are also linked in a chained list and are immediately accessible from this cell. In addition to this convenient type of data organization in a PIC code, the code shows excellent performance in both its single processor and parallel version. (orig.)
International Nuclear Information System (INIS)
Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.
1997-01-01
Work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code, development or improvement work was done on the parts simulating low-energy physical phenomena such as radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program for neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits, in a single pass, the simulation of a hybrid reactor core receiving a proton burst. Other work in this field refers to simulations for nuclear medicine applications such as the development of biological probes, the evaluation and characterization of gamma cameras (collimators, crystal thickness), and the method for dosimetric calculations. In particular, these calculations are suited to a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other work mentioned in the same field refers to the simulation of electron channelling in crystals and of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for monitoring natural and artificial radioactivity in the environment.
Using cellular automata for parallel simulation of laser dynamics with dynamic load balancing
Guisado, J.L.; Fernández de Vega, F.; Jiménez Morales, F.; Iskra, K.A.; Sloot, P.M.A.
2008-01-01
We present an analysis of the feasibility of executing a parallel bioinspired model of laser dynamics, based on cellular automata (CA), on the usual target platform of this kind of applications: a heterogeneous non-dedicated cluster. As this model employs a synchronous CA, using the single program,
Convergence order vs. parallelism in the numerical simulation of the bidomain equations
International Nuclear Information System (INIS)
Sharomi, Oluwaseun; Spiteri, Raymond J
2012-01-01
The propagation of electrical activity in the human heart can be modelled mathematically by the bidomain equations. The bidomain equations represent a multi-scale reaction-diffusion model that consists of a set of ordinary differential equations governing the dynamics at the cellular level coupled with a set of partial differential equations governing the dynamics at the tissue level. Significant computation is generally required to generate clinically useful data from the bidomain equations. Contemporary developments in computer architecture, in particular multi- and many-core computers and graphics processing units, have made such computations feasible. However, the zeal to take advantage of parallel architectures has typically caused another important aspect of numerical methods for the solution of differential equations to be overlooked, namely the convergence order. It is well known that higher-order methods are generally more efficient than lower-order ones when solutions are smooth and relatively high accuracy is desired. In these situations, serial implementations of high-order methods may remain surprisingly competitive with parallel implementations of low-order methods. In this paper, we examine the effect of order on the numerical solution of the bidomain equations in parallel. We find that high-order methods, in particular high-order time-integration methods with relatively better stability properties, tend to outperform their low-order counterparts, even when the latter are run in parallel. In other words, increasing integration order often trumps increasing available computational resources, especially when relatively high accuracy is desired.
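The trade-off between convergence order and raw computing effort can be seen on a toy problem. The sketch below is an illustration under assumptions, not the bidomain solver of the paper: it integrates the scalar test equation y' = -y with forward Euler (order 1) and the classical Runge-Kutta scheme (order 4), giving both the same budget of 400 right-hand-side evaluations.

```python
import math

def euler(f, y0, t_end, n):
    """Forward Euler with n steps: one evaluation of f per step."""
    h = t_end / n
    t, y = 0.0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

def rk4(f, y0, t_end, n):
    """Classical fourth-order Runge-Kutta: four evaluations of f per step."""
    h = t_end / n
    t, y = 0.0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
        k3 = f(t + 0.5 * h, y + 0.5 * h * k2)
        k4 = f(t + h, y + h * k3)
        y += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return y

def decay(t, y):
    return -y

exact = math.exp(-1.0)
err_euler = abs(euler(decay, 1.0, 1.0, 400) - exact)  # 400 evaluations
err_rk4 = abs(rk4(decay, 1.0, 1.0, 100) - exact)      # 400 evaluations
```

For the same number of evaluations, the fourth-order error is smaller by several orders of magnitude, so a low-order method would need far more steps, and hence far more processors, to reach the same accuracy on a smooth problem like this one.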
Efficient Heuristics for the Simulation of Buffer Overflow in Series and Parallel Queueing Networks
Nicola, V.F.; Zaburnenko, T.S.
2006-01-01
In this paper we propose state-dependent importance sampling heuristics to estimate the probability of population overflow in Markovian networks of series and parallel queues. These heuristics capture state-dependence along the boundaries (when one or more queues are empty) which is critical for
Modelling and simulation of multiple single - phase induction motor in parallel connection
Directory of Open Access Journals (Sweden)
Sujitjorn, S.
2006-11-01
A mathematical model for parallel connected n-multiple single-phase induction motors in generalized state-space form is proposed in this paper. The motor group draws electric power from one inverter. The model is developed by the dq-frame theory and was tested against four loading scenarios in which satisfactory results were obtained.
International Nuclear Information System (INIS)
Azadeh, A.; Asadzadeh, S.M.; Salehi, N.; Firoozi, M.
2015-01-01
Condition-based maintenance (CBM) is an increasingly applicable policy in the competitive marketplace as a means of improving equipment reliability and efficiency. Not only does maintenance have a close relationship with safety, but its costs also make it an even more attractive issue for researchers. This study proposes a model to evaluate the effectiveness of the CBM policy compared to two other maintenance policies: Corrective Maintenance (CM) and Preventive Maintenance (PM). Maintenance policies are compared through two system performance indicators: reliability and cost. To estimate the reliability and costs of the system, the proposed Markovian discrete-event simulation model is developed under each of these policies. The applicability and usefulness of the proposed Markovian simulation model is illustrated for a series–parallel power generation system. The simulated characteristics of the CBM system include its prognostics efficiency in estimating the remaining useful life of the equipment. Results show that, with efficient prognostics, the CBM policy is an effective strategy compared to other maintenance strategies. - Highlights: • A model is developed to evaluate the effectiveness of CBM policy. • Maintenance policies are compared through reliability and cost. • A Markovian simulation model is developed. • A series–parallel power generation system is considered. • CBM is an effective strategy compared to others
Directory of Open Access Journals (Sweden)
Xueli Chen
2010-01-01
During the past decade, the Monte Carlo method has found wide application in optical imaging to simulate the photon transport process inside tissues. However, this method has not yet been effectively extended to the simulation of free-space photon transport. In this paper, a uniform framework for noncontact optical imaging is proposed based on the Monte Carlo method, which consists of the simulation of photon transport both in tissues and in free space. Specifically, the simplification theory of the lens system is utilized to model the camera lens equipped in the optical imaging system, and the Monte Carlo method is employed to describe the energy transformation from the tissue surface to the CCD camera. Also, the focusing effect of the camera lens is considered to establish the relationship of corresponding points between the tissue surface and the CCD camera. Furthermore, a parallel version of the framework is realized, making the simulation much more convenient and effective. The feasibility of the uniform framework and the effectiveness of the parallel version are demonstrated with a cylindrical phantom based on real experimental results.
O'Keeffe, C J; Ren, Ruichao; Orkoulas, G
2007-11-21
Spatial updating grand canonical Monte Carlo algorithms are generalizations of random and sequential updating algorithms for lattice systems to continuum fluid models. The elementary steps, insertions or removals, are constructed by generating points in space either at random (random updating) or in a prescribed order (sequential updating). These algorithms have previously been developed only for systems of impenetrable spheres for which no particle overlap occurs. In this work, spatial updating grand canonical algorithms are generalized to continuous, soft-core potentials to account for overlapping configurations. Results on two- and three-dimensional Lennard-Jones fluids indicate that spatial updating grand canonical algorithms, both random and sequential, converge faster than standard grand canonical algorithms. Spatial algorithms based on sequential updating not only exhibit the fastest convergence but also are ideal for parallel implementation due to the absence of strict detailed balance and the nature of the updating that minimizes interprocessor communication. Parallel simulation results for three-dimensional Lennard-Jones fluids show a substantial reduction of simulation time for systems of moderate and large size. The efficiency improvement by parallel processing through domain decomposition is always in addition to the efficiency improvement by sequential updating.
Modelisation and numerical simulation for bulk crystal growth processes
International Nuclear Information System (INIS)
Duffar, F.; Dusserre, P.; Barat, C.; Nabot, J.P.
1993-01-01
The aim of this work is to study the relevance of numerical simulation for improving process control in the field of crystal growth. The investigation focuses on the growth of semiconductor and halide crystals by the Bridgman solidification technique, the principle of which is to cool a seeded feed material contained in a crucible, either by pulling the crucible or by decreasing the temperature in the furnace. Calculations are performed with the finite element method, and for comparison, experiments are carried out on Bridgman pulling machines operating either in a laboratory or in industrial plants. Calculations and experimental data show good agreement and satisfactory reliability.
Observation and simulation of crack growth in Zry-4
International Nuclear Information System (INIS)
Bertolino, Graciela; Meyer, Gabriel; Perez Ipina, J
2003-01-01
Safety and life extension of nuclear reactor components are the main motivations for studying the embrittlement of zirconium alloys by reaction with hydrogen. Fracture mechanics tests are suitable for monitoring the material resistance of components in service. Because it is often difficult to obtain standardized specimens from real-size components, researchers look for alternative experimental techniques or for crack growth simulations based on knowledge of particular material properties. In this work we present the results of experimental observation and computer simulation of crack growth in Zry-4 specimens. The experimental observations were obtained by performing three-point bending tests on SSEN(B) specimens of 3 x 7 x 32 mm^3 placed in the chamber of a scanning electron microscope, measuring in situ the crack length and opening while an external load was applied. The crack growth process was simulated using the information obtained from stress-displacement measurements in tensile tests and the empirical relationship between crack opening and crack length. The displacement field in the zone close to the crack tip was obtained by the finite element technique (Castem, DMT, CEA), assuming plane stress and a homogeneous bilinear plastic material, and neglecting texture and directional anisotropy. To compare experimental observation and simulation, a grid (each square 10 x 10 μm^2) was drawn in the zone close to the crack tip by selective sputtering. Following the movement of two (three) points on the surface allows the uni (bi) dimensional deformation to be compared. Good agreement between observation and simulation was found: after the crack opening grew 28 times (from 1.5 to 42 μm), the base-height relationship of a triangle involving the crack tip changed by 40% (35%) in the experimental observation (simulation).
The molecular dynamics simulation of ion-induced ripple growth
International Nuclear Information System (INIS)
Suele, P.; Heinig, K.-H.
2009-01-01
The wavelength dependence of the ion-sputtering induced growth of repetitive nanostructures, such as ripples, has been studied by molecular dynamics (MD) simulations in Si. The early stage of the ion-erosion driven development of ripples has been simulated on prepatterned Si stripes with a wavy surface. The time evolution of the height function and of the amplitude of the sinusoidal surface profile has been followed by simulated ion-sputtering. According to Bradley-Harper (BH) theory, we expect a correlation between the wavelength of ripples and their stability. However, we find that in the small ripple wavelength (λ) regime BH theory fails to reproduce the results obtained by molecular dynamics. We find that at short wavelengths (λ 35 nm is stabilized in accordance with the available experimental results. According to the simulations, a few hundred ion impacts on Si ripples that are λ long and a few nanometers wide are sufficient to reach saturation in surface growth for λ>35 nm ripples. In other words, ripples in the long wavelength limit seem to be stable against ion-sputtering. A qualitative comparison of our simulation results with recent experimental data on nanopatterning under irradiation is attempted.
International Nuclear Information System (INIS)
Candel, A.; Kabel, A.; Ko, K.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Prudencio, E.; Schussman, G.; Uplenchwar, R.
2007-01-01
Over the past years, SLAC's Advanced Computations Department (ACD) has developed the parallel finite element (FE) particle-in-cell code Pic3P (Pic2P) for simulations of beam-cavity interactions dominated by space-charge effects. As opposed to standard space-charge dominated beam transport codes, which are based on the electrostatic approximation, Pic3P (Pic2P) includes space-charge, retardation and boundary effects as it self-consistently solves the complete set of Maxwell-Lorentz equations using higher-order FE methods on conformal meshes. Use of efficient, large-scale parallel processing allows for the modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of the next generation of accelerator facilities. Applications to the Linac Coherent Light Source (LCLS) RF gun are presented.
A Computer Simulation of the System-Wide Effects of Parallel-Offset Route Maneuvers
Lauderdale, Todd A.; Santiago, Confesor; Pankok, Carl
2010-01-01
Most aircraft managed by air-traffic controllers in the National Airspace System are capable of flying parallel-offset routes. This paper presents the results of two related studies on the effects of increased use of offset routes as a conflict resolution maneuver. The first study analyzes offset routes in the context of all standard resolution types which air-traffic controllers currently use. This study shows that by utilizing parallel-offset route maneuvers, significant system-wide savings in delay due to conflict resolution of up to 30% are possible. It also shows that most offset resolutions replace horizontal-vectoring resolutions. The second study builds on the results of the first and directly compares offset resolutions and standard horizontal-vectoring maneuvers to determine that in-trail conflicts are often more efficiently resolved by offset maneuvers.
PARALLEL ALGORITHM FOR THREE-DIMENSIONAL STOKES FLOW SIMULATION USING BOUNDARY ELEMENT METHOD
Directory of Open Access Journals (Sweden)
D. G. Pribytok
2016-01-01
A parallel computing technique for modeling three-dimensional viscous flow (Stokes flow) using the direct boundary element method is presented. The problem is solved in three phases: discretization and construction of the system of linear algebraic equations (SLAE), solution of the system, and evaluation of the fluid velocity at predetermined points. Parallel algorithms for constructing the system and for finding the velocity have been developed and implemented using CUDA programming technology for graphics cards. Existing software libraries are used to solve the system of linear algebraic equations. The time consumption of the three main algorithms is compared for the example of viscous fluid motion in a three-dimensional cavity.
Mathematical modeling and numerical simulation of Czochralski Crystal Growth
Energy Technology Data Exchange (ETDEWEB)
Jaervinen, J.; Nieminen, R. [Center for Scientific Computing, Espoo (Finland)
1996-12-31
A detailed mathematical model and numerical simulation tools based on the SUPG Finite Element Method have been developed for Czochralski crystal growth. In this presentation the mathematical modeling and numerical simulation of the melt flow and the temperature distribution in a rotationally symmetric crystal growth environment are investigated. The temperature distribution and the position of the free boundary between the solid and liquid phases are solved by using the Enthalpy method. Heat inside the Czochralski furnace is transferred by radiation, conduction and convection. The melt flow is governed by the incompressible Navier-Stokes equations coupled with the enthalpy equation. The melt flow and the temperature distribution in the whole Czochralski furnace are demonstrated numerically. (author)
Mathematical modeling and numerical simulation of Czochralski Crystal Growth
Energy Technology Data Exchange (ETDEWEB)
Jaervinen, J; Nieminen, R [Center for Scientific Computing, Espoo (Finland)
1997-12-31
A detailed mathematical model and numerical simulation tools based on the SUPG Finite Element Method have been developed for Czochralski crystal growth. In this presentation the mathematical modeling and numerical simulation of the melt flow and the temperature distribution in a rotationally symmetric crystal growth environment are investigated. The temperature distribution and the position of the free boundary between the solid and liquid phases are solved by using the Enthalpy method. Heat inside the Czochralski furnace is transferred by radiation, conduction and convection. The melt flow is governed by the incompressible Navier-Stokes equations coupled with the enthalpy equation. The melt flow and the temperature distribution in the whole Czochralski furnace are demonstrated numerically. (author)
A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS
International Nuclear Information System (INIS)
Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.
2013-01-01
We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ∼ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within ∼ 5 , 128 for N = 10^6 and 256 for N = 10^7. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.
Simulating the initial growth of a deposit from colloidal suspensions
International Nuclear Information System (INIS)
Oliveira, T J; Aarão Reis, F D A
2014-01-01
We study the short time properties of a two-dimensional film growth model in which incident particles execute advective-diffusive motion with a vertical step followed by D horizontal steps. The model represents some features of the deposition of anisotropic colloidal particles of the experiment of Yunker et al (2013 Phys. Rev. Lett. 110 035501), in which wandering particles are attracted to particle-rich regions in the deposit. Height profiles changing from rough to columnar structure are observed as D increases from 0 (ballistic deposition) to 8, with striking similarity to the experimental ones. The effective growth exponents match the experimental estimates and the scaling of those exponents on D shows a remarkable effect of the range of the particle-deposit interaction. The nearly ellipsoidal shape of colloidal particles is represented for the calculation of roughness exponents in conditions that parallel the experimental ones, giving a range of estimates that also includes the experimental values. The effective dynamic exponents calculated from the autocorrelation function are shown to be suitable to decide between a true dynamic scaling or transient behavior, particularly because the latter leads to deviations in an exponent relation. These results are consistent with arguments on short time unstable (columnar) growth of Nicoli et al (2013 Phys. Rev. Lett. 111 209601), indicating that critical quenched KPZ dynamics does not explain that colloidal particle deposition problem. (paper)
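The deposition rule described in this abstract (one vertical step followed by D horizontal steps, with D = 0 reducing to ballistic deposition) can be sketched on a periodic square lattice. This is an illustrative sketch only: the sticking rule (a particle aggregates as soon as it reaches the substrate or has an occupied site below or beside it) is an assumption borrowed from standard ballistic deposition, and the function names are hypothetical rather than taken from the paper.

```python
import random

def deposit(width, n_particles, d_steps, seed=0):
    """Grow a deposit; return heights[x] = 1 + highest occupied y in column x."""
    rng = random.Random(seed)
    occupied = set()
    heights = [0] * width

    def touches(x, y):
        # Assumed sticking rule: substrate contact, or an occupied site
        # directly below or on either side (periodic in x).
        return (y == 0
                or (x, y - 1) in occupied
                or ((x - 1) % width, y) in occupied
                or ((x + 1) % width, y) in occupied)

    for _ in range(n_particles):
        x = rng.randrange(width)
        y = max(heights) + 1            # release above the whole deposit
        stuck = False
        while not stuck:
            y -= 1                      # one vertical (advective) step
            stuck = touches(x, y)
            for _ in range(d_steps):    # up to D horizontal (diffusive) steps
                if stuck:
                    break
                x = (x + rng.choice((-1, 1))) % width
                stuck = touches(x, y)
        occupied.add((x, y))
        heights[x] = max(heights[x], y + 1)
    return heights

def roughness(heights):
    """Root-mean-square fluctuation of the height profile."""
    mean = sum(heights) / len(heights)
    return (sum((h - mean) ** 2 for h in heights) / len(heights)) ** 0.5
```

For example, `roughness(deposit(256, 20000, 0))` and `roughness(deposit(256, 20000, 8))` give the interface width for the ballistic and the strongly diffusive case under these assumptions. Because the contact check runs after every move, a particle never steps onto an occupied site.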
Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations
Landge, A. G.
2012-12-01
The performance of massively parallel applications is often heavily impacted by the cost of communication among compute nodes. However, determining how to best use the network is a formidable task, made challenging by the ever increasing size and complexity of modern supercomputers. This paper applies visualization techniques to aid parallel application developers in understanding the network activity by enabling a detailed exploration of the flow of packets through the hardware interconnect. In order to visualize this large and complex data, we employ two linked views of the hardware network. The first is a 2D view that represents the network structure as one of several simplified planar projections. This view is designed to allow a user to easily identify trends and patterns in the network traffic. The second is a 3D view that augments the 2D view by preserving the physical network topology and providing a context that is familiar to the application developers. Using the massively parallel multi-physics code pF3D as a case study, we demonstrate that our tool provides valuable insight that we use to explain and optimize pF3D's performance on an IBM Blue Gene/P system. © 1995-2012 IEEE.
Characterization of Minnesota lunar simulant for plant growth
Oglesby, James P.; Lindsay, Willard L.; Sadeh, Willy Z.
1993-01-01
Processing of lunar regolith into a plant growth medium is crucial in the development of a regenerative life support system for a lunar base. Plants, which are the core of such a system, produce food and oxygen for humans and, at the same time, consume carbon dioxide. Because of the scarcity of lunar regolith, simulants must be used to infer its properties and to develop procedures for weathering and chemical analyses. The Minnesota Lunar Simulant (MLS) has been identified to date as the best available simulant for lunar regolith. Results of the dissolution studies reveal that appropriately fertilized MLS can be a suitable medium for plant growth. The techniques used in conducting these studies can be extended to investigate the suitability of actual lunar regolith as a plant growth medium. Dissolution experiments were conducted using the MLS to determine its nutritional and toxicity characteristics for plant growth and to develop weathering and chemical analysis techniques. Two weathering regimes, one with water and one with dilute organic acids simulating the root rhizosphere microenvironment, were investigated. Elemental concentrations were measured using inductively-coupled-plasma (ICP) emission spectrometry and ion chromatography (IC). The geochemical speciation model, MINTEQA2, was used to determine the major solution species and the minerals controlling them. Acidification was found to be a useful method for increasing cation concentrations to meaningful levels. Initial results indicate that MLS weathers to give neutral to slightly basic solutions which contain acceptable amounts of the essential elements required for plant nutrition (i.e., potassium, calcium, magnesium, sulfur, zinc, sodium, silicon, manganese, copper, chlorine, boron, molybdenum, and cobalt). Elements that need to be supplemented include carbon, nitrogen, and perhaps phosphorus and iron. Trace metals in solution were present at nontoxic levels.
Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing
Qiang Liu; Yi Qin; Guodong Li
2018-01-01
Computing speed is a significant issue of large-scale flood simulations for real-time response to disaster prevention and mitigation. Even today, most of the large-scale flood simulations are generally run on supercomputers due to the massive amounts of data and computations necessary. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To realize a fast simulation of large-scale floods on a personal...
Knepper, Andreas; Heiser, Michael; Glauche, Florian; Neubauer, Peter
2014-12-01
The enormous variation possibilities of bioprocesses challenge process development to fix a commercial process with respect to costs and time. Although some cultivation systems and some devices for unit operations combine the latest technology on miniaturization, parallelization, and sensing, the degree of automation in upstream and downstream bioprocess development is still limited to single steps. We aim to face this challenge by an interdisciplinary approach to significantly shorten development times and costs. As a first step, we scaled down analytical assays to the microliter scale and created automated procedures for starting the cultivation and monitoring the optical density (OD), pH, concentrations of glucose and acetate in the culture medium, and product formation in fed-batch cultures in the 96-well format. Then, the separate measurements of pH, OD, and concentrations of acetate and glucose were combined to one method. This method enables automated process monitoring at dedicated intervals (e.g., also during the night). By this approach, we managed to increase the information content of cultivations in 96-microwell plates, thus turning them into a suitable tool for high-throughput bioprocess development. Here, we present the flowcharts as well as cultivation data of our automation approach. © 2014 Society for Laboratory Automation and Screening.
Molecular dynamics simulation of gold cluster growth during sputter deposition
Energy Technology Data Exchange (ETDEWEB)
Abraham, J. W., E-mail: abraham@theo-physik.uni-kiel.de; Bonitz, M., E-mail: bonitz@theo-physik.uni-kiel.de [Institut für Theoretische Physik und Astrophysik, Christian-Albrechts-Universität zu Kiel, Leibnizstraße 15, D-24098 Kiel (Germany); Strunskus, T.; Faupel, F. [Institut für Materialwissenschaft, Lehrstuhl für Materialverbunde, Christian-Albrechts-Universität zu Kiel, Kaiserstraße 2, D-24143 Kiel (Germany)
2016-05-14
We present a molecular dynamics simulation scheme that we apply to study the time evolution of the self-organized growth process of metal cluster assemblies formed by sputter-deposited gold atoms on a planar surface. The simulation model incorporates the characteristics of the plasma-assisted deposition process and allows for an investigation over a wide range of deposition parameters. It is used to obtain data for the cluster properties which can directly be compared with recently published experimental data for gold on polystyrene [M. Schwartzkopf et al., ACS Appl. Mater. Interfaces 7, 13547 (2015)]. While good agreement is found between the two, the simulations additionally provide valuable time-dependent real-space data of the surface morphology, some of whose details are hidden in the reciprocal-space scattering images that were used for the experimental analysis.
Daniel A. Yaussy
2000-01-01
Two individual-tree growth simulators are used to predict the growth and mortality on a 30-year-old forest site and an 80-year-old forest site in eastern Kentucky. The empirical growth and yield model (NE-TWIGS) was developed to simulate short-term (
Averkin, Sergey N.; Gatsonis, Nikolaos A.
2018-06-01
An unstructured electrostatic Particle-In-Cell (EUPIC) method is developed on arbitrary tetrahedral grids for simulation of plasmas bounded by arbitrary geometries. The electric potential in EUPIC is obtained on cell vertices from a finite volume Multi-Point Flux Approximation of Gauss' law using the indirect dual cell with Dirichlet, Neumann and external circuit boundary conditions. The resulting matrix equation for the nodal potential is solved with a restarted generalized minimal residual method (GMRES) and an ILU(0) preconditioner algorithm, parallelized using a combination of node coloring and level scheduling approaches. The electric field on vertices is obtained using the gradient theorem applied to the indirect dual cell. The algorithms for injection, particle loading, particle motion, and particle tracking are parallelized for unstructured tetrahedral grids. The algorithms for the potential solver, electric field evaluation, loading, scatter-gather algorithms are verified using analytic solutions for test cases subject to Laplace and Poisson equations. Grid sensitivity analysis examines the L2 and L∞ norms of the relative error in potential, field, and charge density as a function of edge-averaged and volume-averaged cell size. Analysis shows second order of convergence for the potential and first order of convergence for the electric field and charge density. Temporal sensitivity analysis is performed and the momentum and energy conservation properties of the particle integrators in EUPIC are examined. The effects of cell size and timestep on heating, slowing-down and the deflection times are quantified. The heating, slowing-down and the deflection times are found to be almost linearly dependent on number of particles per cell. EUPIC simulations of current collection by cylindrical Langmuir probes in collisionless plasmas show good comparison with previous experimentally validated numerical results. These simulations were also used in a parallelization
Energy Technology Data Exchange (ETDEWEB)
Milind Deo; Chung-Kan Huang; Huabing Wang
2008-08-31
volume of injection at lower rates. However, if oil production can be continued at high water cuts, the discounted cumulative production usually favors higher production rates. The workflow developed during the project was also used to perform multiphase simulations in heterogeneous, fracture-matrix systems. Compositional and thermal-compositional simulators were developed for fractured reservoirs using the generalized framework. The thermal-compositional simulator was based on a novel 'equation-alignment' approach that helped choose the correct variables to solve depending on the number of phases present and the prescribed component partitioning. The simulators were used in steamflooding and in in situ combustion applications. The framework was constructed to be inherently parallel. The partitioning routines employed in the framework allowed generalized partitioning on highly complex fractured reservoirs and in instances when wells (incorporated in these models as line sources) were divided between two or more processors.
Prudencio, Ernesto E.
2012-01-01
QUESO is a collection of statistical algorithms and programming constructs supporting research into the uncertainty quantification (UQ) of models and their predictions. It has been designed with three objectives: it should (a) be sufficiently abstract in order to handle a large spectrum of models, (b) be algorithmically extensible, allowing an easy insertion of new and improved algorithms, and (c) take advantage of parallel computing, in order to handle realistic models. Such objectives demand a combination of an object-oriented design with robust software engineering practices. QUESO is written in C++, uses MPI, and leverages libraries already available to the scientific community. We describe some UQ concepts, present QUESO, and list planned enhancements.
Parallel spectral methods and applications to simulations of compressible mixing layers
Male, Jean-Michel; Fezoui, Loula
1993-01-01
Solving the Navier-Stokes equations with spectral methods for compressible flows can be quite demanding in computing time. We therefore study the parallelization of such an algorithm and its implementation on a massively parallel machine, the Connection Machine CM-2. The spectral method adapts well to the requirements of massive parallelism, but one of the basic tools of this method, the fast Fourier transform (when it must be applied along the two dime...
Campbell, Timothy; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Ogata, Shuji; Rodgers, Stephen
1999-06-01
Oxidation of aluminum nanoclusters is investigated with a parallel molecular-dynamics approach based on dynamic charge transfer among atoms. Structural and dynamic correlations reveal that significant charge transfer gives rise to large negative pressure in the oxide which dominates the positive pressure due to steric forces. As a result, aluminum moves outward and oxygen moves towards the interior of the cluster with the aluminum diffusivity 60% higher than that of oxygen. A stable 40 Å thick amorphous oxide is formed; this is in excellent agreement with experiments.
Marx, Alain; Lütjens, Hinrich
2017-03-01
A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130] solving the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than that of the sequential one for low resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows simulations at higher resolutions that were previously ruled out by memory limitations.
High Resolution N-Body Simulations of Terrestrial Planet Growth
Clark Wallace, Spencer; Quinn, Thomas R.
2018-04-01
We investigate planetesimal accretion with a direct N-body simulation of an annulus at 1 AU around a 1 M_sun star. The planetesimal ring, which initially contains N = 10^6 bodies, is evolved through the runaway growth stage into the phase of oligarchic growth. We find that the mass distribution of planetesimals develops a bump around 10^22 g shortly after the oligarchs form. This feature is absent in previous lower resolution studies. We find that this bump marks a boundary between growth modes. Below the bump mass, planetesimals are packed tightly enough together to populate first order mean motion resonances with the oligarchs. These resonances act to heat the tightly packed, low mass planetesimals, inhibiting their growth. We examine the eccentricity evolution of a dynamically hot planetary embryo embedded in an annulus of planetesimals and find that dynamical friction acts more strongly on the embryo when the planetesimals are finely resolved. This effect disappears when the annulus is made narrow enough to exclude most of the mean motion resonances. Additionally, we find that the 10^22 g bump is significantly less prominent when we follow planetesimal growth with a skinny annulus. This feature, which is reminiscent of the power law break seen in the size distribution of asteroid belt objects, may be an important clue for constraining the initial size of planetesimals in planet formation models.
Numerical simulations of material mismatch and ductile crack growth
Energy Technology Data Exchange (ETDEWEB)
Oestby, Erling
2002-07-01
Both the global geometry and inhomogeneities in material properties will influence the fracture behaviour of structures in the presence of cracks. In this thesis numerical simulations have been used to investigate how some aspects of both these issues affect the conditions at the crack-tip. The thesis is organised in an introduction chapter, summarising the major findings and conclusions, a review chapter, presenting the main aspects of the developments in the field of fracture mechanics, and three research papers. Paper I considers the effect of mismatch in hardening exponent on the local near-tip stress field for stationary interface cracks in bi-materials under small scale yielding conditions. It is demonstrated that the stress level in the weaker material increases compared to what is found in the homogeneous material for the same globally applied load level, with the effect being of increasing importance as the crack-tip is approached. Although a coupling between the radial and angular dependence of the stress fields exists, the evolving stress field can still be normalised with the applied J. The effect on the increase in stress level can closely be characterised by the difference in hardening exponent, Δn, termed the hardening mismatch, and is more or less independent of the absolute level of hardening in the two materials. Papers II and III deal with the effects of geometry, specimen size, hardening level and yield stress mismatch in relation to ductile crack growth. The ductile crack growth is simulated through use of the Gurson model. In Paper II the effect of specimen size on the crack growth resistance is investigated for deep cracked bend and shallow cracked tensile specimens. At small amounts of crack growth the effect of specimen size on the crack growth resistance is small, but a more significant effect is found for larger amounts of crack growth. The crack growth resistance decreases in smaller specimens loaded in tension, whereas the opposite is
Simulation of radioactive waste transmutation on the t.node parallel computer
International Nuclear Information System (INIS)
Bacha, F.; Maillard, J.; Silva, J.
1995-01-01
Before any experiment on an accelerator-driven reactor, computer simulation supplies tools for optimization. Some of the key parameters are the neutron production on a heavy target and the neutron flux distribution in the core. Simulations of energetic incident proton collisions are described for two code benchmarks organized by the NEA-OECD, the first on a thin lead target and the second on a thick lead target. One validation of the numeric codes is based on these results. A preliminary design of a waste-burning system, using benchmark result analysis and fission-focused simulations, is proposed.
Simulation of radioactive waste transmutation on the T. Node parallel computer
International Nuclear Information System (INIS)
Bacha, F.; Maillard, J.; Silva, J.
1995-01-01
Before any experiment on an accelerator-driven reactor, computer simulation supplies tools for optimization. Some of the key parameters are the neutron production on a heavy target and the neutron flux distribution in the core. Simulations of energetic incident proton collisions are described for two code benchmarks organized by the NEA-OECD, the first on a thin lead target and the second on a thick lead target. One validation of our numeric codes is based on these results. A preliminary design of a waste-burning system, using benchmark result analysis and fission-focused simulations, is proposed.
Simulation of radioactive waste transmutation on the t.node parallel computer
Energy Technology Data Exchange (ETDEWEB)
Bacha, F.; Maillard, J.; Silva, J. [LPC College de France, Paris (France)
1995-10-01
Before any experiment on an accelerator-driven reactor, computer simulation supplies tools for optimization. Some of the key parameters are the neutron production on a heavy target and the neutron flux distribution in the core. Simulations of energetic incident proton collisions are described for two code benchmarks organized by the NEA-OECD, the first on a thin lead target and the second on a thick lead target. One validation of the numeric codes is based on these results. A preliminary design of a waste-burning system, using benchmark result analysis and fission-focused simulations, is proposed.
Simulation of incompressible flows with heat and mass transfer using parallel finite element method
Directory of Open Access Journals (Sweden)
Jalal Abedi
2003-02-01
The stabilized finite element formulations based on the SUPG (Streamline-Upwind/Petrov-Galerkin) and PSPG (Pressure-Stabilization/Petrov-Galerkin) methods are developed and applied to solve buoyancy-driven incompressible flows with heat and mass transfer. The SUPG stabilization term allows us to solve flow problems at high speeds (advection-dominated flows), and the PSPG term eliminates instabilities associated with the use of equal-order interpolation functions for both pressure and velocity. The finite element formulations are implemented in parallel using MPI. In parallel computations, the finite element mesh is partitioned into contiguous subdomains using METIS, which are then assigned to individual processors. To ensure a balanced load, the number of elements assigned to each processor is approximately equal. To solve nonlinear systems in large-scale applications, we developed a matrix-free GMRES iterative solver, which entirely eliminates the need to form any matrices, even at the element level. To measure the accuracy of the method, we solve 2D and 3D examples of natural convection flows at moderate to high Rayleigh numbers.
Comparative Simulation Study of Production Scheduling in the Hybrid and the Parallel Flow
Directory of Open Access Journals (Sweden)
Varela Maria L.R.
2017-06-01
Scheduling is one of the most important decisions in production control. An approach is proposed to support users in solving scheduling problems by choosing the combination of physical manufacturing system configuration and material handling system settings. The approach considers two alternative manufacturing scheduling configurations in a two-stage product-oriented manufacturing system, exploring the hybrid flow shop (HFS) and the parallel flow shop (PFS) environments. To illustrate the application of the proposed approach, an industrial case from the automotive components industry is studied. The main aim of this research is to compare the results of production scheduling in the hybrid and the parallel flow, taking into account the makespan minimization criterion. Thus the HFS and the PFS performance is compared and analyzed, mainly in terms of the makespan, as the transportation times vary. The study shows that the performance of the HFS is clearly better when the work stations' processing times are unbalanced, either by nature or as a consequence of adding transport times to just one of the work stations' processing times. The HFS loses this advantage, and performs worse than the PFS configuration, when the work stations' processing times are balanced, either by nature or as a consequence of adding transport times to the work stations' processing times. This means that physical layout configurations, along with the way transport times are included in the work stations' processing times, should be carefully taken into consideration because of their influence on the performance reached by both HFS and PFS configurations.
A Framework for Parallel Numerical Simulations on Multi-Scale Geometries
Varduhn, Vasco
2012-06-01
In this paper, an approach to performing numerical multi-scale simulations on finely detailed geometries is presented. In particular, the focus lies on the generation of sufficiently fine mesh representations, where a resolution of dozens of millions of voxels is inevitable in order to represent the geometry adequately. Furthermore, the propagation of boundary conditions is investigated by using simulation results on the coarser simulation scale as input boundary conditions on the next finer scale. Finally, the applicability of our approach is shown on a two-phase simulation for flooding scenarios in urban structures, running from a city-wide scale to a finely detailed indoor scale on feature-rich building geometries. © 2012 IEEE.
National Aeronautics and Space Administration — In this proposal, researchers from Cascade Technologies and Stanford University outline a multi-year research plan to develop large-eddy simulation (LES) tools to...
STOCHSIMGPU: parallel stochastic simulation for the Systems Biology Toolbox 2 for MATLAB
Klingbeil, G.; Erban, R.; Giles, M.; Maini, P. K.
2011-01-01
Motivation: The importance of stochasticity in biological systems is becoming increasingly recognized and the computational cost of biologically realistic stochastic simulations urgently requires development of efficient software. We present a new
Migrating to a real-time distributed parallel simulator architecture- An update
CSIR Research Space (South Africa)
Duvenhage, B
2007-09-01
A legacy non-distributed logical time simulator was previously migrated to a distributed architecture to parallelise execution. The existing Discrete Time System Specification (DTSS) modelling formalism was retained to simplify the reuse of existing...
National Research Council Canada - National Science Library
Kalia, Rajiv
1997-01-01
Large-scale molecular-dynamics (MD) simulations were performed to investigate: (1) sintering process, structural correlations, and mechanical behavior including dynamic fracture in microporous and nanophase Si3N4...
An efficient parallel stochastic simulation method for analysis of nonviral gene delivery systems
Kuwahara, Hiroyuki; Gao, Xin
2011-01-01
DNA molecules into the nucleus of target cells. Several computational and experimental studies have shown that the design process of synthetic gene transfer vectors can be greatly enhanced by computational modeling and simulation. This paper proposes a
National Research Council Canada - National Science Library
Nakano, Aiichiro
2000-01-01
...; and dielectric properties of high permittivity TiO2 for ultrathin gate dielectric films. Scalable software infrastructure has been developed to enable multiscale simulations of nanoelectronic devices using MD and quantum mechanical...
Stupl, Jan; Faber, Nicolas; Foster, Cyrus; Yang, Fan Yang; Nelson, Bron; Aziz, Jonathan; Nuttall, Andrew; Henze, Chris; Levit, Creon
2014-01-01
This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has shown that a few ground-based systems consisting of 10 kilowatt class lasers directed by 1.5 meter telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency of the system. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system. Results indicate that utilizing a network of four LightForce stations with 20 kilowatt lasers, 85% of all conjunctions with a
International Nuclear Information System (INIS)
Tavares, R S; Tsuzuki, M S G; Martins, T C
2012-01-01
Electrical Impedance Tomography (EIT) is an imaging technique that attempts to reconstruct the conductivity distribution inside an object from electrical currents and potentials applied and measured at its surface. The EIT reconstruction problem is approached as an optimization problem, where the difference between the simulated and measured distributions must be minimized. This optimization problem can be solved using Simulated Annealing (SA), but at a high computational cost. To reduce the computational load, it is possible to use an incomplete evaluation of the objective function. This algorithm was shown to exhibit outside-in behavior, determining the impedance of the external elements first, similar to a layer stripping algorithm. A new outside-in heuristic that makes use of this property is proposed. The paper also presents the impact of using a GPU to parallelize matrix-vector multiplication and triangular solvers. Results with experimental data are presented. The outside-in heuristic was shown to be faster than the conventional SA algorithm.
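The simulated annealing loop that the abstract above builds on can be illustrated with a minimal sketch. This is a generic SA minimizer with Metropolis acceptance and geometric cooling, not the authors' EIT solver: the quadratic `misfit` function, the target values, and all parameter names are hypothetical stand-ins, and the objective is evaluated in full rather than incompletely.

```python
import math
import random


def simulated_annealing(objective, x0, step=0.5, t_start=1.0, t_end=1e-3,
                        cooling=0.95, moves_per_temp=50, seed=0):
    """Generic simulated annealing; returns the best point and value seen."""
    rng = random.Random(seed)
    x = list(x0)
    fx = objective(x)
    best_x, best_f = list(x), fx
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Perturb one randomly chosen parameter.
            i = rng.randrange(len(x))
            cand = list(x)
            cand[i] += rng.uniform(-step, step)
            fc = objective(cand)
            # Metropolis rule: always accept improvements, sometimes accept
            # worse candidates, with probability exp(-(fc - fx) / t).
            if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
                x, fx = cand, fc
                if fx < best_f:
                    best_x, best_f = list(x), fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f


def misfit(sigma):
    """Hypothetical objective: squared distance to made-up 'measured' values."""
    target = [1.0, 2.0, 0.5]
    return sum((s - m) ** 2 for s, m in zip(sigma, target))


estimate, value = simulated_annealing(misfit, [0.0, 0.0, 0.0])
```

In an EIT setting the objective would compare simulated and measured surface potentials, which is where the cost that motivates incomplete evaluation and GPU acceleration arises.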
Phase space simulation of collisionless stellar systems on the massively parallel processor
International Nuclear Information System (INIS)
White, R.L.
1987-01-01
A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem
Design and simulation of parallel and distributed architectures for images processing
International Nuclear Information System (INIS)
Pirson, Alain
1990-01-01
The exploitation of visual information requires special computers. The diversity of operations and the computing power involved bring about structures founded on the concepts of concurrency and distributed processing. This work identifies a vision computer with an association of dedicated intelligent entities, exchanging messages according to the model of parallelism introduced by the language Occam. It puts forward an architecture of the 'enriched processor network' type. It consists of a classical multiprocessor structure where each node is provided with specific devices. These devices perform processing tasks as well as inter-node dialogues. Such an architecture benefits from the homogeneity of multiprocessor networks and the power of dedicated resources. Its implementation corresponds to that of a distributed structure, tasks being allocated to each computing element. This approach culminates in an original architecture called ATILA. This modular structure is based on a transputer network supplied with vision-dedicated co-processors and powerful communication devices. (author)
van Walsum, P. E. V.; Supit, I.
2012-06-01
Hydrologic climate change modelling is hampered by climate-dependent model parameterizations. To reduce this dependency, we extended the regional hydrologic modelling framework SIMGRO to host a two-way coupling between the soil moisture model MetaSWAP and the crop growth simulation model WOFOST, accounting for ecohydrologic feedbacks in terms of radiation fraction that reaches the soil, crop coefficient, interception fraction of rainfall, interception storage capacity, and root zone depth. Except for the last, these feedbacks are dependent on the leaf area index (LAI). The influence of regional groundwater on crop growth is included via a coupling to MODFLOW. Two versions of the MetaSWAP-WOFOST coupling were set up: one with exogenous vegetation parameters, the "static" model, and one with endogenous crop growth simulation, the "dynamic" model. Parameterization of the static and dynamic models ensured that for the current climate the simulated long-term averages of actual evapotranspiration are the same for both models. Simulations were made for two climate scenarios and two crops: grass and potato. In the dynamic model, higher temperatures in a warm year under the current climate resulted in accelerated crop development, and in the case of potato a shorter growing season, thus partly avoiding the late summer heat. The static model has a higher potential transpiration; depending on the available soil moisture, this translates to a higher actual transpiration. This difference between static and dynamic models is enlarged by climate change in combination with higher CO2 concentrations. Including the dynamic crop simulation gives for potato (and other annual arable land crops) systematically higher effects on the predicted recharge change due to climate change. Crop yields from soils with poor water retention capacities strongly depend on capillary rise if moisture supply from other sources is limited. Thus, including a crop simulation model in an integrated
Stochastic simulation of grain growth during continuous casting
Energy Technology Data Exchange (ETDEWEB)
Ramirez, A. [Department of Aeronautical Engineering, S.E.P.I., E.S.I.M.E., IPN, Instituto Politecnico Nacional (Unidad Profesional Ticoman), Av. Ticoman 600, Col. Ticoman, C.P.07340 (Mexico)]. E-mail: adalop123@mailbanamex.com; Carrillo, F. [Department of Processing Materials, CICATA-IPN Unidad Altamira Tamps (Mexico); Gonzalez, J.L. [Department of Metallurgy and Materials Engineering, E.S.I.Q.I.E.-IPN (Mexico); Lopez, S. [Department of Molecular Engineering of I.M.P., AP 14-805 (Mexico)
2006-04-15
The evolution of microstructure is an important topic in materials science and engineering because the solidification conditions of steel billets during the continuous casting process directly affect the properties of the final products. This paper describes a mathematical model that simulates dendritic growth using data from real casting operations; a combination of deterministic and stochastic methods, applied as a function of the solidification time of every node, is used to reconstruct the morphology of cast structures.
Stochastic simulation of grain growth during continuous casting
International Nuclear Information System (INIS)
Ramirez, A.; Carrillo, F.; Gonzalez, J.L.; Lopez, S.
2006-01-01
The evolution of microstructure is an important topic in materials science and engineering because the solidification conditions of steel billets during the continuous casting process directly affect the properties of the final products. This paper describes a mathematical model that simulates dendritic growth using data from real casting operations; a combination of deterministic and stochastic methods, applied as a function of the solidification time of every node, is used to reconstruct the morphology of cast structures.
Kobayashi, Chigusa; Jung, Jaewoon; Matsunaga, Yasuhiro; Mori, Takaharu; Ando, Tadashi; Tamura, Koichi; Kamiya, Motoshi; Sugita, Yuji
2017-09-30
GENeralized-Ensemble SImulation System (GENESIS) is a software package for molecular dynamics (MD) simulation of biological systems. It is designed to extend limitations in system size and accessible time scale by adopting highly parallelized schemes and enhanced conformational sampling algorithms. In this new version, GENESIS 1.1, new functions and advanced algorithms have been added. The all-atom and coarse-grained potential energy functions used in AMBER and GROMACS packages now become available in addition to CHARMM energy functions. The performance of MD simulations has been greatly improved by further optimization, multiple time-step integration, and hybrid (CPU + GPU) computing. The string method and replica-exchange umbrella sampling with flexible collective variable choice are used for finding the minimum free-energy pathway and obtaining free-energy profiles for conformational changes of a macromolecule. These new features increase the usefulness and power of GENESIS for modeling and simulation in biological research. © 2017 Wiley Periodicals, Inc.
Prinz, Jan-Hendrik; Chodera, John D; Pande, Vijay S; Swope, William C; Smith, Jeremy C; Noé, Frank
2011-06-28
Parallel tempering (PT) molecular dynamics simulations have been extensively investigated as a means of efficient sampling of the configurations of biomolecular systems. Recent work has demonstrated how the short physical trajectories generated in PT simulations of biomolecules can be used to construct the Markov models describing biomolecular dynamics at each simulated temperature. While this approach describes the temperature-dependent kinetics, it does not make optimal use of all available PT data, instead estimating the rates at a given temperature using only data from that temperature. This can be problematic, as some relevant transitions or states may not be sufficiently sampled at the temperature of interest, but might be readily sampled at nearby temperatures. Further, the comparison of temperature-dependent properties can suffer from the false assumption that data collected from different temperatures are uncorrelated. We propose here a strategy in which, by a simple modification of the PT protocol, the harvested trajectories can be reweighted, permitting data from all temperatures to contribute to the estimated kinetic model. The method reduces the statistical uncertainty in the kinetic model relative to the single temperature approach and provides estimates of transition probabilities even for transitions not observed at the temperature of interest. Further, the method allows the kinetics to be estimated at temperatures other than those at which simulations were run. We illustrate this method by applying it to the generation of a Markov model of the conformational dynamics of the solvated terminally blocked alanine peptide.
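For orientation, the exchange step of conventional parallel tempering can be sketched as follows. This shows only the standard Metropolis swap criterion between two replicas, min(1, exp[(β_i − β_j)(E_i − E_j)]); it does not implement the modified protocol or the reweighting scheme proposed in the abstract above. The function name, the energy units (kcal/mol), and the example values are assumptions made for illustration.

```python
import math

# Boltzmann constant in kcal/(mol K); the choice of units is an assumption.
K_B = 0.0019872041


def swap_probability(energy_i, temp_i, energy_j, temp_j, k_b=K_B):
    """Standard PT acceptance probability for exchanging replicas i and j.

    energy_i is the potential energy of the configuration currently held
    at temperature temp_i, and likewise for j.
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Accept with certainty when the exponent is non-negative.
    return 1.0 if delta >= 0.0 else math.exp(delta)


# The colder replica holds the higher-energy configuration: always swap.
p_favourable = swap_probability(-10.0, 300.0, -20.0, 400.0)
# The colder replica already holds the lower-energy configuration.
p_unfavourable = swap_probability(-20.0, 300.0, -10.0, 400.0)
```

Because swaps are accepted stochastically in this way, trajectories at neighbouring temperatures are statistically coupled, which is the correlation the abstract cautions against ignoring.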
Energy Technology Data Exchange (ETDEWEB)
Küchlin, Stephan, E-mail: kuechlin@ifd.mavt.ethz.ch; Jenny, Patrick
2017-01-01
A major challenge for the conventional Direct Simulation Monte Carlo (DSMC) technique lies in the fact that its computational cost becomes prohibitive in the near continuum regime, where the Knudsen number (Kn)—characterizing the degree of rarefaction—becomes small. In contrast, the Fokker–Planck (FP) based particle Monte Carlo scheme allows for computationally efficient simulations of rarefied gas flows in the low and intermediate Kn regime. The Fokker–Planck collision operator—instead of performing binary collisions employed by the DSMC method—integrates continuous stochastic processes for the phase space evolution in time. This allows for time step and grid cell sizes larger than the respective collisional scales required by DSMC. Dynamically switching between the FP and the DSMC collision operators in each computational cell is the basis of the combined FP-DSMC method, which has been proven successful in simulating flows covering the whole Kn range. Until recently, this algorithm had only been applied to two-dimensional test cases. In this contribution, we present the first general purpose implementation of the combined FP-DSMC method. Utilizing both shared- and distributed-memory parallelization, this implementation provides the capability for simulations involving many particles and complex geometries by exploiting state of the art computer cluster technologies.
DEFF Research Database (Denmark)
Lima, Francisco Kleber A.; Branco, Carlos Gustavo C.; Guerrero, Josep M.
2013-01-01
is difficult due to its physical location. This paper considers UPS systems with no communication between their controls. A detailed mathematical model of the system under study is given, and simulation results are presented in order to support the theory....
International Nuclear Information System (INIS)
Asaei, Behzad; Habibidoost, Mahdi
2013-01-01
Highlights: • Design, simulation, and manufacturing of a hybrid electric motorcycle are explained. • The electric machine is mounted in the front wheel hub of an ordinary motorcycle. • Two different energy control strategies are implemented. • The simulation results show that the motorcycle performance is improved. • The acceleration is improved and the fuel consumption and pollution are decreased. - Abstract: In this paper, the design, simulation, and conversion of a normal motorcycle to a Hybrid Electric Motorcycle (HEM) are described. At first, a simple model is designed and simulated using ADVISOR2002. Then, the controller schematic and its optimized control strategy are described. A 125 cc ICE motorcycle is selected and converted into a HEM. A brushless DC (BLDC) motor assembled in the front wheel and a normal internal combustion engine in the rear wheel propel the motorcycle. The nominal powers are 6.6 kW and 500 W for the ICE and BLDC, respectively. The original motorcycle has a Continuous Variable Transmission (CVT), which is the best choice for HEM power transmission because it can operate in the automatic handling mode and has high efficiency. Moreover, by using the CVT, the ICE can be started while the motorcycle is running. Finally, the three operating modes of the HEM, the two implemented energy control strategies, the HEM engine control system using servomotors, and the LCD display are explained
Modeling a Million-Node Slim Fly Network Using Parallel Discrete-Event Simulation
Energy Technology Data Exchange (ETDEWEB)
Wolfe, Noah; Carothers, Christopher; Mubarak, Misbah; Ross, Robert; Carns, Philip
2016-05-15
As supercomputers close in on exascale performance, the increased number of processors and processing power translates to an increased demand on the underlying network interconnect. The Slim Fly network topology, a new low-diameter and low-latency interconnection network, is gaining interest as one possible solution for next-generation supercomputing interconnect systems. In this paper, we present a high-fidelity Slim Fly flit-level model leveraging the Rensselaer Optimistic Simulation System (ROSS) and Co-Design of Exascale Storage (CODES) frameworks. We validate our Slim Fly model with the Kathareios et al. Slim Fly model results provided at moderately sized network scales. We further scale the model size up to an unprecedented 1 million compute nodes; and through visualization of network simulation metrics such as link bandwidth, packet latency, and port occupancy, we get an insight into the network behavior at the million-node scale. We also show linear strong scaling of the Slim Fly model on an Intel cluster achieving a peak event rate of 36 million events per second using 128 MPI tasks to process 7 billion events. Detailed analysis of the underlying discrete-event simulation performance shows that a million-node Slim Fly model simulation can execute in 198 seconds on the Intel cluster.
Simulating chemical systems : MPI and GPU parallelization of novel SD algorithms
Goga, N.
Molecular dynamics is used for simulating chemical systems with the goal of studying a large range of phenomena starting from cell structures to the design of new materials, drugs, etc. A very important component of molecular dynamics is the use of well-suited atomistic and molecular modelling of
A high precision extrapolation method in multiphase-field model for simulating dendrite growth
Yang, Cong; Xu, Qingyan; Liu, Baicheng
2018-05-01
The phase-field method coupling with thermodynamic data has become a trend for predicting the microstructure formation in technical alloys. Nevertheless, the frequent access to thermodynamic database and calculation of local equilibrium conditions can be time intensive. The extrapolation methods, which are derived based on Taylor expansion, can provide approximation results with a high computational efficiency, and have been proven successful in applications. This paper presents a high precision second order extrapolation method for calculating the driving force in phase transformation. To obtain the phase compositions, different methods in solving the quasi-equilibrium condition are tested, and the M-slope approach is chosen for its best accuracy. The developed second order extrapolation method along with the M-slope approach and the first order extrapolation method are applied to simulate dendrite growth in a Ni-Al-Cr ternary alloy. The results of the extrapolation methods are compared with the exact solution with respect to the composition profile and dendrite tip position, which demonstrate the high precision and efficiency of the newly developed algorithm. To accelerate the phase-field and extrapolation computation, the graphic processing unit (GPU) based parallel computing scheme is developed. The application to large-scale simulation of multi-dendrite growth in an isothermal cross-section has demonstrated the ability of the developed GPU-accelerated second order extrapolation approach for multiphase-field model.
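The gain from moving from first-order to second-order Taylor extrapolation, as described in the abstract above, can be illustrated on a toy function. The sketch below uses an ideal-mixing term f(c) = c ln c + (1 − c) ln(1 − c) as a hypothetical stand-in for a thermodynamic quantity that would otherwise require a database call; it is not the authors' multicomponent driving-force formulation, and the expansion point and step are arbitrary.

```python
import math


def f(c):
    """Toy stand-in for an expensive thermodynamic evaluation."""
    return c * math.log(c) + (1.0 - c) * math.log(1.0 - c)


def df(c):
    """First derivative of f."""
    return math.log(c) - math.log(1.0 - c)


def d2f(c):
    """Second derivative of f."""
    return 1.0 / c + 1.0 / (1.0 - c)


def first_order(c0, dc):
    """f(c0 + dc) approximated by a first-order Taylor expansion about c0."""
    return f(c0) + df(c0) * dc


def second_order(c0, dc):
    """Same expansion with the second-order (curvature) term added."""
    return first_order(c0, dc) + 0.5 * d2f(c0) * dc ** 2


c0, dc = 0.30, 0.05
exact = f(c0 + dc)
err_first = abs(first_order(c0, dc) - exact)
err_second = abs(second_order(c0, dc) - exact)
```

For this toy case the second-order error is markedly smaller than the first-order one, while the expensive function and its derivatives need to be evaluated only at the expansion point.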
van Walsum, P. E. V.
2011-11-01
Climate change impact modelling of hydrologic responses is hampered by climate-dependent model parameterizations. Reducing this dependency was one of the goals of extending the regional hydrologic modelling system SIMGRO with a two-way coupling to the crop growth simulation model WOFOST. The coupling includes feedbacks to the hydrologic model in terms of the root zone depth, soil cover, leaf area index, interception storage capacity, crop height and crop factor. For investigating whether such feedbacks lead to significantly different simulation results, two versions of the model coupling were set up for a test region: one with exogenous vegetation parameters, the "static" model, and one with endogenous simulation of the crop growth, the "dynamic" model WOFOST. The used parameterization methods of the static/dynamic vegetation models ensure that for the current climate the simulated long-term average of the actual evapotranspiration is the same for both models. Simulations were made for two climate scenarios. Owing to the higher temperatures in combination with a higher CO2-concentration of the atmosphere, a forward time shift of the crop development is simulated in the dynamic model; the used arable land crop, potatoes, also shows a shortening of the growing season. For this crop, a significant reduction of the potential transpiration is simulated compared to the static model, in the example by 15% in a warm, dry year. In consequence, the simulated crop water stress (the unit minus the relative transpiration) is lower when the dynamic model is used; also the simulated increase of crop water stress due to climate change is lower; in the example, the simulated increase is 15 percentage points less (of 55) than when a static model is used. The static/dynamic models also simulate different absolute values of the transpiration. The difference is most pronounced for potatoes at locations with ample moisture supply; this supply can either come from storage release of a
Zhu, Shijin; Li, Li; Liu, Jiabin; Wang, Hongtao; Wang, Tian; Zhang, Yuxin; Zhang, Lili; Ruoff, Rodney S; Dong, Fan
2018-02-27
Two-dimensional birnessite has attracted attention for electrochemical energy storage because of the presence of redox-active Mn4+/Mn3+ ions and spacious interlayer channels available for ion diffusion. However, current strategies are largely limited to enhancing the electrical conductivity of birnessite. One key limitation affecting the electrochemical properties of birnessite is the poor utilization of the MnO6 unit. Here, we assemble a β-MnO2/birnessite core-shell structure that exploits the exposed crystal face of β-MnO2 as the core and ultrathin birnessite sheets that have the structural advantage to enhance the utilization efficiency of the Mn from the bulk. Our birnessite, which has sheets parallel to each other, is found to have an unusual crystal structure, with interlayer spacing, Mn(III)/Mn(IV) ratio and content of the balancing cations differing from those of common birnessite. The substrate-directed growth mechanism is carefully investigated. The as-prepared core-shell nanostructures enhance the exposed surface area of birnessite and achieve high electrochemical performance (for example, 657 F g^-1 in 1 M Na2SO4 electrolyte based on the weight of parallel birnessite) and excellent rate capability over a potential window of up to 1.2 V. This strategy opens avenues for fundamental studies of birnessite and its properties and suggests the possibility of its use in energy storage and other applications. The potential window of an asymmetric supercapacitor that was assembled with this material can be enlarged to 2.2 V (in aqueous electrolyte) with a good cycling ability.
Using GPU parallelization to perform realistic simulations of the LPCTrap experiments
Energy Technology Data Exchange (ETDEWEB)
Fabian, X., E-mail: fabian@lpccaen.in2p3.fr; Mauger, F.; Quéméner, G. [Université de Caen, CNRS/IN2P3, LPC-Caen, ENSICAEN (France); Velten, Ph. [KU Leuven, Instituut voor Kern- en Stralingsfysica (Belgium); Ban, G.; Couratin, C. [Université de Caen, CNRS/IN2P3, LPC-Caen, ENSICAEN (France); Delahaye, P. [GANIL, CEA/DSM-CNRS/IN2P3 (France); Durand, D. [Université de Caen, CNRS/IN2P3, LPC-Caen, ENSICAEN (France); Fabre, B. [CELIA, Université de Bordeaux, CEA/CNRS (France); Finlay, P. [KU Leuven, Instituut voor Kern- en Stralingsfysica (Belgium); Fléchard, X.; Liénard, E. [Université de Caen, CNRS/IN2P3, LPC-Caen, ENSICAEN (France); Méry, A. [Université de Caen, CIMAP, CEA/CNRS/ENSICAEN (France); Naviliat-Cuncic, O. [NSCL and Department of Physics and Astronomy, MSU (United States); Pons, B. [CELIA, Université de Bordeaux, CEA/CNRS (France); Porobic, T.; Severijns, N. [KU Leuven, Instituut voor Kern- en Stralingsfysica (Belgium); Thomas, J. C. [GANIL, CEA/DSM-CNRS/IN2P3 (France)
2015-11-15
The LPCTrap setup is a sensitive tool to measure the β − ν angular correlation coefficient, a_βν, which can yield the mixing ratio ρ of a β decay transition. The latter enables the extraction of the Cabibbo-Kobayashi-Maskawa (CKM) matrix element V_ud. In such a measurement, the most relevant observable is the energy distribution of the recoiling daughter nuclei following the nuclear β decay, which is obtained using a time-of-flight technique. In order to maximize the precision, one can reduce the systematic errors through a thorough simulation of the whole set-up, especially with a correct model of the trapped ion cloud. This paper presents such a simulation package and focuses on the ion cloud features; particular attention is therefore paid to realistic descriptions of trapping field dynamics, buffer gas cooling and the N-body space charge effects.
Using GPU parallelization to perform realistic simulations of the LPCTrap experiments
International Nuclear Information System (INIS)
Fabian, X.; Mauger, F.; Quéméner, G.; Velten, Ph.; Ban, G.; Couratin, C.; Delahaye, P.; Durand, D.; Fabre, B.; Finlay, P.; Fléchard, X.; Liénard, E.; Méry, A.; Naviliat-Cuncic, O.; Pons, B.; Porobic, T.; Severijns, N.; Thomas, J. C.
2015-01-01
The LPCTrap setup is a sensitive tool to measure the β − ν angular correlation coefficient, a_βν, which can yield the mixing ratio ρ of a β decay transition. The latter enables the extraction of the Cabibbo-Kobayashi-Maskawa (CKM) matrix element V_ud. In such a measurement, the most relevant observable is the energy distribution of the recoiling daughter nuclei following the nuclear β decay, which is obtained using a time-of-flight technique. In order to maximize the precision, one can reduce the systematic errors through a thorough simulation of the whole set-up, especially with a correct model of the trapped ion cloud. This paper presents such a simulation package and focuses on the ion cloud features; particular attention is therefore paid to realistic descriptions of trapping field dynamics, buffer gas cooling and the N-body space charge effects
Miloichikova, I. A.; Bespalov, V. I.; Krasnykh, A. A.; Stuchebrov, S. G.; Cherepennikov, Yu. M.; Dusaev, R. R.
2018-04-01
Simulation by the Monte Carlo method is widely used to calculate how ionizing radiation interacts with matter. The wide variety of programs based on this method allows users to choose the most suitable package for a given computational problem. In turn, it is important to know the exact limitations of each code in order to avoid gross errors. Results of estimating the feasibility of applying the program PCLab (Computer Laboratory, version 9.9) to numerical simulation of the electron energy distribution absorbed in beryllium, aluminum, gold, and water for industrial, research, and clinical beams are presented. The data obtained using the programs ITS and Geant4, the most popular software packages for solving such problems, and the program PCLab are presented in graphic form. A comparison and an analysis of the results obtained demonstrate the feasibility of applying the program PCLab to simulation of the absorbed energy distribution and dose of electrons in various materials for energies in the range 1-20 MeV.
International Nuclear Information System (INIS)
Byers, J.A.; Williams, T.J.; Cohen, B.I.; Dimits, A.M.
1994-01-01
One of the programs of the Magnetic Fusion Energy (MFE) Theory and Computations Program is studying the anomalous transport of thermal energy across the field lines in the core of a tokamak. We use the method of gyrokinetic particle-in-cell simulation in this study. For this LDRD project we employed massively parallel processing, new algorithms, and new formal techniques to improve this research. Specifically, we sought to take steps toward: researching experimentally-relevant parameters in our simulations, learning parallel computing to have as a resource for our group, and achieving a 100x speedup over our starting-point Cray2 simulation code's performance.
Simulation of instability growth on ICF capsule ablators
Niasse, Nicolas; Chittenden, Jeremy
2014-10-01
It is believed that the ablation-front instabilities are mainly responsible for the hot-spot mix that impacts the performance of ICF capsules. Understanding the formation of these instabilities is therefore a first step towards a better control of the implosion dynamics and the optimization of the fusion yield. Using the Chimera code currently in development at Imperial College, we have performed several spherical wedge simulations of the low and high adiabat ablation phase, pre-imposing different single-mode 2D and 3D perturbations on the capsule surface. Synthetic Sc, Fe and V X-ray backlighter images are generated by the Spk code and used to measure the growth of modes 30-160 with initial amplitude ≤ 3.4 μm PTV. The growth of imposed 2D perturbations is assessed for both low-foot and high-foot radiation pulse shapes on the National Ignition Facility. Results showing the merger of spike and bubble structures in multi-mode perturbations in both 2D and 3D simulations are explored, and preliminary assessments of the difference between 2D and 3D non-linear behaviour are discussed. The sensitivity of shock timing to NLTE changes in opacity is also assessed.
Final Report for 'ParSEC-Parallel Simulation of Electron Cooling'
International Nuclear Information System (INIS)
David L Bruhwiler
2005-01-01
The Department of Energy has plans, during the next two or three years, to design an electron cooling section for the collider ring at RHIC (Relativistic Heavy Ion Collider) [1]. Located at Brookhaven National Laboratory (BNL), RHIC is the premier nuclear physics facility. The new cooling section would be part of a proposed luminosity upgrade [2] for RHIC. This electron cooling section will be different from previous electron cooling facilities in three fundamental ways. First, the electron energy will be 50 MeV, as opposed to 100's of keV (or 4 MeV for the electron cooling system now operating at Fermilab [3]). Second, both the electron beam and the ion beam will be bunched, rather than being essentially continuous. Third, the cooling will take place in a collider rather than in a storage ring. Analytical work, in combination with the use and further development of the semi-analytical codes BETACOOL [4,5] and SimCool [6,7], is being pursued at BNL [8] and at other laboratories around the world. However, there is a growing consensus in the field that high-fidelity 3-D particle simulations are required to fully understand the critical cooling physics issues in this new regime. Simulations of the friction coefficient, using the VORPAL code [9], for single gold ions passing once through the interaction region, have been compared with theoretical calculations [10,11], and the results have been presented in conference proceedings papers [8,12,13,14] and presentations [15,16,17]. Charged particles are advanced using a fourth-order Hermite predictor corrector algorithm [18]. The fields in the beam frame are obtained from direct calculation of Coulomb's law, which is more efficient than multipole-type algorithms for fewer than ∼10^6 particles. Because the interaction time is so short, it is necessary to suppress the diffusive aspect of the ion dynamics through the careful use of positrons in the simulations, and to run 100's of simulations with the same physical
Simulation of fatigue crack growth under large scale yielding conditions
Schweizer, Christoph; Seifert, Thomas; Riedel, Hermann
2010-07-01
A simple mechanism-based model for fatigue crack growth assumes a linear correlation between the cyclic crack-tip opening displacement (ΔCTOD) and the crack growth increment (da/dN). The objective of this work is to compare analytical estimates of ΔCTOD with results of numerical calculations under large scale yielding conditions and to verify the physical basis of the model by comparing the predicted and the measured evolution of the crack length in a 10%-chromium steel. The material is described by a rate independent cyclic plasticity model with power-law hardening and Masing behavior. During the tension-going part of the cycle, nodes at the crack-tip are released such that the crack growth increment corresponds approximately to the crack-tip opening. The finite element analysis performed in ABAQUS is continued until a stabilized value of ΔCTOD is reached. The analytical model contains an interpolation formula for the J-integral, which is generalized to account for cyclic loading and crack closure. Simulated and estimated ΔCTOD are reasonably consistent. The predicted crack length evolution is found to be in good agreement with the behavior of microcracks observed in a 10%-chromium steel.
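The linear growth law described in this abstract, da/dN = β·ΔCTOD, can be illustrated with a minimal cycle-by-cycle integration. This is only a sketch under stated assumptions: the proportionality constant `beta`, the function `toy_delta_ctod`, and all numerical values are hypothetical placeholders, not the J-integral-based estimate or the material data from the paper.

```python
def grow_crack(a0, n_cycles, delta_ctod, beta=0.5):
    """Integrate da/dN = beta * delta_ctod(a) one load cycle at a time.

    a0         -- initial crack length (arbitrary length unit)
    n_cycles   -- number of load cycles to simulate
    delta_ctod -- callable returning the cyclic crack-tip opening for length a
    beta       -- hypothetical proportionality constant of the linear law
    """
    a = a0
    history = [a]
    for _ in range(n_cycles):
        a += beta * delta_ctod(a)  # crack advance in this cycle
        history.append(a)
    return history


def toy_delta_ctod(a, k=1e-3):
    """Hypothetical stand-in: opening displacement proportional to crack length."""
    return k * a


lengths = grow_crack(a0=10.0, n_cycles=1000, delta_ctod=toy_delta_ctod)
```

With an opening displacement proportional to crack length, as assumed in `toy_delta_ctod`, the law yields geometric growth of the crack length with cycle number; a realistic ΔCTOD estimate would replace that callable.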
International Nuclear Information System (INIS)
Leggett, C; Jackson, K; Tatarkhanov, M; Yao, Y; Binet, S; Levinthal, D
2011-01-01
Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Further, the cores themselves can run multiple threads as a zero overhead context switch allowing low level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the Atlas event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores, by means of event based parallelism, and final stage I/O synchronization. However, initial studies on 8 and 16 core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level, and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which due to its size, places huge burdens on the memory infrastructure of today's processors.
Núñez, M; Robie, T; Vlachos, D G
2017-10-28
Kinetic Monte Carlo (KMC) simulation provides insights into catalytic reactions unobtainable with either experiments or mean-field microkinetic models. Sensitivity analysis of KMC models assesses the robustness of the predictions to parametric perturbations and identifies rate determining steps in a chemical reaction network. Stiffness in the chemical reaction network, a ubiquitous feature, demands lengthy run times for KMC models and renders efficient sensitivity analysis based on the likelihood ratio method unusable. We address the challenge of efficiently conducting KMC simulations and performing accurate sensitivity analysis in systems with unknown time scales by employing two acceleration techniques: rate constant rescaling and parallel processing. We develop statistical criteria that ensure sufficient sampling of non-equilibrium steady state conditions. Our approach provides the twofold benefit of accelerating the simulation itself and enabling likelihood ratio sensitivity analysis, which provides further speedup relative to finite difference sensitivity analysis. As a result, the likelihood ratio method can be applied to real chemistry. We apply our methodology to the water-gas shift reaction on Pt(111).
Simulation of wind wave growth with reference source functions
Badulin, Sergei I.; Zakharov, Vladimir E.; Pushkarev, Andrei N.
2013-04-01
We present results of extensive simulations of wind wave growth with the so-called reference source function in the right-hand side of the Hasselmann equation written as follows: $\partial N_k/\partial t + \nabla_k \omega_k \cdot \nabla_r N_k = S_{nl} + S_{in} + S_{diss}$ (1). First, we use Webb's algorithm [8] for calculating the exact nonlinear transfer function $S_{nl}$. Second, we consider a family of wind input functions in accordance with recent consideration [9]: $S_{in} = \gamma(k) N_k$, $\gamma(k) = \gamma_0 \omega_0 (\omega/\omega_0)^s f_{in}(\theta)$ (2). Function $f_{in}(\theta)$ describes dependence on angle $\theta$. Parameters in (2) are tunable and determine magnitude (parameters $\gamma_0$, $\omega_0$) and wave growth rate $s$ [9]. Exponent $s$ plays a key role in this study, being responsible for reference scenarios of wave growth: $s = 4/3$ gives linear growth of wave momentum, $s = 2$ linear growth of wave energy, and $s = 8/3$ a constant rate of wave action growth. Note that the values are close to those of conventional parameterizations of wave growth rates (e.g. $s = 1$ for [7] and $s = 2$ for [5]). Dissipation function $S_{diss}$ is chosen as one providing the Phillips spectrum $E(\omega) \sim \omega^{-5}$ at high frequency range [3] (parameter $\omega_{diss}$ fixes a dissipation scale of wind waves): $S_{diss} = C_{diss} \mu_w^4 \omega N(k) \Theta(\omega - \omega_{diss})$ (3). Here frequency-dependent wave steepness $\mu_w^2 = E(\omega,\theta)\,\omega^5/g^2$ makes this function heavily nonlinear and provides a remarkable property of stationary solutions at high frequencies: the dissipation coefficient $C_{diss}$ should keep a certain value to provide the observed power-law tails close to the Phillips spectrum $E(\omega) \sim \omega^{-5}$. Our recent estimates [3] give $C_{diss} \approx 2.0$. The Hasselmann equation (1) with the new functions $S_{in}$, $S_{diss}$ (2,3) has a family of self-similar solutions of the same form as previously studied models [1,3,9] and proposes a solid basis for further theoretical and numerical study of wave evolution under action of all the physical mechanisms: wind input, wave dissipation and nonlinear transfer. Simulations of duration- and fetch-limited wind wave growth have been carried out within the above model setup to check its
International Nuclear Information System (INIS)
Kropaczek, David J.
2008-01-01
A new concept for performing nuclear fuel optimization over a multi-cycle planning horizon is presented. The method provides for an implicit coupling between traditionally separate in-core and out-of-core fuel management decisions including determination of: fresh fuel batch size, enrichment and bundle design; exposed fuel reuse; and core loading pattern. The algorithm uses simulated annealing optimization, modified with a technique called mixing of states that allows for deployment in a scalable parallel environment. Analysis of algorithm performance for a transition cycle design (i.e. a PWR 6 month cycle length extension) demonstrates the feasibility of the approach as a production tool for fuel procurement and multi-cycle core design. (authors)
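The simulated annealing search mentioned in this abstract can be sketched generically as below. This is a minimal, serial illustration on a toy permutation problem; it does not implement the paper's "mixing of states" parallel extension, and the cost function `toy_cost`, the swap move, the geometric cooling schedule, and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def simulated_annealing(cost, state, n_steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Minimise cost(permutation) using pairwise swaps and geometric cooling."""
    rng = random.Random(seed)
    current = list(state)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for step in range(n_steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]  # propose a swap
        new_cost = cost(current)
        delta = new_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = list(current), new_cost
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo the swap
    return best, best_cost


def toy_cost(pattern):
    """Hypothetical objective: penalise large jumps between neighbouring slots."""
    return sum(abs(a - b) for a, b in zip(pattern, pattern[1:]))


initial = [3, 7, 0, 5, 2, 6, 1, 4]
best_pattern, best_value = simulated_annealing(toy_cost, initial)
```

In a fuel-management setting the permutation would stand for a candidate loading pattern and the cost would come from a core simulator; both are replaced here by toy stand-ins.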
Edgerton, Jason D; Keough, Matthew T; Roberts, Lance W
2018-02-21
This study examines whether there are multiple joint trajectories of depression and problem gambling co-development in a sample of emerging adults. Data were from the Manitoba Longitudinal Study of Young Adults (n = 679), which was collected in 4 waves across 5 years (age 18-20 at baseline). Parallel process latent class growth modeling was used to identify 5 joint trajectory classes: low decreasing gambling, low increasing depression (81%); low stable gambling, moderate decreasing depression (9%); low stable gambling, high decreasing depression (5%); low stable gambling, moderate stable depression (3%); moderate stable problem gambling, no depression (2%). There was no evidence of reciprocal growth in problem gambling and depression in any of the joint classes. Multinomial logistic regression analyses of baseline risk and protective factors found that only neuroticism, escape-avoidance coping, and perceived level of family social support were significant predictors of joint trajectory class membership. Consistent with the pathways model framework, we observed that individuals in the problem gambling only class were more likely to be using gambling as a stable way to cope with negative emotions. Similarly, high levels of neuroticism and low levels of family support were associated with increased odds of being in a class with moderate to high levels of depressive symptoms (but low gambling problems). The results suggest that interventions for problem gambling and/or depression need to focus on promoting more adaptive coping skills among more "at-risk" young adults, and such interventions should be tailored in relation to specific subtypes of comorbid mental illness.
Directory of Open Access Journals (Sweden)
Mårten Sundberg
Today immunoassays are widely used in veterinary medicine, but lack of species specific assays often necessitates the use of assays developed for human applications. Mass spectrometry (MS) is an attractive alternative due to high specificity and versatility, allowing for species-independent analysis. Targeted MS-based quantification methods are valuable complements to large scale shotgun analysis. A method referred to as parallel reaction monitoring (PRM), implemented on Orbitrap MS, has lately been presented as an excellent alternative to more traditional selected reaction monitoring/multiple reaction monitoring (SRM/MRM) methods. The insulin-like growth factor (IGF) system is not well described in the cat but there are indications of important differences between cats and humans. In feline medicine IGF-I is mainly analyzed for diagnosis of growth hormone disorders but also for research, while the other proteins in the IGF system are not routinely analyzed within clinical practice. Here, a PRM method for quantification of IGF-I, IGF-II, IGF binding protein (IGFBP)-3 and IGFBP-5 in feline serum is presented. Selective quantification was supported by the use of a newly launched internal standard named QPrEST™. Homology searches demonstrated the possibility to use this standard of human origin for quantification of the targeted feline proteins. Excellent quantitative sensitivity at the attomol/μL (pM) level and selectivity were obtained. As the presented approach is very generic, we show that high resolution mass spectrometry in combination with PRM and QPrEST™ internal standards is a versatile tool for protein quantitation across multiple species.
Final Report: Simulation Tools for Parallel Microwave Particle in Cell Modeling
International Nuclear Information System (INIS)
Stoltz, Peter H.
2008-01-01
Transport of high-power rf fields and the subsequent deposition of rf power into plasma is an important component of developing tokamak fusion energy. Two limitations on rf heating are: (i) breakdown of the metallic structures used to deliver rf power to the plasma, and (ii) a detailed understanding of how rf power couples into a plasma. Computer simulation is a main tool for helping solve both of these problems, but one of the premier tools, VORPAL, is traditionally too difficult to use for non-experts. During this Phase II project, we developed the VorpalView user interface tool. This tool allows Department of Energy researchers a fully graphical interface for analyzing VORPAL output to more easily model rf power delivery and deposition in plasmas.
Guan, W.; Cheng, X.; Huang, J.; Huber, G.; Li, W.; McCammon, J. A.; Zhang, B.
2018-06-01
RPYFMM is a software package for the efficient evaluation of the potential field governed by the Rotne-Prager-Yamakawa (RPY) tensor interactions in biomolecular hydrodynamics simulations. In our algorithm, the RPY tensor is decomposed as a linear combination of four Laplace interactions, each of which is evaluated using the adaptive fast multipole method (FMM) (Greengard and Rokhlin, 1997) where the exponential expansions are applied to diagonalize the multipole-to-local translation operators. RPYFMM offers a unified execution on both shared and distributed memory computers by leveraging the DASHMM library (DeBuhr et al., 2016, 2018). Preliminary numerical results show that the interactions for a molecular system of 15 million particles (beads) can be computed within one second on a Cray XC30 cluster using 12,288 cores, while achieving approximately 54% strong-scaling efficiency.
Directory of Open Access Journals (Sweden)
N. Shivasankaran
2013-04-01
Scheduling problems are generally treated as NP-complete combinatorial optimization problems with multiple objectives and multiple constraints. Repair shop job sequencing and operator allocation is one such NP-complete problem. For such problems, an efficient technique is required that explores a wide range of the solution space. This paper deals with the Simulated Annealing technique, a meta-heuristic, to solve the complex car sequencing and operator allocation problem in a car repair shop. The algorithm is tested with several constraint settings, and the solution quality exceeds the results reported in the literature with high convergence speed and accuracy. This algorithm could be considered quite effective where other heuristic routines fail.
The shape of the invisible halo: N-body simulations on parallel supercomputers
Energy Technology Data Exchange (ETDEWEB)
Warren, M.S.; Zurek, W.H. (Los Alamos National Lab., NM (USA)); Quinn, P.J. (Australian National Univ., Canberra (Australia). Mount Stromlo and Siding Spring Observatories); Salmon, J.K. (California Inst. of Tech., Pasadena, CA (USA))
1990-01-01
We study the shapes of halos and the relationship to their angular momentum content by means of N-body (N ≈ 10^6) simulations. Results indicate that in relaxed halos with no apparent substructure: (i) the shape and orientation of the isodensity contours tends to persist throughout the virialised portion of the halo; (ii) most (≈70%) of the halos are prolate; (iii) the approximate direction of the angular momentum vector tends to persist throughout the halo; (iv) for spherical shells centered on the core of the halo the magnitude of the specific angular momentum is approximately proportional to their radius; (v) the shortest axis of the ellipsoid which approximates the shape of the halo tends to align with the rotation axis of the halo. This tendency is strongest in the fastest rotating halos. 13 refs., 4 figs.
De Marco, Tommaso; Ries, Florian; Guermandi, Marco; Guerrieri, Roberto
2012-05-01
Electrical impedance tomography (EIT) is an imaging technology based on impedance measurements. To retrieve meaningful insights from these measurements, EIT relies on detailed knowledge of the underlying electrical properties of the body. This is obtained from numerical models of current flows therein. The nonhomogeneous and anisotropic electric properties of human tissues make accurate modeling and simulation very challenging, leading to a tradeoff between physical accuracy and technical feasibility, which at present severely limits the capabilities of EIT. This work presents a complete algorithmic flow for an accurate EIT modeling environment featuring high anatomical fidelity with a spatial resolution equal to that provided by an MRI and a novel realistic complete electrode model implementation. At the same time, we demonstrate that current graphics processing unit (GPU)-based platforms provide enough computational power that a domain discretized with five million voxels can be numerically modeled in about 30 s.
Simulation and theory of island growth on stepped substrates
International Nuclear Information System (INIS)
Pownall, C.D.
1999-10-01
The nucleation, growth and coalescence of islands on stepped substrates is investigated by Monte Carlo simulations and analytical theories. Substrate steps provide a preferential site for the nucleation of islands, making many of the important processes one-dimensional in nature, and are of potentially major importance in the development of low-dimensional structures as a means of growing highly ordered chains of 'quantum dots' or continuous 'quantum wires'. A model is developed in which island nucleation is entirely restricted to the step edge, islands grow in compact morphologies by monomer capture, and eventually coalesce with one another until a single continuous cluster of islands covers the entire step. A series of analytical theories is developed to describe the dynamics of the whole evolution. The initial nucleation and aggregation regimes are modeled using the traditional approach of rate equations, rooted in mean field theory, but incorporating corrections to account for correlations in the nucleation and capture processes. This approach is found to break down close to the point at which the island density saturates and a new approach is developed based upon geometric and probabilistic arguments to describe the saturation behaviour, including the characteristic dynamic scaling which is found to persist through the coalescence regime as well. A further new theory, incorporating arguments based on the geometry of Capture Zones, is presented which reproduces the dynamics of the coalescence regime. The latter part of the thesis considers the spatial properties of the system, in particular the spacing of the islands along the step. An expression is derived which describes the distribution of gap sizes, and this is solved using a recently-developed relaxation method. An important result is the discovery that larger critical island sizes tend to yield more evenly spaced arrays of islands. The extent of this effect is analysed by solving for critical island
International Nuclear Information System (INIS)
Wendel, D. E.; Olson, D. K.; Hesse, M.; Kuznetsova, M.; Adrian, M. L.; Aunai, N.; Karimabadi, H.; Daughton, W.
2013-01-01
We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection in a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of simple topological features such as null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a good correspondence between the locus of changes in magnetic connectivity or the quasi-separatrix layer and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we investigate the distribution of the parallel electric field along the reconnecting field lines. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first–order trends in the parallel electric field while the contribution from fluctuations of the parallel electric field, such as electron holes, is negligible. The results impact the determination of reconnection sites and reconnection rates in models and in situ spacecraft observations of 3D turbulent reconnection. It is difficult through direct observation to isolate the loci of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the running sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line
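The diagnostic described at the end of this abstract, a positive slope in the running sum of the parallel electric field along a field line, can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code: the function names, the window size, the threshold, and the synthetic field (a weak constant offset hidden under much larger oscillations) are all assumptions.

```python
import numpy as np


def running_sum(e_par, s):
    """Cumulative trapezoidal integral of E_parallel along arc length s."""
    steps = 0.5 * (e_par[1:] + e_par[:-1]) * np.diff(s)
    return np.concatenate(([0.0], np.cumsum(steps)))


def windowed_slope(q, s, half_width):
    """Slope of q(s) over a centred window; NaN where the window does not fit."""
    w = half_width
    slope = np.full(q.shape, np.nan)
    slope[w:-w] = (q[2 * w:] - q[:-2 * w]) / (s[2 * w:] - s[:-2 * w])
    return slope


# Synthetic field line: large zero-mean fluctuations plus a weak positive
# trend confined to 4 < s < 6 (a hypothetical "reconnection" segment).
s = np.linspace(0.0, 10.0, 1001)
fluctuations = 0.5 * np.sin(40.0 * s)
trend = np.where((s > 4.0) & (s < 6.0), 0.05, 0.0)
e_par = fluctuations + trend

q = running_sum(e_par, s)
slope = windowed_slope(q, s, half_width=100)

# Flag points where the windowed slope exceeds an assumed threshold.
threshold = 0.03
reconnecting = np.nan_to_num(slope, nan=0.0) > threshold
```

The window averages out the fluctuations, so the slope of the running sum recovers the low-amplitude trend even though it is ten times smaller than the oscillation amplitude in this synthetic example.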
Directory of Open Access Journals (Sweden)
A. Bubeck
2017-11-01
The mechanical interaction of propagating normal faults is known to influence the linkage geometry of first-order faults, and the development of second-order faults and fractures, which transfer displacement within relay zones. Here we use natural examples of growth faults from two active volcanic rift zones (Koa`e, island of Hawai`i, and Krafla, northern Iceland) to illustrate the importance of horizontal-plane extension (heave) gradients, and associated vertical axis rotations, in evolving continental rift systems. Second-order extension and extensional-shear faults within the relay zones variably resolve components of regional extension, and components of extension and/or shortening parallel to the rift zone, to accommodate the inherently three-dimensional (3-D) strains associated with relay zone development and rotation. Such a configuration involves volume increase, which is accommodated at the surface by open fractures; in the subsurface this may be accommodated by veins or dikes oriented obliquely and normal to the rift axis. To consider the scalability of the effects of relay zone rotations, we compare the geometry and kinematics of fault and fracture sets in the Koa`e and Krafla rift zones with data from exhumed contemporaneous fault and dike systems developed within a > 5×10^4 km^2 relay system that developed during formation of the NE Atlantic margins. Based on the findings presented here we propose a new conceptual model for the evolution of segmented continental rift basins on the NE Atlantic margins.
Bisetti, Fabrizio
2014-07-14
Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs. © 2014 The Author(s) Published by the Royal Society.
Directory of Open Access Journals (Sweden)
M. A. Martin
2011-09-01
We present a dynamic equilibrium simulation of the ice sheet-shelf system on Antarctica with the Potsdam Parallel Ice Sheet Model (PISM-PIK). The simulation is initialized with present-day conditions for bed topography and ice thickness and then run to steady state with constant present-day surface mass balance. Surface temperature and sub-shelf basal melt distribution are parameterized. Grounding lines and calving fronts are free to evolve, and their modeled equilibrium state is compared to observational data. A physically-motivated calving law based on horizontal spreading rates allows for realistic calving fronts for various types of shelves. Steady-state dynamics including surface velocity and ice flux are analyzed for whole Antarctica and the Ronne-Filchner and Ross ice shelf areas in particular. The results show that the different flow regimes in sheet and shelves, and the transition zone between them, are captured reasonably well, supporting the approach of superposition of SIA and SSA for the representation of fast motion of grounded ice. This approach also leads to a natural emergence of sliding-dominated flow in stream-like features in this new 3-D marine ice sheet model.
Martin, M. A.; Winkelmann, R.; Haseloff, M.; Albrecht, T.; Bueler, E.; Khroulev, C.; Levermann, A.
2011-09-01
We present a dynamic equilibrium simulation of the ice sheet-shelf system on Antarctica with the Potsdam Parallel Ice Sheet Model (PISM-PIK). The simulation is initialized with present-day conditions for bed topography and ice thickness and then run to steady state with constant present-day surface mass balance. Surface temperature and sub-shelf basal melt distribution are parameterized. Grounding lines and calving fronts are free to evolve, and their modeled equilibrium state is compared to observational data. A physically-motivated calving law based on horizontal spreading rates allows for realistic calving fronts for various types of shelves. Steady-state dynamics including surface velocity and ice flux are analyzed for whole Antarctica and the Ronne-Filchner and Ross ice shelf areas in particular. The results show that the different flow regimes in sheet and shelves, and the transition zone between them, are captured reasonably well, supporting the approach of superposition of SIA and SSA for the representation of fast motion of grounded ice. This approach also leads to a natural emergence of sliding-dominated flow in stream-like features in this new 3-D marine ice sheet model.
Poulet, Thomas; Paesold, Martin; Veveakis, Manolis
2017-03-01
Faults play a major role in many economically and environmentally important geological systems, ranging from impermeable seals in petroleum reservoirs to fluid pathways in ore-forming hydrothermal systems. Their behavior is therefore widely studied and fault mechanics is particularly focused on the mechanisms explaining their transient evolution. Single faults can change in time from seals to open channels as they become seismically active and various models have recently been presented to explain the driving forces responsible for such transitions. A model of particular interest is the multi-physics oscillator of Alevizos et al. (J Geophys Res Solid Earth 119(6), 4558-4582, 2014) which extends the traditional rate and state friction approach to rate and temperature-dependent ductile rocks, and has been successfully applied to explain spatial features of exposed thrusts as well as temporal evolutions of current subduction zones. In this contribution we implement that model in REDBACK, a parallel open-source multi-physics simulator developed to solve such geological instabilities in three dimensions. The resolution of the underlying system of equations in a tightly coupled manner allows REDBACK to capture appropriately the various theoretical regimes of the system, including the periodic and non-periodic instabilities. REDBACK can then be used to simulate the drastic permeability evolution in time of such systems, where nominally impermeable faults can sporadically become fluid pathways, with permeability increases of several orders of magnitude.
Perini, Ana P.; Neves, Lucio P.; Maia, Ana F.; Caldas, Linda V. E.
2013-12-01
In this work, a new extended-length parallel-plate ionization chamber was tested in the standard radiation qualities for computed tomography established according to the half-value layers defined in the IEC 61267 standard, at the Calibration Laboratory of the Instituto de Pesquisas Energéticas e Nucleares (IPEN). The experimental characterization was made following the IEC 61674 standard recommendations. The experimental results obtained with the ionization chamber studied in this work were compared to those obtained with a commercial pencil ionization chamber, showing good agreement. With the use of the PENELOPE Monte Carlo code, simulations were undertaken to evaluate the influence of the cables, insulator, PMMA body, collecting electrode, guard ring, screws, as well as different materials and geometrical arrangements, on the energy deposited in the sensitive volume of the ionization chamber. The maximum influence observed was 13.3% for the collecting electrode, and regarding the use of different materials and design, the substitutions showed that the original project presented the most suitable configuration. The experimental and simulated results obtained in this work show that this ionization chamber has appropriate characteristics to be used at calibration laboratories, for dosimetry in standard computed tomography and diagnostic radiology quality beams.
Energy Technology Data Exchange (ETDEWEB)
Rodgers, A; Matzel, E; Pasyanos, M; Petersson, A; Sjogreen, B; Bono, C; Vorobiev, O; Antoun, T; Walter, W; Myers, S; Lomov, I
2008-07-07
The development of accurate numerical methods to simulate wave propagation in three-dimensional (3D) earth models and advances in computational power offer exciting possibilities for modeling the motions excited by underground nuclear explosions. This presentation will describe recent work to use new numerical techniques and parallel computing to model earthquakes and underground explosions to improve understanding of the wave excitation at the source and path-propagation effects. Firstly, we are using the spectral element method (SEM, SPECFEM3D code of Komatitsch and Tromp, 2002) to model earthquakes and explosions at regional distances using available 3D models. SPECFEM3D simulates anelastic wave propagation in fully 3D earth models in spherical geometry with the ability to account for free surface topography, anisotropy, ellipticity, rotation and gravity. Results show in many cases that 3D models are able to reproduce features of the observed seismograms that arise from path-propagation effects (e.g. enhanced surface wave dispersion, refraction, amplitude variations from focusing and defocusing, tangential component energy from isotropic sources). We are currently investigating the ability of different 3D models to predict path-specific seismograms as a function of frequency. A number of models developed using a variety of methodologies are available for testing. These include the WENA/Unified model of Eurasia (e.g. Pasyanos et al 2004), the global CUB 2.0 model (Shapiro and Ritzwoller, 2002), the partitioned waveform model for the Mediterranean (van der Lee et al., 2007) and stochastic models of the Yellow Sea Korean Peninsula region (Pasyanos et al., 2006). Secondly, we are extending our Cartesian anelastic finite difference code (WPP of Nilsson et al., 2007) to model the effects of free-surface topography. WPP models anelastic wave propagation in fully 3D earth models using mesh refinement to increase computational speed and improve memory efficiency. Thirdly
Simulation study on the growth of grains in dusty plasmas
International Nuclear Information System (INIS)
Sato, Tetsuya; Watanabe, Kunihiko
1997-01-01
A new particle simulation code is developed for studying the dynamics of the grains which are exposed to charging by the background plasma particles. Effects of regular attachment of electrons and ions, effects of secondary electron emission, and coagulation of grains are included in this code. Simulation results show that grains randomly change their charges from negative to positive, or from positive to negative in a 'flip-flop' fashion as a result of competition between the electron attachment and secondary electron emission. It is found that the flip-flop effect becomes remarkable when the radius of grains is of the order of 10 nm, because the attachment of a single electron to a grain is less effective on the surface potential for larger grains, while the average probability of electron attachment is smaller for smaller grains. Grains with opposite charges attract each other to coagulate, so that grains of size of 10 nm are likely to grow in size. The flip-flop effect is found to be essential to the growth of grains. (author)
Amir, Sahar Z.
2013-05-01
We introduce an efficient thermodynamically consistent technique to extrapolate and interpolate normalized Canonical NVT ensemble averages like pressure and energy for Lennard-Jones (L-J) fluids. Preliminary results show promising applicability in oil and gas modeling, where accurate determination of thermodynamic properties in reservoirs is challenging. The thermodynamic interpolation and thermodynamic extrapolation schemes predict ensemble averages at different thermodynamic conditions from expensively simulated data points. The methods reweight and reconstruct previously generated database values of Markov chains at neighboring temperature and density conditions. To investigate the efficiency of these methods, two databases corresponding to different combinations of normalized density and temperature are generated. One contains 175 Markov chains with 10,000,000 MC cycles each and the other contains 3000 Markov chains with 61,000,000 MC cycles each. For such massive database creation, two algorithms to parallelize the computations have been investigated. The accuracy of the thermodynamic extrapolation scheme is investigated with respect to classical interpolation and extrapolation. Finally, thermodynamic interpolation benefiting from four neighboring Markov chain points is implemented and compared with previous schemes. The thermodynamic interpolation scheme using knowledge from the four neighboring points proves to be more accurate than the thermodynamic extrapolation from the closest point only, while both thermodynamic extrapolation and thermodynamic interpolation are more accurate than the classical interpolation and extrapolation. The investigated extrapolation scheme has great potential in oil and gas reservoir modeling. That is, such a scheme has the potential to speed up the MCMC thermodynamic computation to be comparable with conventional Equation of State approaches in efficiency. In particular, this makes it applicable to large-scale optimization of L
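The reweighting idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' scheme: it assumes plain single-state Boltzmann reweighting in inverse temperature only, with samples drawn from a canonical chain at `beta_old`, and the function name `reweight_average` is hypothetical. The estimate of an observable A at a neighboring `beta_new` is the weighted mean of A with weights exp(-(beta_new - beta_old) U).

```python
import numpy as np


def reweight_average(observable, energy, beta_old, beta_new):
    """Estimate <observable> at beta_new from samples generated at beta_old.

    Assumption: samples come from a canonical (NVT) Markov chain at beta_old,
    so each sample is reweighted by exp(-(beta_new - beta_old) * energy).
    """
    observable = np.asarray(observable, dtype=float)
    energy = np.asarray(energy, dtype=float)
    log_w = -(beta_new - beta_old) * energy
    log_w -= log_w.max()  # subtract the maximum for numerical stability
    w = np.exp(log_w)
    return float(np.sum(w * observable) / np.sum(w))


if __name__ == "__main__":
    # Toy check on a 1D harmonic potential U = x^2 / 2, for which the
    # canonical distribution at beta = 1 is a standard normal.
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=200_000)
    u = 0.5 * x**2
    print(reweight_average(u, u, beta_old=1.0, beta_new=1.2))
```

Reweighting of this kind is only reliable when the two thermodynamic states are close enough for their energy distributions to overlap, which is consistent with the abstract's use of neighboring conditions.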
Mamey, Mary Rose; Barbosa-Leiker, Celestina; McPherson, Sterling; Burns, G Leonard; Parks, Craig; Roll, John
2015-12-01
Researchers often want to examine 2 comorbid conditions simultaneously. One strategy to do so is through the use of parallel latent growth curve modeling (LGCM). This statistical technique allows for the simultaneous evaluation of 2 disorders to determine the explanations and predictors of change over time. Additionally, a piecewise model can help identify whether there are more than 2 growth processes within each disorder (e.g., during a clinical trial). A parallel piecewise LGCM was applied to self-reported attention-deficit/hyperactivity disorder (ADHD) and self-reported substance use symptoms in 303 adolescents enrolled in cognitive-behavioral therapy treatment for a substance use disorder and receiving either oral-methylphenidate or placebo for ADHD across 16 weeks. Assessing these 2 disorders concurrently allowed us to determine whether elevated levels of 1 disorder predicted elevated levels or increased risk of the other disorder. First, a piecewise growth model measured ADHD and substance use separately. Next, a parallel piecewise LGCM was used to estimate the regressions across disorders to determine whether higher scores at baseline of the disorders (i.e., ADHD or substance use disorder) predicted rates of change in the related disorder. Finally, treatment was added to the model to predict change. While the analyses revealed no significant relationships across disorders, this study explains and applies a parallel piecewise growth model to examine the developmental processes of comorbid conditions over the course of a clinical trial. Strengths of piecewise and parallel LGCMs for other addictions researchers interested in examining dual processes over time are discussed. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
Simulation of Growth Trajectories of Childhood Obesity into Adulthood.
Ward, Zachary J; Long, Michael W; Resch, Stephen C; Giles, Catherine M; Cradock, Angie L; Gortmaker, Steven L
2017-11-30
Although the current obesity epidemic has been well documented in children and adults, less is known about long-term risks of adult obesity for a given child at his or her present age and weight. We developed a simulation model to estimate the risk of adult obesity at the age of 35 years for the current population of children in the United States. We pooled height and weight data from five nationally representative longitudinal studies totaling 176,720 observations from 41,567 children and adults. We simulated growth trajectories across the life course and adjusted for secular trends. We created 1000 virtual populations of 1 million children through the age of 19 years that were representative of the 2016 population of the United States and projected their trajectories in height and weight up to the age of 35 years. Severe obesity was defined as a body-mass index (BMI, the weight in kilograms divided by the square of the height in meters) of 35 or higher in adults and 120% or more of the 95th percentile in children. Given the current level of childhood obesity, the models predicted that a majority of today's children (57.3%; 95% uncertainty interval [UI], 55.2 to 60.0) will be obese at the age of 35 years, and roughly half of the projected prevalence will occur during childhood. Our simulations indicated that the relative risk of adult obesity increased with age and BMI, from 1.17 (95% UI, 1.09 to 1.29) for overweight 2-year-olds to 3.10 (95% UI, 2.43 to 3.65) for 19-year-olds with severe obesity. For children with severe obesity, the chance they will no longer be obese at the age of 35 years fell from 21.0% (95% UI, 7.3 to 47.3) at the age of 2 years to 6.1% (95% UI, 2.1 to 9.9) at the age of 19 years. On the basis of our simulation models, childhood obesity and overweight will continue to be a major health problem in the United States. Early development of obesity predicted obesity in adulthood, especially for children who were severely obese. (Funded by the JPB
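The adult definition quoted in the abstract above (BMI as weight in kilograms divided by the square of height in meters, with severe obesity at 35 or higher) can be written out as a short sketch. The function names are hypothetical, and the criterion for children is omitted because it needs a reference 95th-percentile table that the abstract does not give.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight in kilograms / (height in meters) squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m**2


def is_severely_obese_adult(weight_kg: float, height_m: float,
                            threshold: float = 35.0) -> bool:
    """Adult severe-obesity criterion from the abstract: BMI >= 35."""
    return body_mass_index(weight_kg, height_m) >= threshold
```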
Monte Carlo simulation of asymmetrical growth of cube-shaped nanoparticles
International Nuclear Information System (INIS)
Wang Yuanyuan; Xie Huaqing; Wu Zihua; Xing Jiaojiao
2016-01-01
We simulated the asymmetrical growth of cube-shaped nanoparticles by applying the Monte Carlo method. The influence of the specific mechanisms on the crystal growth of nanoparticles has been phenomenologically described by efficient growth possibilities along different directions (or crystal faces). The roles of the thermodynamic and kinetic factors have been evaluated in three phenomenological models. The simulation results would benefit the understanding about the cause and manner of the asymmetrical growth of nanoparticles. (paper)
Application of a Cycle Jump Technique for Acceleration of Fatigue Crack Growth Simulation
DEFF Research Database (Denmark)
Moslemian, Ramin; Berggreen, Christian; Karlsson, A.M.
2010-01-01
A method for accelerated simulation of fatigue crack growth in a bimaterial interface is proposed. To simulate fatigue crack growth in a bimaterial interface a routine is developed in the commercial finite element code ANSYS and a method to accelerate the simulation is implemented. The proposed m...... of the simulation show that with fair accuracy, using the cycle jump method, more than 70% reduction in computation time can be achieved....
International Nuclear Information System (INIS)
Da Silva, R S; De Carvalho, D K E; Antunes, A R E; Lyra, P R M; Willmersdorf, R B
2010-01-01
In this paper a finite volume method with a 'Modified Implicit Pressure, Explicit Saturation' (MIMPES) approach is used to model the 3-D incompressible and immiscible two-phase flow of water and oil in heterogeneous and anisotropic porous media. A vertex centered finite volume method with an edge-based data structure is adopted to discretize both the elliptic pressure and the hyperbolic saturation equations using parallel computers with distributed memory. Due to the explicit solution of the saturation equation in the IMPES method, severe time step restrictions are imposed on the simulation. In order to circumvent this problem, an edge-based implementation of the MIMPES method was used. In this method, the pressure equation is solved and the velocity field is computed much less frequently than the saturation field. Following the work of Hurtado, a mean relative variation of the velocity field throughout the simulation is used to automatically control the updating process, allowing for much larger time-steps in a very simple way. In order to run large scale problems, we have developed a parallel implementation using clusters of PCs. The simulator uses open source parallel libraries like FMDB, ParMetis and PETSc. Results of speed-up and efficiency are presented to validate the performance of the parallel simulator.
International Nuclear Information System (INIS)
Michael J. Bockelie
2002-01-01
This DOE SBIR Phase II final report summarizes research that has been performed to develop a parallel adaptive tool for modeling steady, two phase turbulent reacting flow. The target applications for the new tool are full scale, fossil-fuel fired boilers and furnaces such as those used in the electric utility industry, chemical process industry and mineral/metal process industry. The analyses to be performed on these systems are engineering calculations to evaluate the impact on overall furnace performance due to operational, process or equipment changes. Developing a Computational Fluid Dynamics (CFD) model of an industrial scale furnace requires a carefully designed grid that will capture all of the large and small scale features of the flowfield. Industrial systems are quite large, usually measured in tens of feet, but contain numerous burners, air injection ports, flames and localized behavior with dimensions that are measured in inches or fractions of inches. Creating an accurate computational model of such systems requires capturing length scales within the flow field that span several orders of magnitude. In addition, to create an industrially useful model, the grid cannot contain too many grid points - the model must be able to execute on an inexpensive desktop PC in a matter of days. An adaptive mesh provides a convenient means to create a grid that can capture fine flow field detail within a very large domain with a 'reasonable' number of grid points. However, the use of an adaptive mesh requires the development of a new flow solver. To create the new simulation tool, we have combined existing reacting CFD modeling software with new software based on emerging block structured Adaptive Mesh Refinement (AMR) technologies developed at Lawrence Berkeley National Laboratory (LBNL). Specifically, we combined: -physical models, modeling expertise, and software from existing combustion simulation codes used by Reaction Engineering International
International Nuclear Information System (INIS)
Mula-Hernandez, Olga
2014-01-01
In this thesis, we have first developed a time dependent 3D neutron transport solver on unstructured meshes with discontinuous Galerkin finite elements spatial discretization. The solver (called MINARET) represents in itself an important contribution in reactor physics thanks to the accuracy that it can provide in the knowledge of the state of the core during severe accidents. It will also play an important role in vessel fluence calculations. From a mathematical point of view, the most important contribution has consisted in the implementation of modern algorithms that are well adapted for modern parallel architectures and that significantly decrease the computing times. A special effort has been made to efficiently parallelize the time variable by the use of the parareal in time algorithm. For this, we have first analyzed the performance that the classical parareal scheme can provide when applied to the resolution of the neutron transport equation in a reactor core. Then, with the purpose of improving this performance, a parareal scheme that takes more efficiently into account the presence of other iterative schemes in the resolution of each time step has been proposed. The main idea consists in limiting the number of internal iterations for each time step and reaching convergence across the parareal iterations. A second phase of our work has been motivated by the following question: given the high degree of accuracy that MINARET can provide in the modeling of the neutron population, could we somehow use it as a tool to monitor in real time the population of neutrons for the purpose of helping in the operation of the reactor? And, what is more, how can such a tool be made coherent in some sense with the measurements taken in situ? One of the main challenges of this problem is the real time aspect of the simulations. Indeed, despite all of our efforts to speed-up the calculations, the discretization methods used in MINARET do not provide simulations
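For readers unfamiliar with the classical parareal scheme referred to in the abstract above, a minimal sketch on a scalar ODE is given here. It is not MINARET code: the explicit Euler propagators, the function names and the step counts are illustrative assumptions. The update is U[n+1] (new) = G(U[n] new) + F(U[n] old) - G(U[n] old), where G is a cheap coarse propagator and F an accurate fine one. The fine propagations within one iteration are independent across time slices, which is where the parallelism comes from (they are run serially here).

```python
def euler(f, y, t0, t1, steps):
    """Integrate y' = f(t, y) from t0 to t1 with explicit Euler steps."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t0, t1, n_slices, n_iter, coarse_steps=1, fine_steps=100):
    """Classical parareal iteration; returns slice boundaries and values."""
    ts = [t0 + (t1 - t0) * i / n_slices for i in range(n_slices + 1)]

    def coarse(y, a, b):
        return euler(f, y, a, b, coarse_steps)

    def fine(y, a, b):
        return euler(f, y, a, b, fine_steps)

    # Initial guess: one sequential sweep of the coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], ts[n], ts[n + 1]))

    for _ in range(n_iter):
        # These propagations use only the previous iterate, so each slice
        # could be handled by a different processor.
        f_old = [fine(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        g_old = [coarse(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        # Sequential correction sweep with the cheap coarse propagator.
        new = [y0]
        for n in range(n_slices):
            new.append(coarse(new[n], ts[n], ts[n + 1]) + f_old[n] - g_old[n])
        u = new
    return ts, u
```

After as many iterations as there are time slices, the scheme reproduces the sequential fine solution, so it only pays off when far fewer iterations are enough. Limiting the inner iterations per time step, as the thesis proposes, is a refinement not shown in this sketch.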
Stacking fault growth of FCC crystal: The Monte-Carlo simulation approach
International Nuclear Information System (INIS)
Jian Jianmin; Ming Naiben
1988-03-01
The Monte-Carlo method has been used to simulate the growth of the FCC (111) crystal surface, on which is presented the outcrop of a stacking fault. The comparison of the growth rates has been made between the stacking fault containing surface and the perfect surface. The successive growth stages have been simulated. It is concluded that the outcrop of stacking fault on the crystal surface can act as a self-perpetuating step generating source. (author). 7 refs, 3 figs
Simulation of forest growth, applied to douglas fir stands in the Netherlands
Mohren, G.M.J.
1987-01-01
Forest growth in relation to weather and soils is studied using a physiological simulation model. Growth potential depends on physiological characteristics of the plant species in combination with ambient weather conditions (mainly temperature and incoming radiation). For a given site, growth may be
Phase field simulation of grain growth in porous uranium dioxide
International Nuclear Information System (INIS)
Ahmed, Karim; Pakarinen, Janne; Allen, Todd; El-Azab, Anter
2014-01-01
A novel phase field model has been developed to investigate grain growth in porous polycrystalline UO2. Based on a system of Cahn–Hilliard and Allen–Cahn equations, the model takes into consideration both the curvature driven grain boundary motion and pore migration by surface diffusion. As such, the model accounts for the interaction between pore and grain boundary kinetics, which tends to retard the growth process. The phase field model parameters are found in terms of measurable material properties. Hence, quantitative results that can be compared with experiments were obtained. The model has been used to investigate the effect of porosity on the kinetics of grain growth in UO2. It is found that, as the amount of porosity increases, grain growth in UO2 gradually changes from boundary controlled growth to pore controlled growth. For high porosity levels, the grain growth completely stops after a short evolution time. It is also found that the inhomogeneous distribution of pores leads to abnormal grain growth even without taking into account the anisotropy in grain boundary energy and mobility. The effects of porosity, temperature and initial microstructure on grain growth were thoroughly investigated. The model predictions are in good agreement with published experimental results of grain growth in UO2.
Russkova, Tatiana V.
2017-11-01
One tool to improve the performance of Monte Carlo methods for numerical simulation of light transport in the Earth's atmosphere is parallel technology. A new algorithm oriented to parallel execution on CUDA-enabled NVIDIA graphics processors is discussed. The efficiency of parallelization is analyzed on the basis of calculating the upward and downward fluxes of solar radiation in both vertically homogeneous and inhomogeneous models of the atmosphere. The results of testing the new code under various atmospheric conditions, including continuous single-layered and multilayered clouds and selective molecular absorption, are presented. The results of testing the code using video cards with different compute capability are analyzed. It is shown that the changeover of computing from conventional PCs to the architecture of graphics processors gives more than a hundredfold increase in performance and fully reveals the capabilities of the technology used.
Matsuzaki, Tomoya; Shibata, Yosei; Takeda, Risa; Ishinabe, Takahiro; Fujikake, Hideo
2017-01-01
For directional control of organic single crystals, we propose a crystal growth method using liquid crystal as the solvent. In this study, we examined the formation of 2,7-dioctyl[1]benzothieno[3,2-b][1]benzothiophene (C8-BTBT) single crystals using a parallel aligned liquid crystal (LC) cell and rubbing-treated polyimide films in order to clarify the effects of LC alignment on anisotropic C8-BTBT crystal growth. Based on the results, we found that the crystal growth direction of C8-BTBT single crystals was related to the direction of the LC molecules aligned by the rubbing treatment. Moreover, by optical evaluation, we found that the C8-BTBT single crystals have an aligned molecular structure.
International Nuclear Information System (INIS)
Misawa, Takeharu; Yoshida, Hiroyuki; Akimoto, Hajime
2008-01-01
At the Japan Atomic Energy Agency (JAEA), the Innovative Water Reactor for Flexible Fuel Cycle (FLWR) has been developed. For the thermal design of FLWR, it is necessary to develop an analytical method to predict boiling transition in FLWR. JAEA has been developing the three-dimensional two-fluid model analysis code ACE-3D, which adopts a boundary-fitted coordinate system to simulate flow in channels of complex shape. In this paper, as part of the development of ACE-3D for rod bundle analysis, the introduction of parallelization to ACE-3D and assessments of ACE-3D are presented. In the analysis of a large-scale domain such as a rod bundle, even a two-fluid model requires a large amount of computation and memory, exceeding the memory limit of a single CPU. Therefore, parallelization was introduced to ACE-3D to divide the data for large-scale domain analysis among a large number of CPUs, and it is confirmed that analysis of a large-scale domain such as a rod bundle can be performed by parallel computation while maintaining parallel performance even with a large number of CPUs. ACE-3D adopts two-phase flow models, some of which depend on channel geometry. Therefore, analyses of domains that simulate an individual subchannel and a 37-rod bundle were performed and compared with experiments. It is confirmed that the results of both analyses using ACE-3D agree qualitatively with past experimental results. (author)
International Nuclear Information System (INIS)
Yokohama, Noriya
2013-01-01
This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed an approximately 28 times faster speed than seen with single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost. (author)
Three-dimensional growth simulation: A study of substrate oriented films
International Nuclear Information System (INIS)
Besnard, A; Martin, N; Carpentier, L
2010-01-01
Monte Carlo simulations are developed to simulate the growth of three-dimensional columnar microstructure in thin films. We are studying in particular oriented microstructure like those produced with the Glancing Angle Deposition technique (GLAD). Some geometrical characteristics of the particles flux, the organization of defect sites on the substrate surface and the atomic surface diffusion are mainly investigated in order to predict the growth processes and the resulting features of the films. This study reports on simulations of thin film growth exhibiting an oblique and zigzag columnar microstructure. Column angle evolution and density are investigated versus incidence angle α or period number n and compared with experimental measurements.
International Nuclear Information System (INIS)
Xu, Zuwei; Zhao, Haibo; Zheng, Chuguang
2015-01-01
This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining Markov jump model, weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of particle size distribution with low statistical noise over the full size range and as far as possible to reduce the number of time loopings. Here three coagulation rules are highlighted and it is found that constructing appropriate coagulation rule provides a route to attain the compromise between accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering the two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates being used for acceptance–rejection processes by single-looping over all particles, and meanwhile the mean time-step of coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is reduced greatly to be proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are parallel processed by multi-cores on a GPU that can implement the massively threaded data-parallel tasks to obtain remarkable speedup ratio (comparing with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10 000 simulation particles per cell). These accelerating approaches of PBMC are
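The acceptance-rejection step with a majorant kernel described in the abstract above can be sketched in a simplified form. This is not the authors' differentially-weighted algorithm: it assumes equally weighted particles, a continuum-regime Brownian kernel shape K(u, v) = (u^(1/3) + v^(1/3)) (u^(-1/3) + v^(-1/3)), and hypothetical function names. The key point it illustrates is that an upper bound on the kernel can be obtained from single passes over the particles, so no double loop over all pairs is needed to pick a coagulating pair.

```python
import random


def brownian_kernel(u, v):
    """Illustrative coagulation kernel for particle volumes u and v."""
    return (u ** (1 / 3) + v ** (1 / 3)) * (u ** (-1 / 3) + v ** (-1 / 3))


def kernel_upper_bound(volumes):
    """Majorant: each bracket is at most twice its largest single term."""
    a_max = max(v ** (1 / 3) for v in volumes)
    b_max = max(v ** (-1 / 3) for v in volumes)
    return 4.0 * a_max * b_max


def select_coagulation_pair(volumes, rng):
    """Pick a pair with probability proportional to the true kernel."""
    k_max = kernel_upper_bound(volumes)
    rejected = 0
    while True:
        i, j = rng.sample(range(len(volumes)), 2)  # two distinct particles
        if rng.random() * k_max <= brownian_kernel(volumes[i], volumes[j]):
            return i, j, rejected
        rejected += 1


def coagulate_once(volumes, rng):
    """Merge one accepted pair; total volume is conserved."""
    i, j, _ = select_coagulation_pair(volumes, rng)
    merged = volumes[i] + volumes[j]
    rest = [v for idx, v in enumerate(volumes) if idx not in (i, j)]
    return rest + [merged]
```

In a multi-cell setting like the one the abstract describes, each cell would run this selection independently, which is what makes the per-cell work suitable for data-parallel execution.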
Simulation modeling on the growth of firm's safety management capability
Institute of Scientific and Technical Information of China (English)
LIU Tie-zhong; LI Zhi-xiang
2008-01-01
To address deficiencies in safety management measures, a simulation model of a firm's safety management capability (FSMC) was established based on organizational learning theory. The system dynamics (SD) method was used, from which the level and rate system, variable equations and system structure flow diagram were obtained. The simulation model was verified in two respects: first, the model's sensitivity to variables was tested using the total safety investment and the proportion of safety investment; second, variable dependency was checked using the correlated variables of FSMC and organizational learning. The feasibility of the simulation model is verified through these processes.
A new model for simulating growth in fish
Directory of Open Access Journals (Sweden)
Johannes Hamre
2014-01-01
A real dynamic population model calculates change in population sizes independent of time. The Beverton & Holt (B&H) model commonly used in fish assessment includes the von Bertalanffy growth function, which has age or accumulated time as an independent variable. As a result, the B&H model has to assume constant fish growth. However, growth in fish is highly variable depending on food availability and environmental conditions. We propose a new growth model where the length increment of fish living under constant conditions and unlimited food supply decreases linearly with increasing fish length until it reaches zero at a maximal fish length. The model is independent of time and includes a term which accounts for the environmental variation. In the present study, the model was validated in zebrafish held at constant conditions. There was a good fit of the model to data on observed growth in Norwegian spring spawning herring, capelin from the Barents Sea, North Sea herring and in farmed coastal cod. Growth data from Walleye Pollock from the Eastern Bering Sea and blue whiting from the Norwegian Sea also fitted reasonably well to the model, whereas data from cod from the North Sea showed a good fit to the model only above a length of 70 cm. Cod from the Barents Sea did not grow according to the model. The latter results can be explained by environmental factors and variable food availability in the time under study. The model implies that the efficiency of energy conversion from food decreases as the individual animal approaches its maximal length and is postulated to represent a natural law of fish <