Massive parallel 3D PIC simulation of negative ion extraction
Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu
2017-09-01
The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.
International Nuclear Information System (INIS)
Liu, H.
1996-01-01
Computer simulations using the multi-particle code PARMELA with a three-dimensional point-by-point space charge algorithm have turned out to be very helpful in supporting injector commissioning and operations at Thomas Jefferson National Accelerator Facility (Jefferson Lab, formerly called CEBAF). However, this algorithm, which defines a typical N 2 problem in CPU time scaling, is very time-consuming when N, the number of macro-particles, is large. Therefore, it is attractive to use massively parallel processors (MPPs) to speed up the simulations. Motivated by this, the authors modified the space charge subroutine for using the MPPs of the Cray T3D. The techniques used to parallelize and optimize the code on the T3D are discussed in this paper. The performance of the code on the T3D is examined in comparison with a Parallel Vector Processing supercomputer of the Cray C90 and an HP 735/15 high-end workstation
Massively parallel Fokker-Planck code ALLAp
International Nuclear Information System (INIS)
Batishcheva, A.A.; Krasheninnikov, S.I.; Craddock, G.G.; Djordjevic, V.
1996-01-01
The recently developed for workstations Fokker-Planck code ALLA simulates the temporal evolution of 1V, 2V and 1D2V collisional edge plasmas. In this work we present the results of code parallelization on the CRI T3D massively parallel platform (ALLAp version). Simultaneously we benchmark the 1D2V parallel vesion against an analytic self-similar solution of the collisional kinetic equation. This test is not trivial as it demands a very strong spatial temperature and density variation within the simulation domain. (orig.)
Schultz, A.
2010-12-01
3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We
First massively parallel algorithm to be implemented in Apollo-II code
International Nuclear Information System (INIS)
Stankovski, Z.
1994-01-01
The collision probability (CP) method in neutron transport, as applied to arbitrary 2D XY geometries, like the TDT module in APOLLO-II, is very time consuming. Consequently RZ or 3D extensions became prohibitive. Fortunately, this method is very suitable for parallelization. Massively parallel computer architectures, especially MIMD machines, bring a new breath to this method. In this paper we present a CM5 implementation of the CP method. Parallelization is applied to the energy groups, using the CMMD message passing library. In our case we use 32 processors for the standard 99-group APOLLIB-II library. The real advantage of this algorithm will appear in the calculation of the future fine multigroup library (about 8000 groups) of the SAPHYR project with a massively parallel computer (to the order of hundreds of processors). (author). 3 tabs., 4 figs., 4 refs
Adapting algorithms to massively parallel hardware
Sioulas, Panagiotis
2016-01-01
In the recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming in which applications are programmed to exploit the power provided by multi-cores. Usually there is gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in the GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.
Computational fluid dynamics on a massively parallel computer
Jespersen, Dennis C.; Levit, Creon
1989-01-01
A finite difference code was implemented for the compressible Navier-Stokes equations on the Connection Machine, a massively parallel computer. The code is based on the ARC2D/ARC3D program and uses the implicit factored algorithm of Beam and Warming. The codes uses odd-even elimination to solve linear systems. Timings and computation rates are given for the code, and a comparison is made with a Cray XMP.
Application of Raptor-M3G to reactor dosimetry problems on massively parallel architectures - 026
International Nuclear Information System (INIS)
Longoni, G.
2010-01-01
The solution of complex 3-D radiation transport problems requires significant resources both in terms of computation time and memory availability. Therefore, parallel algorithms and multi-processor architectures are required to solve efficiently large 3-D radiation transport problems. This paper presents the application of RAPTOR-M3G (Rapid Parallel Transport Of Radiation - Multiple 3D Geometries) to reactor dosimetry problems. RAPTOR-M3G is a newly developed parallel computer code designed to solve the discrete ordinates (SN) equations on multi-processor computer architectures. This paper presents the results for a reactor dosimetry problem using a 3-D model of a commercial 2-loop pressurized water reactor (PWR). The accuracy and performance of RAPTOR-M3G will be analyzed and the numerical results obtained from the calculation will be compared directly to measurements of the neutron field in the reactor cavity air gap. The parallel performance of RAPTOR-M3G on massively parallel architectures, where the number of computing nodes is in the order of hundreds, will be analyzed up to four hundred processors. The performance results will be presented based on two supercomputing architectures: the POPLE supercomputer operated by the Pittsburgh Supercomputing Center and the Westinghouse computer cluster. The Westinghouse computer cluster is equipped with a standard Ethernet network connection and an InfiniBand R interconnects capable of a bandwidth in excess of 20 GBit/sec. Therefore, the impact of the network architecture on RAPTOR-M3G performance will be analyzed as well. (authors)
Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations
Landge, A. G.
2012-12-01
The performance of massively parallel applications is often heavily impacted by the cost of communication among compute nodes. However, determining how to best use the network is a formidable task, made challenging by the ever increasing size and complexity of modern supercomputers. This paper applies visualization techniques to aid parallel application developers in understanding the network activity by enabling a detailed exploration of the flow of packets through the hardware interconnect. In order to visualize this large and complex data, we employ two linked views of the hardware network. The first is a 2D view, that represents the network structure as one of several simplified planar projections. This view is designed to allow a user to easily identify trends and patterns in the network traffic. The second is a 3D view that augments the 2D view by preserving the physical network topology and providing a context that is familiar to the application developers. Using the massively parallel multi-physics code pF3D as a case study, we demonstrate that our tool provides valuable insight that we use to explain and optimize pF3D-s performance on an IBM Blue Gene/P system. © 1995-2012 IEEE.
New adaptive differencing strategy in the PENTRAN 3-d parallel Sn code
International Nuclear Information System (INIS)
Sjoden, G.E.; Haghighat, A.
1996-01-01
It is known that three-dimensional (3-D) discrete ordinates (S n ) transport problems require an immense amount of storage and computational effort to solve. For this reason, parallel codes that offer a capability to completely decompose the angular, energy, and spatial domains among a distributed network of processors are required. One such code recently developed is PENTRAN, which iteratively solves 3-D multi-group, anisotropic S n problems on distributed-memory platforms, such as the IBM-SP2. Because large problems typically contain several different material zones with various properties, available differencing schemes should automatically adapt to the transport physics in each material zone. To minimize the memory and message-passing overhead required for massively parallel S n applications, available differencing schemes in an adaptive strategy should also offer reasonable accuracy and positivity, yet require only the zeroth spatial moment of the transport equation; differencing schemes based on higher spatial moments, in spite of their greater accuracy, require at least twice the amount of storage and communication cost for implementation in a massively parallel transport code. This paper discusses a new adaptive differencing strategy that uses increasingly accurate schemes with low parallel memory and communication overhead. This strategy, implemented in PENTRAN, includes a new scheme, exponential directional averaged (EDA) differencing
On maximal massive 3D supergravity
Bergshoeff , Eric A; Hohm , Olaf; Rosseel , Jan; Townsend , Paul K
2010-01-01
ABSTRACT We construct, at the linearized level, the three-dimensional (3D) N = 4 supersymmetric " general massive supergravity " and the maximally supersymmetric N = 8 " new massive supergravity ". We also construct the maximally supersymmetric linearized N = 7 topologically massive supergravity, although we expect N = 6 to be maximal at the non-linear level. (Bergshoeff, Eric A) (Hohm, Olaf) (Rosseel, Jan) P.K.Townsend@da...
Massively Parallel Computing: A Sandia Perspective
Energy Technology Data Exchange (ETDEWEB)
Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.
1999-05-06
The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant break-throughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.
3D streamers simulation in a pin to plane configuration using massively parallel computing
Plewa, J.-M.; Eichwald, O.; Ducasse, O.; Dessante, P.; Jacobs, C.; Renon, N.; Yousfi, M.
2018-03-01
This paper concerns the 3D simulation of corona discharge using high performance computing (HPC) managed with the message passing interface (MPI) library. In the field of finite volume methods applied on non-adaptive mesh grids and in the case of a specific 3D dynamic benchmark test devoted to streamer studies, the great efficiency of the iterative R&B SOR and BiCGSTAB methods versus the direct MUMPS method was clearly demonstrated in solving the Poisson equation using HPC resources. The optimization of the parallelization and the resulting scalability was undertaken as a function of the HPC architecture for a number of mesh cells ranging from 8 to 512 million and a number of cores ranging from 20 to 1600. The R&B SOR method remains at least about four times faster than the BiCGSTAB method and requires significantly less memory for all tested situations. The R&B SOR method was then implemented in a 3D MPI parallelized code that solves the classical first order model of an atmospheric pressure corona discharge in air. The 3D code capabilities were tested by following the development of one, two and four coplanar streamers generated by initial plasma spots for 6 ns. The preliminary results obtained allowed us to follow in detail the formation of the tree structure of a corona discharge and the effects of the mutual interactions between the streamers in terms of streamer velocity, trajectory and diameter. The computing time for 64 million of mesh cells distributed over 1000 cores using the MPI procedures is about 30 min ns-1, regardless of the number of streamers.
3D printed soft parallel actuator
Zolfagharian, Ali; Kouzani, Abbas Z.; Khoo, Sui Yang; Noshadi, Amin; Kaynak, Akif
2018-04-01
This paper presents a 3-dimensional (3D) printed soft parallel contactless actuator for the first time. The actuator involves an electro-responsive parallel mechanism made of two segments namely active chain and passive chain both 3D printed. The active chain is attached to the ground from one end and constitutes two actuator links made of responsive hydrogel. The passive chain, on the other hand, is attached to the active chain from one end and consists of two rigid links made of polymer. The actuator links are printed using an extrusion-based 3D-Bioplotter with polyelectrolyte hydrogel as printer ink. The rigid links are also printed by a 3D fused deposition modelling (FDM) printer with acrylonitrile butadiene styrene (ABS) as print material. The kinematics model of the soft parallel actuator is derived via transformation matrices notations to simulate and determine the workspace of the actuator. The printed soft parallel actuator is then immersed into NaOH solution with specific voltage applied to it via two contactless electrodes. The experimental data is then collected and used to develop a parametric model to estimate the end-effector position and regulate kinematics model in response to specific input voltage over time. It is observed that the electroactive actuator demonstrates expected behaviour according to the simulation of its kinematics model. The use of 3D printing for the fabrication of parallel soft actuators opens a new chapter in manufacturing sophisticated soft actuators with high dexterity and mechanical robustness for biomedical applications such as cell manipulation and drug release.
Massively Parallel Algorithms for Solution of Schrodinger Equation
Fijany, Amir; Barhen, Jacob; Toomerian, Nikzad
1994-01-01
In this paper massively parallel algorithms for solution of Schrodinger equation are developed. Our results clearly indicate that the Crank-Nicolson method, in addition to its excellent numerical properties, is also highly suitable for massively parallel computation.
Massively parallel mathematical sieves
Energy Technology Data Exchange (ETDEWEB)
Montry, G.R.
1989-01-01
The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
A Massively Parallel Face Recognition System
Directory of Open Access Journals (Sweden)
Lahdenoja Olli
2007-01-01
Full Text Available We present methods for processing the LBPs (local binary patterns with a massively parallel hardware, especially with CNN-UM (cellular nonlinear network-universal machine. In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA. We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in the view of its performance and accuracy restrictions.
A Massively Parallel Face Recognition System
Directory of Open Access Journals (Sweden)
Ari Paasio
2006-12-01
Full Text Available We present methods for processing the LBPs (local binary patterns with a massively parallel hardware, especially with CNN-UM (cellular nonlinear network-universal machine. In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA. We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in the view of its performance and accuracy restrictions.
A Massively Parallel Solver for the Mechanical Harmonic Analysis of Accelerator Cavities
International Nuclear Information System (INIS)
2015-01-01
ACE3P is a 3D massively parallel simulation suite that developed at SLAC National Accelerator Laboratory that can perform coupled electromagnetic, thermal and mechanical study. Effectively utilizing supercomputer resources, ACE3P has become a key simulation tool for particle accelerator R and D. A new frequency domain solver to perform mechanical harmonic response analysis of accelerator components is developed within the existing parallel framework. This solver is designed to determine the frequency response of the mechanical system to external harmonic excitations for time-efficient accurate analysis of the large-scale problems. Coupled with the ACE3P electromagnetic modules, this capability complements a set of multi-physics tools for a comprehensive study of microphonics in superconducting accelerating cavities in order to understand the RF response and feedback requirements for the operational reliability of a particle accelerator. (auth)
Massively-parallel best subset selection for ordinary least-squares regression
DEFF Research Database (Denmark)
Gieseke, Fabian; Polsterer, Kai Lars; Mahabal, Ashish
2017-01-01
Selecting an optimal subset of k out of d features for linear regression models given n training instances is often considered intractable for feature spaces with hundreds or thousands of dimensions. We propose an efficient massively-parallel implementation for selecting such optimal feature...
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite
Directory of Open Access Journals (Sweden)
Oleksiy Kononenko
2017-10-01
Full Text Available Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ace3p have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we present the new capabilities in ace3p for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. The simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.
International Nuclear Information System (INIS)
Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G
2007-01-01
The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results
Energy Technology Data Exchange (ETDEWEB)
Bauerle, Matthew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2014-08-01
This project utilizes Graphics Processing Units (GPUs) to compute radiograph simulations for arbitrary objects. The generation of radiographs, also known as the forward projection imaging model, is computationally intensive and not widely utilized. The goal of this research is to develop a massively parallel algorithm that can compute forward projections for objects with a trillion voxels (3D pixels). To achieve this end, the data are divided into blocks that can each t into GPU memory. The forward projected image is also divided into segments to allow for future parallelization and to avoid needless computations.
Programming massively parallel processors a hands-on approach
Kirk, David B
2010-01-01
Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. ""Massively parallel"" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVI- DIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.
Tam, Wing-Kin; Yang, Zhi
2018-05-01
Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
International Nuclear Information System (INIS)
Idomura, Yasuhiro; Jolliet, Sebastien
2010-01-01
A gyrokinetic toroidal five dimensional Eulerian code GT5D is ported on six advanced massively parallel platforms and comprehensive benchmark tests are performed. A parallelisation technique based on physical properties of the gyrokinetic equation is presented. By extending the parallelisation technique with a hybrid parallel model, the scalability of the code is improved on platforms with multi-core processors. In the benchmark tests, a good salability is confirmed up to several thousands cores on every platforms, and the maximum sustained performance of ∼18.6 Tflops is achieved using 16384 cores of BX900. (author)
A spin-4 analog of 3D massive gravity
Bergshoeff, Eric A.; Kovacevic, Marija; Rosseel, Jan; Townsend, Paul K.; Yin, Yihao
2011-01-01
A sixth-order, but ghost-free, gauge-invariant action is found for a fourth-rank symmetric tensor potential in a three-dimensional (3D) Minkowski spacetime. It propagates two massive modes of spin 4 that are interchanged by parity and is thus a spin-4 analog of linearized 'new massive gravity'. Also
Exact Solutions in 3D New Massive Gravity
Ahmedov, Haji; Aliev, Alikram N.
2011-01-01
We show that the field equations of new massive gravity (NMG) consist of a massive (tensorial) Klein-Gordon-type equation with a curvature-squared source term and a constraint equation. We also show that, for algebraic type D and N spacetimes, the field equations of topologically massive gravity (TMG) can be thought of as the “square root” of the massive Klein-Gordon-type equation. Using this fact, we establish a simple framework for mapping all types D and N solutions of TMG into NMG. Finally, we present new examples of types D and N solutions to NMG.
First massively parallel algorithm to be implemented in APOLLO-II code
International Nuclear Information System (INIS)
Stankovski, Z.
1994-01-01
The collision probability method in neutron transport, as applied to arbitrary 2-dimensional geometries, like the two dimensional transport module in APOLLO-II is very time consuming. Consequently 3-dimensional extension became prohibitive. Fortunately, this method is very suitable for parallelization. Massively parallel computer architectures, especially MIMD machines, bring a new breath to this method. In this paper we present a CM5 implementation of the collision probability method. Parallelization is applied to the energy groups, using the CMMD massage passing library. In our case we used 32 processors for the standard 99-group APOLLIB-II library. The real advantage of this algorithm will appear in the calculation of the future multigroup library (about 8000 groups) of the SAPHYR project with a massively parallel computer (to the order of hundreds of processors). (author). 4 refs., 4 figs., 3 tabs
RAMA: A file system for massively parallel computers
Miller, Ethan L.; Katz, Randy H.
1993-01-01
This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated in lo the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
Wideband aperture array using RF channelizers and massively parallel digital 2D IIR filterbank
Sengupta, Arindam; Madanayake, Arjuna; Gómez-García, Roberto; Engeberg, Erik D.
2014-05-01
Wideband receive-mode beamforming applications in wireless location, electronically-scanned antennas for radar, RF sensing, microwave imaging and wireless communications require digital aperture arrays that offer a relatively constant far-field beam over several octaves of bandwidth. Several beamforming schemes including the well-known true time-delay and the phased array beamformers have been realized using either finite impulse response (FIR) or fast Fourier transform (FFT) digital filter-sum based techniques. These beamforming algorithms offer the desired selectivity at the cost of a high computational complexity and frequency-dependant far-field array patterns. A novel approach to receiver beamforming is the use of massively parallel 2-D infinite impulse response (IIR) fan filterbanks for the synthesis of relatively frequency independent RF beams at an order of magnitude lower multiplier complexity compared to FFT or FIR filter based conventional algorithms. The 2-D IIR filterbanks demand fast digital processing that can support several octaves of RF bandwidth, fast analog-to-digital converters (ADCs) for RF-to-bits type direct conversion of wideband antenna element signals. Fast digital implementation platforms that can realize high-precision recursive filter structures necessary for real-time beamforming, at RF radio bandwidths, are also desired. We propose a novel technique that combines a passive RF channelizer, multichannel ADC technology, and single-phase massively parallel 2-D IIR digital fan filterbanks, realized at low complexity using FPGA and/or ASIC technology. There exists native support for a larger bandwidth than the maximum clock frequency of the digital implementation technology. We also strive to achieve More-than-Moore throughput by processing a wideband RF signal having content with N-fold (B = N Fclk/2) bandwidth compared to the maximum clock frequency Fclk Hz of the digital VLSI platform under consideration. Such increase in bandwidth is
Complexity growth in minimal massive 3D gravity
Qaemmaqami, Mohammad M.
2018-01-01
We study the complexity growth by using "complexity =action " (CA) proposal in the minimal massive 3D gravity (MMG) model which is proposed for resolving the bulk-boundary clash problem of topologically massive gravity (TMG). We observe that the rate of the complexity growth for Banados-Teitelboim-Zanelli (BTZ) black hole saturates the proposed bound by physical mass of the BTZ black hole in the MMG model, when the angular momentum parameter and the inner horizon of black hole goes to zero.
Parallel 3-D method of characteristics in MPACT
International Nuclear Information System (INIS)
Kochunas, B.; Dovvnar, T. J.; Liu, Z.
2013-01-01
A new parallel 3-D MOC kernel has been developed and implemented in MPACT which makes use of the modular ray tracing technique to reduce computational requirements and to facilitate parallel decomposition. The parallel model makes use of both distributed and shared memory parallelism which are implemented with the MPI and OpenMP standards, respectively. The kernel is capable of parallel decomposition of problems in space, angle, and by characteristic rays up to 0(104) processors. Initial verification of the parallel 3-D MOC kernel was performed using the Takeda 3-D transport benchmark problems. The eigenvalues computed by MPACT are within the statistical uncertainty of the benchmark reference and agree well with the averages of other participants. The MPACT k eff differs from the benchmark results for rodded and un-rodded cases by 11 and -40 pcm, respectively. The calculations were performed for various numbers of processors and parallel decompositions up to 15625 processors; all producing the same result at convergence. The parallel efficiency of the worst case was 60%, while very good efficiency (>95%) was observed for cases using 500 processors. The overall run time for the 500 processor case was 231 seconds and 19 seconds for the case with 15625 processors. Ongoing work is focused on developing theoretical performance models and the implementation of acceleration techniques to minimize the number of iterations to converge. (authors)
3D, parallel fluid-structure interaction code
CSIR Research Space (South Africa)
Oxtoby, Oliver F
2011-01-01
Full Text Available The authors describe the development of a 3D parallel Fluid–Structure–Interaction (FSI) solver and its application to benchmark problems. Fluid and solid domains are discretised using and edge-based finite-volume scheme for efficient parallel...
Frontiers of massively parallel scientific computation
International Nuclear Information System (INIS)
Fischer, J.R.
1987-07-01
Practical applications using massively parallel computer hardware first appeared during the 1980s. Their development was motivated by the need for computing power orders of magnitude beyond that available today for tasks such as numerical simulation of complex physical and biological processes, generation of interactive visual displays, satellite image analysis, and knowledge based systems. Representative of the first generation of this new class of computers is the Massively Parallel Processor (MPP). A team of scientists was provided the opportunity to test and implement their algorithms on the MPP. The first results are presented. The research spans a broad variety of applications including Earth sciences, physics, signal and image processing, computer science, and graphics. The performance of the MPP was very good. Results obtained using the Connection Machine and the Distributed Array Processor (DAP) are presented
Massively parallel quantum computer simulator
De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.
2007-01-01
We describe portable software to simulate universal quantum computers on massive parallel Computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as a IBM BlueGene/L, a IBM Regatta p690+, a Hitachi SR11000/J1, a Cray
Development of massively parallel quantum chemistry program SMASH
International Nuclear Information System (INIS)
Ishimura, Kazuya
2015-01-01
A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C 150 H 30 ) 2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer
The 2nd Symposium on the Frontiers of Massively Parallel Computations
Mills, Ronnie (Editor)
1988-01-01
Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
Massively parallel multicanonical simulations
Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard
2018-03-01
Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
Impact analysis on a massively parallel computer
International Nuclear Information System (INIS)
Zacharia, T.; Aramayo, G.A.
1994-01-01
Advanced mathematical techniques and computer simulation play a major role in evaluating and enhancing the design of beverage cans, industrial, and transportation containers for improved performance. Numerical models are used to evaluate the impact requirements of containers used by the Department of Energy (DOE) for transporting radioactive materials. Many of these models are highly compute-intensive. An analysis may require several hours of computational time on current supercomputers despite the simplicity of the models being studied. As computer simulations and materials databases grow in complexity, massively parallel computers have become important tools. Massively parallel computational research at the Oak Ridge National Laboratory (ORNL) and its application to the impact analysis of shipping containers is briefly described in this paper
Solving the Stokes problem on a massively parallel computer
DEFF Research Database (Denmark)
Axelsson, Owe; Barker, Vincent A.; Neytcheva, Maya
2001-01-01
boundary value problem for each velocity component, are solved by the conjugate gradient method with a preconditioning based on the algebraic multi‐level iteration (AMLI) technique. The velocity is found from the computed pressure. The method is optimal in the sense that the computational work...... is proportional to the number of unknowns. Further, it is designed to exploit a massively parallel computer with distributed memory architecture. Numerical experiments on a Cray T3E computer illustrate the parallel performance of the method....
Massively parallel evolutionary computation on GPGPUs
Tsutsui, Shigeyoshi
2013-01-01
Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u
Development of massively parallel quantum chemistry program SMASH
Energy Technology Data Exchange (ETDEWEB)
Ishimura, Kazuya [Department of Theoretical and Computational Molecular Science, Institute for Molecular Science 38 Nishigo-Naka, Myodaiji, Okazaki, Aichi 444-8585 (Japan)
2015-12-31
A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C{sub 150}H{sub 30}){sub 2} with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.
Newman, Gregory A.; Commer, Michael
2009-07-01
Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.
International Nuclear Information System (INIS)
Newman, Gregory A; Commer, Michael
2009-01-01
Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.
Massively parallel sequencing of forensic STRs
DEFF Research Database (Denmark)
Parson, Walther; Ballard, David; Budowle, Bruce
2016-01-01
The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data that...
A discrete ordinate response matrix method for massively parallel computers
International Nuclear Information System (INIS)
Hanebutte, U.R.; Lewis, E.E.
1991-01-01
A discrete ordinate response matrix method is formulated for the solution of neutron transport problems on massively parallel computers. The response matrix formulation eliminates iteration on the scattering source. The nodal matrices which result from the diamond-differenced equations are utilized in a factored form which minimizes memory requirements and significantly reduces the required number of algorithm utilizes massive parallelism by assigning each spatial node to a processor. The algorithm is accelerated effectively by a synthetic method in which the low-order diffusion equations are also solved by massively parallel red/black iterations. The method has been implemented on a 16k Connection Machine-2, and S 8 and S 16 solutions have been obtained for fixed-source benchmark problems in X--Y geometry
The language parallel Pascal and other aspects of the massively parallel processor
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
CALTRANS: A parallel, deterministic, 3D neutronics code
Energy Technology Data Exchange (ETDEWEB)
Carson, L.; Ferguson, J.; Rogers, J.
1994-04-01
Our efforts to parallelize the deterministic solution of the neutron transport equation has culminated in a new neutronics code CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementation of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.
A Programming Model for Massive Data Parallelism with Data Dependencies
International Nuclear Information System (INIS)
Cui, Xiaohui; Mueller, Frank; Potok, Thomas E.; Zhang, Yongpeng
2009-01-01
Accelerating processors can often be more cost and energy effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processor units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA s Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that well exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach. We run massively data-parallel applications on GPU clusters. We further propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from micro benchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains
DGDFT: A massively parallel method for large scale density functional theory calculations.
Hu, Wei; Lin, Lin; Yang, Chao
2015-09-28
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10(-4) Hartree/atom in terms of the error of energy and 6.2 × 10(-4) Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
DGDFT: A massively parallel method for large scale density functional theory calculations
International Nuclear Information System (INIS)
Hu, Wei; Yang, Chao; Lin, Lin
2015-01-01
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10 −4 Hartree/atom in terms of the error of energy and 6.2 × 10 −4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail
DGDFT: A massively parallel method for large scale density functional theory calculations
Energy Technology Data Exchange (ETDEWEB)
Hu, Wei, E-mail: whu@lbl.gov; Yang, Chao, E-mail: cyang@lbl.gov [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Lin, Lin, E-mail: linlin@math.berkeley.edu [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Department of Mathematics, University of California, Berkeley, California 94720 (United States)
2015-09-28
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10{sup −4} Hartree/atom in terms of the error of energy and 6.2 × 10{sup −4} Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
Biferale, L.; Mantovani, F.; Pivanti, M.; Pozzati, F.; Sbragaglia, M.; Schifano, S.F.; Toschi, F.; Tripiccione, R.
2011-01-01
We develop a Lattice Boltzmann code for computational fluid-dynamics and optimize it for massively parallel systems based on multi-core processors. Our code describes 2D multi-phase compressible flows. We analyze the performance bottlenecks that we find as we gradually expose a larger fraction of
International Nuclear Information System (INIS)
Stankovski, Z.
1995-01-01
The collision probability method in neutron transport, as applied to 2D geometries, consume a great amount of computer time, for a typical 2D assembly calculation evaluations. Consequently RZ or 3D calculations became prohibitive. In this paper we present a simple but efficient parallel algorithm based on the message passing host/node programing model. Parallelization was applied to the energy group treatment. Such approach permits parallelization of the existing code, requiring only limited modifications. Sequential/parallel computer portability is preserved, witch is a necessary condition for a industrial code. Sequential performances are also preserved. The algorithm is implemented on a CRAY 90 coupled to a 128 processor T3D computer, a 16 processor IBM SP1 and a network of workstations, using the Public Domain PVM library. The tests were executed for a 2D geometry with the standard 99-group library. All results were very satisfactory, the best ones with IBM SP1. Because of heterogeneity of the workstation network, we did ask high performances for this architecture. The same source code was used for all computers. A more impressive advantage of this algorithm will appear in the calculations of the SAPHYR project (with the future fine multigroup library of about 8000 groups) with a massively parallel computer, using several hundreds of processors. (author). 5 refs., 6 figs., 2 tabs
International Nuclear Information System (INIS)
Stankovski, Z.
1995-01-01
The collision probability method in neutron transport, as applied to 2D geometries, consume a great amount of computer time, for a typical 2D assembly calculation about 90% of the computing time is consumed in the collision probability evaluations. Consequently RZ or 3D calculations became prohibitive. In this paper the author presents a simple but efficient parallel algorithm based on the message passing host/node programmation model. Parallelization was applied to the energy group treatment. Such approach permits parallelization of the existing code, requiring only limited modifications. Sequential/parallel computer portability is preserved, which is a necessary condition for a industrial code. Sequential performances are also preserved. The algorithm is implemented on a CRAY 90 coupled to a 128 processor T3D computer, a 16 processor IBM SPI and a network of workstations, using the Public Domain PVM library. The tests were executed for a 2D geometry with the standard 99-group library. All results were very satisfactory, the best ones with IBM SPI. Because of heterogeneity of the workstation network, the author did not ask high performances for this architecture. The same source code was used for all computers. A more impressive advantage of this algorithm will appear in the calculations of the SAPHYR project (with the future fine multigroup library of about 8000 groups) with a massively parallel computer, using several hundreds of processors
Massively parallel red-black algorithms for x-y-z response matrix equations
International Nuclear Information System (INIS)
Hanebutte, U.R.; Laurin-Kovitz, K.; Lewis, E.E.
1992-01-01
Recently, both discrete ordinates and spherical harmonic (S n and P n ) methods have been cast in the form of response matrices. In x-y geometry, massively parallel algorithms have been developed to solve the resulting response matrix equations on the Connection Machine family of parallel computers, the CM-2, CM-200, and CM-5. These algorithms utilize two-cycle iteration on a red-black checkerboard. In this work we examine the use of massively parallel red-black algorithms to solve response matric equations in three dimensions. This longer term objective is to utilize massively parallel algorithms to solve S n and/or P n response matrix problems. In this exploratory examination, however, we consider the simple 6 x 6 response matrices that are derivable from fine-mesh diffusion approximations in three dimensions
A massively-parallel electronic-structure calculations based on real-space density functional theory
International Nuclear Information System (INIS)
Iwata, Jun-Ichi; Takahashi, Daisuke; Oshiyama, Atsushi; Boku, Taisuke; Shiraishi, Kenji; Okada, Susumu; Yabana, Kazuhiro
2010-01-01
Based on the real-space finite-difference method, we have developed a first-principles density functional program that efficiently performs large-scale calculations on massively-parallel computers. In addition to efficient parallel implementation, we also implemented several computational improvements, substantially reducing the computational costs of O(N 3 ) operations such as the Gram-Schmidt procedure and subspace diagonalization. Using the program on a massively-parallel computer cluster with a theoretical peak performance of several TFLOPS, we perform electronic-structure calculations for a system consisting of over 10,000 Si atoms, and obtain a self-consistent electronic-structure in a few hundred hours. We analyze in detail the costs of the program in terms of computation and of inter-node communications to clarify the efficiency, the applicability, and the possibility for further improvements.
A massively parallel discrete ordinates response matrix method for neutron transport
International Nuclear Information System (INIS)
Hanebutte, U.R.; Lewis, E.E.
1992-01-01
In this paper a discrete ordinates response matrix method is formulated with anisotropic scattering for the solution of neutron transport problems on massively parallel computers. The response matrix formulation eliminates iteration on the scattering source. The nodal matrices that result from the diamond-differenced equations are utilized in a factored form that minimizes memory requirements and significantly reduces the number of arithmetic operations required per node. The red-black solution algorithm utilizes massive parallelism by assigning each spatial node to one or more processors. The algorithm is accelerated by a synthetic method in which the low-order diffusion equations are also solved by massively parallel red-black iterations. The method is implemented on a 16K Connection Machine-2, and S 8 and S 16 solutions are obtained for fixed-source benchmark problems in x-y geometry
Parallel Implementation of the Multi-Dimensional Spectral Code SPECT3D on large 3D grids.
Golovkin, Igor E.; Macfarlane, Joseph J.; Woodruff, Pamela R.; Pereyra, Nicolas A.
2006-10-01
The multi-dimensional collisional-radiative, spectral analysis code SPECT3D can be used to study radiation from complex plasmas. SPECT3D can generate instantaneous and time-gated images and spectra, space-resolved and streaked spectra, which makes it a valuable tool for post-processing hydrodynamics calculations and direct comparison between simulations and experimental data. On large three dimensional grids, transporting radiation along lines of sight (LOS) requires substantial memory and CPU resources. Currently, the parallel option in SPECT3D is based on parallelization over photon frequencies and allows for a nearly linear speed-up for a variety of problems. In addition, we are introducing a new parallel mechanism that will greatly reduce memory requirements. In the new implementation, spatial domain decomposition will be utilized allowing transport along a LOS to be performed only on the mesh cells the LOS crosses. The ability to operate on a fraction of the grid is crucial for post-processing the results of large-scale three-dimensional hydrodynamics simulations. We will present a parallel implementation of the code and provide a scalability study performed on a Linux cluster.
Increasing the reach of forensic genetics with massively parallel sequencing.
Budowle, Bruce; Schmedes, Sarah E; Wendt, Frank R
2017-09-01
The field of forensic genetics has made great strides in the analysis of biological evidence related to criminal and civil matters. More so, the discipline has set a standard of performance and quality in the forensic sciences. The advent of massively parallel sequencing will allow the field to expand its capabilities substantially. This review describes the salient features of massively parallel sequencing and how it can impact forensic genetics. The features of this technology offer increased number and types of genetic markers that can be analyzed, higher throughput of samples, and the capability of targeting different organisms, all by one unifying methodology. While there are many applications, three are described where massively parallel sequencing will have immediate impact: molecular autopsy, microbial forensics and differentiation of monozygotic twins. The intent of this review is to expose the forensic science community to the potential enhancements that have or are soon to arrive and demonstrate the continued expansion the field of forensic genetics and its service in the investigation of legal matters.
Neural nets for massively parallel optimization
Dixon, Laurence C. W.; Mills, David
1992-07-01
To apply massively parallel processing systems to the solution of large scale optimization problems it is desirable to be able to evaluate any function f(z), z (epsilon) Rn in a parallel manner. The theorem of Cybenko, Hecht Nielsen, Hornik, Stinchcombe and White, and Funahasi shows that this can be achieved by a neural network with one hidden layer. In this paper we address the problem of the number of nodes required in the layer to achieve a given accuracy in the function and gradient values at all points within a given n dimensional interval. The type of activation function needed to obtain nonsingular Hessian matrices is described and a strategy for obtaining accurate minimal networks presented.
Massive-Star Magnetospheres: Now in 3-D!
Townsend, Richard
Magnetic fields are unexpected in massive stars, due to the absence of a dynamo convection zone beneath their surface layers. Nevertheless, kilogauss-strength, ordered fields were detected in a small subset of these stars over three decades ago, and the intervening years have witnessed the steady expansion of this subset. A distinctive feature of magnetic massive stars is that they harbor magnetospheres --- circumstellar environments where the magnetic field interacts strongly with the star's radiation-driven wind, confining it and channelling it into energetic shocks. A wide range of observational signatures are associated with these magnetospheres, in diagnostics ranging from X-rays all the way through to radio emission. Moreover, these magnetospheres can play an important role in massive-star evolution, by amplifying angular momentum loss in the wind. Recent progress in understanding massive-star magnetospheres has largely been driven by magnetohydrodynamical (MHD) simulations. However, these have been restricted to two- dimensional axisymmetric configurations, with three-dimensional configurations possible only in certain special cases. These restrictions are limiting further progress; we therefore propose to develop completely general three-dimensional models for the magnetospheres of massive stars, on the one hand to understand their observational properties and exploit them as plasma-physics laboratories, and on the other to gain a comprehensive understanding of how they influence the evolution of their host star. For weak- and intermediate-field stars, the models will be based on 3-D MHD simulations using a modified version of the ZEUS-MP code. For strong-field stars, we will extend our existing Rigid Field Hydrodynamics (RFHD) code to handle completely arbitrary field topologies. To explore a putative 'photoionization-moderated mass loss' mechanism for massive-star magnetospheres, we will also further develop a photoionization code we have recently
A scalable approach to modeling groundwater flow on massively parallel computers
International Nuclear Information System (INIS)
Ashby, S.F.; Falgout, R.D.; Tompson, A.F.B.
1995-12-01
We describe a fully scalable approach to the simulation of groundwater flow on a hierarchy of computing platforms, ranging from workstations to massively parallel computers. Specifically, we advocate the use of scalable conceptual models in which the subsurface model is defined independently of the computational grid on which the simulation takes place. We also describe a scalable multigrid algorithm for computing the groundwater flow velocities. We axe thus able to leverage both the engineer's time spent developing the conceptual model and the computing resources used in the numerical simulation. We have successfully employed this approach at the LLNL site, where we have run simulations ranging in size from just a few thousand spatial zones (on workstations) to more than eight million spatial zones (on the CRAY T3D)-all using the same conceptual model
PIXIE3D: An efficient, fully implicit, parallel, 3D extended MHD code for fusion plasma modeling
International Nuclear Information System (INIS)
Chacon, L.
2007-01-01
PIXIE3D is a modern, parallel, state-of-the-art extended MHD code that employs fully implicit methods for efficiency and accuracy. It features a general geometry formulation, and is therefore suitable for the study of many magnetic fusion configurations of interest. PIXIE3D advances the state of the art in extended MHD modeling in two fundamental ways. Firstly, it employs a novel conservative finite volume scheme which is remarkably robust and stable, and demands very small physical and/or numerical dissipation. This is a fundamental requirement when one wants to study fusion plasmas with realistic conductivities. Secondly, PIXIE3D features fully-implicit time stepping, employing Newton-Krylov methods for inverting the associated nonlinear systems. These methods have been shown to be scalable and efficient when preconditioned properly. Novel preconditioned ideas (so-called physics based), which were prototypes in the context of reduced MHD, have been adapted for 3D primitive-variable resistive MHD in PIXIE3D, and are currently being extended to Hall MHD. PIXIE3D is fully parallel, employing PETSc for parallelism. PIXIE3D has been thoroughly benchmarked against linear theory and against other available extended MHD codes on nonlinear test problems (such as the GEM reconnection challenge). We are currently in the process of extending such comparisons to fusion-relevant problems in realistic geometries. In this talk, we will describe both the spatial discretization approach and the preconditioning strategy employed for extended MHD in PIXIE3D. We will report on recent benchmarking studies between PIXIE3D and other 3D extended MHD codes, and will demonstrate its usefulness in a variety of fusion-relevant configurations such as Tokamaks and Reversed Field Pinches. (Author)
Kundt solutions of minimal massive 3D gravity
Deger, Nihat Sadik; Sarıoǧlu, Ã.-zgür
2015-11-01
We construct Kundt solutions of minimal massive gravity theory and show that, similar to topologically massive gravity (TMG), most of them are constant scalar invariant (CSI) spacetimes that correspond to deformations of round and warped (A)dS. We also find an explicit non-CSI Kundt solution at the merger point. Finally, we give their algebraic classification with respect to the traceless Ricci tensor (Segre classification) and show that their Segre types match with the types of their counterparts in TMG.
International Nuclear Information System (INIS)
McGhee, J.M.; Roberts, R.M.; Morel, J.E.
1997-01-01
A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated
International Nuclear Information System (INIS)
Bergshoeff, Eric; Merbis, Wout; Hohm, Olaf; Routh, Alasdair J; Townsend, Paul K
2014-01-01
We present an alternative to topologically massive gravity (TMG) with the same ‘minimal’ bulk properties; i.e. a single local degree of freedom that is realized as a massive graviton in linearization about an anti-de Sitter (AdS) vacuum. However, in contrast to TMG, the new ‘minimal massive gravity’ has both a positive energy graviton and positive central charges for the asymptotic AdS-boundary conformal algebra. (paper)
PARALLEL SPATIOTEMPORAL SPECTRAL CLUSTERING WITH MASSIVE TRAJECTORY DATA
Directory of Open Access Journals (Sweden)
Y. Z. Gu
2017-09-01
Full Text Available Massive trajectory data contains wealth useful information and knowledge. Spectral clustering, which has been shown to be effective in finding clusters, becomes an important clustering approaches in the trajectory data mining. However, the traditional spectral clustering lacks the temporal expansion on the algorithm and limited in its applicability to large-scale problems due to its high computational complexity. This paper presents a parallel spatiotemporal spectral clustering based on multiple acceleration solutions to make the algorithm more effective and efficient, the performance is proved due to the experiment carried out on the massive taxi trajectory dataset in Wuhan city, China.
Performance Analysis of 3D Massive MIMO Cellular Systems with Collaborative Base Station
Directory of Open Access Journals (Sweden)
Xingwang Li
2014-01-01
Full Text Available Massive MIMO have drawn considerable attention as they enable significant capacity and coverage improvement in wireless cellular network. However, pilot contamination is a great challenge in massive MIMO systems. Under this circumstance, cooperation and three-dimensional (3D MIMO are emerging technologies to eliminate the pilot contamination and to enhance the performance relative to the traditional interference-limited implementations. Motivated by this, we investigate the achievable sum rate performance of MIMO systems in the uplink employing cooperative base station (BS and 3D MIMO systems. In our model, we consider the effects of both large-scale and small-scale fading, as well as the spatial correlation and indoor-to-outdoor high-rise propagation environment. In particular, we investigate the cooperative communication model based on 3D MIMO and propose a closed-form lower bound on the sum rate. Utilizing this bound, we pursue a “large-system” analysis and provide the asymptotic expression when the number of antennas at the BS grows large, and when the numbers of antennas at transceiver grow large with a fixed ratio. We demonstrate that the lower bound is very tight and becomes exact in the massive MIMO system limits. Finally, under the sum rate maximization condition, we derive the optimal number of UTs to be served.
Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems
Albutiu, Martina-Cezara; Kemper, Alfons; Neumann, Thomas
2012-01-01
Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research in scalable massively parallel mult...
Proxy-equation paradigm: A strategy for massively parallel asynchronous computations
Mittal, Ankita; Girimaji, Sharath
2017-09-01
Massively parallel simulations of transport equation systems call for a paradigm change in algorithm development to achieve efficient scalability. Traditional approaches require time synchronization of processing elements (PEs), which severely restricts scalability. Relaxing synchronization requirement introduces error and slows down convergence. In this paper, we propose and develop a novel "proxy equation" concept for a general transport equation that (i) tolerates asynchrony with minimal added error, (ii) preserves convergence order and thus, (iii) expected to scale efficiently on massively parallel machines. The central idea is to modify a priori the transport equation at the PE boundaries to offset asynchrony errors. Proof-of-concept computations are performed using a one-dimensional advection (convection) diffusion equation. The results demonstrate the promise and advantages of the present strategy.
Massively parallel sparse matrix function calculations with NTPoly
Dawson, William; Nakajima, Takahito
2018-04-01
We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallellization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
Massively parallel Fokker-Planck calculations
International Nuclear Information System (INIS)
Mirin, A.A.
1990-01-01
This paper reports that the Fokker-Planck package FPPAC, which solves the complete nonlinear multispecies Fokker-Planck collision operator for a plasma in two-dimensional velocity space, has been rewritten for the Connection Machine 2. This has involved allocation of variables either to the front end or the CM2, minimization of data flow, and replacement of Cray-optimized algorithms with ones suitable for a massively parallel architecture. Calculations have been carried out on various Connection Machines throughout the country. Results and timings on these machines have been compared to each other and to those on the static memory Cray-2. For large problem size, the Connection Machine 2 is found to be cost-efficient
Recent progress in 3D EM/EM-PIC simulation with ARGUS and parallel ARGUS
International Nuclear Information System (INIS)
Mankofsky, A.; Petillo, J.; Krueger, W.; Mondelli, A.; McNamara, B.; Philp, R.
1994-01-01
ARGUS is an integrated, 3-D, volumetric simulation model for systems involving electric and magnetic fields and charged particles, including materials embedded in the simulation region. The code offers the capability to carry out time domain and frequency domain electromagnetic simulations of complex physical systems. ARGUS offers a boolean solid model structure input capability that can include essentially arbitrary structures on the computational domain, and a modular architecture that allows multiple physics packages to access the same data structure and to share common code utilities. Physics modules are in place to compute electrostatic and electromagnetic fields, the normal modes of RF structures, and self-consistent particle-in-cell (PIC) simulation in either a time dependent mode or a steady state mode. The PIC modules include multiple particle species, the Lorentz equations of motion, and algorithms for the creation of particles by emission from material surfaces, injection onto the grid, and ionization. In this paper, we present an updated overview of ARGUS, with particular emphasis given in recent algorithmic and computational advances. These include a completely rewritten frequency domain solver which efficiently treats lossy materials and periodic structures, a parallel version of ARGUS with support for both shared memory parallel vector (i.e. CRAY) machines and distributed memory massively parallel MIMD systems, and numerous new applications of the code
A parallel implementation of 3-d CT image reconstruction on a hypercube multiprocessor
International Nuclear Information System (INIS)
Chen, C.M.; Lee, S.Y.; Cho, Z.H.
1990-01-01
In this paper, the authors describe how image reconstruction in computerized tomography (CT) can be parallelized on a message-passing multiprocessor. In particular, the results obtained from parallel implementation of 3-D CT image reconstruction for parallel beam geometries on the Intel hypercube, iPSC/2, are presented. A two stage pipelining approach is employed for filtering (convolution) and backprojection. The conventional sequential convolution algorithm is modified such that the symmetry of the filter kernel is fully utilized for parallelization. In the backprojection stage, the 3-D incremental algorithm, the authors' recently developed backprojection scheme which is shown to be faster than conventional algorithm, is parallelized
3D Massive MIMO Systems: Channel Modeling and Performance Analysis
Nadeem, Qurrat-Ul-Ain
2015-03-01
Multiple-input-multiple-output (MIMO) systems of current LTE releases are capable of adaptation in the azimuth only. More recently, the trend is to enhance the system performance by exploiting the channel\\'s degrees of freedom in the elevation through the dynamic adaptation of the vertical antenna beam pattern. This necessitates the derivation and characterization of three-dimensional (3D) channels. Over the years, channel models have evolved to address the challenges of wireless communication technologies. In parallel to theoretical studies on channel modeling, many standardized channel models like COST-based models, 3GPP SCM, WINNER, ITU have emerged that act as references for industries and telecommunication companies to assess system-level and link-level performances of advanced signal processing techniques over real-like channels. Given the existing channels are only two dimensional (2D) in nature; a large effort in channel modeling is needed to study the impact of the channel component in the elevation direction. The first part of this work sheds light on the current 3GPP activity around 3D channel modeling and beamforming, an aspect that to our knowledge has not been extensively covered by a research publication. The standardized MIMO channel model is presented, that incorporates both the propagation effects of the environment and the radio effects of the antennas. In order to facilitate future studies on the use of 3D beamforming, the main features of the proposed 3D channel model are discussed. A brief overview of the future 3GPP 3D channel model being outlined for the next generation of wireless networks is also provided. In the subsequent part of this work, we present an information-theoretic channel model for MIMO systems that supports the elevation dimension. The model is based on the principle of maximum entropy, which enables us to determine the distribution of the channel matrix consistent with the prior information on the angles of departure and
Performance of Air Pollution Models on Massively Parallel Computers
DEFF Research Database (Denmark)
Brown, John; Hansen, Per Christian; Wasniewski, Jerzy
1996-01-01
To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight about the performance of the three computers when used to solve large-scale scientific problems...
Practical 3-D Beam Pattern Based Channel Modeling for Multi-Polarized Massive MIMO Systems.
Aghaeinezhadfirouzja, Saeid; Liu, Hui; Balador, Ali
2018-04-12
In this paper, a practical non-stationary three-dimensional (3-D) channel models for massive multiple-input multiple-output (MIMO) systems, considering beam patterns for different antenna elements, is proposed. The beam patterns using dipole antenna elements with different phase excitation toward the different direction of travels (DoTs) contributes various correlation weights for rays related towards/from the cluster, thus providing different elevation angle of arrivals (EAoAs) and elevation angle of departures (EAoDs) for each antenna element. These include the movements of the user that makes our channel to be a non-stationary model of clusters at the receiver (RX) on both the time and array axes. In addition, their impacts on 3-D massive MIMO channels are investigated via statistical properties including received spatial correlation. Additionally, the impact of elevation/azimuth angles of arrival on received spatial correlation is discussed. Furthermore, experimental validation of the proposed 3-D channel models on azimuth and elevation angles of the polarized antenna are specifically evaluated and compared through simulations. The proposed 3-D generic models are verified using relevant measurement data.
Massively parallel computation of conservation laws
Energy Technology Data Exchange (ETDEWEB)
Garbey, M [Univ. Claude Bernard, Villeurbanne (France); Levine, D [Argonne National Lab., IL (United States)
1990-01-01
The authors present a new method for computing solutions of conservation laws based on the use of cellular automata with the method of characteristics. The method exploits the high degree of parallelism available with cellular automata and retains important features of the method of characteristics. It yields high numerical accuracy and extends naturally to adaptive meshes and domain decomposition methods for perturbed conservation laws. They describe the method and its implementation for a Dirichlet problem with a single conservation law for the one-dimensional case. Numerical results for the one-dimensional law with the classical Burgers nonlinearity or the Buckley-Leverett equation show good numerical accuracy outside the neighborhood of the shocks. The error in the area of the shocks is of the order of the mesh size. The algorithm is well suited for execution on both massively parallel computers and vector machines. They present timing results for an Alliant FX/8, Connection Machine Model 2, and CRAY X-MP.
High Performance Programming Using Explicit Shared Memory Model on the Cray T3D
Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)
1994-01-01
The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.
Parallel Processor for 3D Recovery from Optical Flow
Directory of Open Access Journals (Sweden)
Jose Hugo Barron-Zambrano
2009-01-01
Full Text Available 3D recovery from motion has received a major effort in computer vision systems in the recent years. The main problem lies in the number of operations and memory accesses to be performed by the majority of the existing techniques when translated to hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main feature is the maximum reuse of data and the low number of clock cycles to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture as well as those from processor synthesis are presented.
Scalable Parallel Distributed Coprocessor System for Graph Searching Problems with Massive Data
Directory of Open Access Journals (Sweden)
Wanrong Huang
2017-01-01
Full Text Available The Internet applications, such as network searching, electronic commerce, and modern medical applications, produce and process massive data. Considerable data parallelism exists in computation processes of data-intensive applications. A traversal algorithm, breadth-first search (BFS, is fundamental in many graph processing applications and metrics when a graph grows in scale. A variety of scientific programming methods have been proposed for accelerating and parallelizing BFS because of the poor temporal and spatial locality caused by inherent irregular memory access patterns. However, new parallel hardware could provide better improvement for scientific methods. To address small-world graph problems, we propose a scalable and novel field-programmable gate array-based heterogeneous multicore system for scientific programming. The core is multithread for streaming processing. And the communication network InfiniBand is adopted for scalability. We design a binary search algorithm to address mapping to unify all processor addresses. Within the limits permitted by the Graph500 test bench after 1D parallel hybrid BFS algorithm testing, our 8-core and 8-thread-per-core system achieved superior performance and efficiency compared with the prior work under the same degree of parallelism. Our system is efficient not as a special acceleration unit but as a processor platform that deals with graph searching applications.
Massively Parallel Computing at Sandia and Its Application to National Defense
National Research Council Canada - National Science Library
Dosanjh, Sudip
1991-01-01
Two years ago, researchers at Sandia National Laboratories showed that a massively parallel computer with 1024 processors could solve scientific problems more than 1000 times faster than a single processor...
A parallel algorithm for 3D dislocation dynamics
International Nuclear Information System (INIS)
Wang Zhiqiang; Ghoniem, Nasr; Swaminarayan, Sriram; LeSar, Richard
2006-01-01
Dislocation dynamics (DD), a discrete dynamic simulation method in which dislocations are the fundamental entities, is a powerful tool for investigation of plasticity, deformation and fracture of materials at the micron length scale. However, severe computational difficulties arising from complex, long-range interactions between these curvilinear line defects limit the application of DD in the study of large-scale plastic deformation. We present here the development of a parallel algorithm for accelerated computer simulations of DD. By representing dislocations as a 3D set of dislocation particles, we show here that the problem of an interacting ensemble of dislocations can be converted to a problem of a particle ensemble, interacting with a long-range force field. A grid using binary space partitioning is constructed to keep track of node connectivity across domains. We demonstrate the computational efficiency of the parallel micro-plasticity code and discuss how O(N) methods map naturally onto the parallel data structure. Finally, we present results from applications of the parallel code to deformation in single crystal fcc metals
Massively Parallel Dimension Independent Adaptive Metropolis
Chen, Yuxin
2015-05-14
This work considers black-box Bayesian inference over high-dimensional parameter spaces. The well-known and widely respected adaptive Metropolis (AM) algorithm is extended herein to asymptotically scale uniformly with respect to the underlying parameter dimension, by respecting the variance, for Gaussian targets. The result- ing algorithm, referred to as the dimension-independent adaptive Metropolis (DIAM) algorithm, also shows improved performance with respect to adaptive Metropolis on non-Gaussian targets. This algorithm is further improved, and the possibility of probing high-dimensional targets is enabled, via GPU-accelerated numerical libraries and periodically synchronized concurrent chains (justified a posteriori). Asymptoti- cally in dimension, this massively parallel dimension-independent adaptive Metropolis (MPDIAM) GPU implementation exhibits a factor of four improvement versus the CPU-based Intel MKL version alone, which is itself already a factor of three improve- ment versus the serial version. The scaling to multiple CPUs and GPUs exhibits a form of strong scaling in terms of the time necessary to reach a certain convergence criterion, through a combination of longer time per sample batch (weak scaling) and yet fewer necessary samples to convergence. This is illustrated by e ciently sampling from several Gaussian and non-Gaussian targets for dimension d 1000.
Massive Asynchronous Parallelization of Sparse Matrix Factorizations
Energy Technology Data Exchange (ETDEWEB)
Chow, Edmond [Georgia Inst. of Technology, Atlanta, GA (United States)
2018-01-08
Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations, and then solving these equations via an asynchronous iterative method. The unknowns in these equations are the matrix entries of the factorization that is desired.
Implementation of PHENIX trigger algorithms on massively parallel computers
International Nuclear Information System (INIS)
Petridis, A.N.; Wohn, F.K.
1995-01-01
The event selection requirements of contemporary high energy and nuclear physics experiments are met by the introduction of on-line trigger algorithms which identify potentially interesting events and reduce the data acquisition rate to levels that are manageable by the electronics. Such algorithms being parallel in nature can be simulated off-line using massively parallel computers. The PHENIX experiment intends to investigate the possible existence of a new phase of matter called the quark gluon plasma which has been theorized to have existed in very early stages of the evolution of the universe by studying collisions of heavy nuclei at ultra-relativistic energies. Such interactions can also reveal important information regarding the structure of the nucleus and mandate a thorough investigation of the simpler proton-nucleus collisions at the same energies. The complexity of PHENIX events and the need to analyze and also simulate them at rates similar to the data collection ones imposes enormous computation demands. This work is a first effort to implement PHENIX trigger algorithms on parallel computers and to study the feasibility of using such machines to run the complex programs necessary for the simulation of the PHENIX detector response. Fine and coarse grain approaches have been studied and evaluated. Depending on the application the performance of a massively parallel computer can be much better or much worse than that of a serial workstation. A comparison between single instruction and multiple instruction computers is also made and possible applications of the single instruction machines to high energy and nuclear physics experiments are outlined. copyright 1995 American Institute of Physics
A 3D gyrokinetic particle-in-cell simulation of fusion plasma microturbulence on parallel computers
Williams, T. J.
1992-12-01
One of the grand challenge problems now supported by HPCC is the Numerical Tokamak Project. A goal of this project is the study of low-frequency micro-instabilities in tokamak plasmas, which are believed to cause energy loss via turbulent thermal transport across the magnetic field lines. An important tool in this study is gyrokinetic particle-in-cell (PIC) simulation. Gyrokinetic, as opposed to fully-kinetic, methods are particularly well suited to the task because they are optimized to study the frequency and wavelength domain of the microinstabilities. Furthermore, many researchers now employ low-noise delta(f) methods to greatly reduce statistical noise by modelling only the perturbation of the gyrokinetic distribution function from a fixed background, not the entire distribution function. In spite of the increased efficiency of these improved algorithms over conventional PIC algorithms, gyrokinetic PIC simulations of tokamak micro-turbulence are still highly demanding of computer power--even fully-vectorized codes on vector supercomputers. For this reason, we have worked for several years to redevelop these codes on massively parallel computers. We have developed 3D gyrokinetic PIC simulation codes for SIMD and MIMD parallel processors, using control-parallel, data-parallel, and domain-decomposition message-passing (DDMP) programming paradigms. This poster summarizes our earlier work on codes for the Connection Machine and BBN TC2000 and our development of a generic DDMP code for distributed-memory parallel machines. We discuss the memory-access issues which are of key importance in writing parallel PIC codes, with special emphasis on issues peculiar to gyrokinetic PIC. We outline the domain decompositions in our new DDMP code and discuss the interplay of different domain decompositions suited for the particle-pushing and field-solution components of the PIC algorithm.
Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU
Directory of Open Access Journals (Sweden)
Yong Xia
2015-01-01
Full Text Available Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation and the other is the diffusion term of the monodomain model (partial differential equation. Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations.
A Massively Parallel Code for Polarization Calculations
Akiyama, Shizuka; Höflich, Peter
2001-03-01
We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.
Analysis of multigrid methods on massively parallel computers: Architectural implications
Matheson, Lesley R.; Tarjan, Robert E.
1993-01-01
We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10(exp 6) and 10(exp 9), respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages, (up to 1000 words) or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
Reactor Dosimetry Applications Using RAPTOR-M3G:. a New Parallel 3-D Radiation Transport Code
Longoni, Gianluca; Anderson, Stanwood L.
2009-08-01
The numerical solution of the Linearized Boltzmann Equation (LBE) via the Discrete Ordinates method (SN) requires extensive computational resources for large 3-D neutron and gamma transport applications due to the concurrent discretization of the angular, spatial, and energy domains. This paper will discuss the development RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries), a new 3-D parallel radiation transport code, and its application to the calculation of ex-vessel neutron dosimetry responses in the cavity of a commercial 2-loop Pressurized Water Reactor (PWR). RAPTOR-M3G is based domain decomposition algorithms, where the spatial and angular domains are allocated and processed on multi-processor computer architectures. As compared to traditional single-processor applications, this approach reduces the computational load as well as the memory requirement per processor, yielding an efficient solution methodology for large 3-D problems. Measured neutron dosimetry responses in the reactor cavity air gap will be compared to the RAPTOR-M3G predictions. This paper is organized as follows: Section 1 discusses the RAPTOR-M3G methodology; Section 2 describes the 2-loop PWR model and the numerical results obtained. Section 3 addresses the parallel performance of the code, and Section 4 concludes this paper with final remarks and future work.
Heydt, Carina; Fassunke, Jana; Künstlinger, Helen; Ihle, Michaela Angelika; König, Katharina; Heukamp, Lukas Carl; Schildhaus, Hans-Ulrich; Odenthal, Margarete; Büttner, Reinhard; Merkelbach-Bruse, Sabine
2014-01-01
Over the last years, massively parallel sequencing has rapidly evolved and has now transitioned into molecular pathology routine laboratories. It is an attractive platform for analysing multiple genes at the same time with very little input material. Therefore, the need for high quality DNA obtained from automated DNA extraction systems has increased, especially to those laboratories which are dealing with formalin-fixed paraffin-embedded (FFPE) material and high sample throughput. This study evaluated five automated FFPE DNA extraction systems as well as five DNA quantification systems using the three most common techniques, UV spectrophotometry, fluorescent dye-based quantification and quantitative PCR, on 26 FFPE tissue samples. Additionally, the effects on downstream applications were analysed to find the most suitable pre-analytical methods for massively parallel sequencing in routine diagnostics. The results revealed that the Maxwell 16 from Promega (Mannheim, Germany) seems to be the superior system for DNA extraction from FFPE material. The extracts had a 1.3–24.6-fold higher DNA concentration in comparison to the other extraction systems, a higher quality and were most suitable for downstream applications. The comparison of the five quantification methods showed intermethod variations but all methods could be used to estimate the right amount for PCR amplification and for massively parallel sequencing. Interestingly, the best results in massively parallel sequencing were obtained with a DNA input of 15 ng determined by the NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). No difference could be detected in mutation analysis based on the results of the quantification methods. These findings emphasise, that it is particularly important to choose the most reliable and constant DNA extraction system, especially when using small biopsies and low elution volumes, and that all common DNA quantification techniques can be used for
Directory of Open Access Journals (Sweden)
Carina Heydt
Full Text Available Over the last years, massively parallel sequencing has rapidly evolved and has now transitioned into molecular pathology routine laboratories. It is an attractive platform for analysing multiple genes at the same time with very little input material. Therefore, the need for high quality DNA obtained from automated DNA extraction systems has increased, especially to those laboratories which are dealing with formalin-fixed paraffin-embedded (FFPE material and high sample throughput. This study evaluated five automated FFPE DNA extraction systems as well as five DNA quantification systems using the three most common techniques, UV spectrophotometry, fluorescent dye-based quantification and quantitative PCR, on 26 FFPE tissue samples. Additionally, the effects on downstream applications were analysed to find the most suitable pre-analytical methods for massively parallel sequencing in routine diagnostics. The results revealed that the Maxwell 16 from Promega (Mannheim, Germany seems to be the superior system for DNA extraction from FFPE material. The extracts had a 1.3-24.6-fold higher DNA concentration in comparison to the other extraction systems, a higher quality and were most suitable for downstream applications. The comparison of the five quantification methods showed intermethod variations but all methods could be used to estimate the right amount for PCR amplification and for massively parallel sequencing. Interestingly, the best results in massively parallel sequencing were obtained with a DNA input of 15 ng determined by the NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA. No difference could be detected in mutation analysis based on the results of the quantification methods. These findings emphasise, that it is particularly important to choose the most reliable and constant DNA extraction system, especially when using small biopsies and low elution volumes, and that all common DNA quantification techniques can
International Nuclear Information System (INIS)
Laurent, C.; Chassery, J.M.; Peyrin, F.; Girerd, C.
1996-01-01
This paper deals with the parallel implementations of reconstruction methods in 3D tomography. 3D tomography requires voluminous data and long computation times. Parallel computing, on MIMD computers, seems to be a good approach to manage this problem. In this study, we present the different steps of the parallelization on an abstract parallel computer. Depending on the method, we use two main approaches to parallelize the algorithms: the local approach and the global approach. Experimental results on MIMD computers are presented. Two 3D images reconstructed from realistic data are showed
Massive hybrid parallelism for fully implicit multiphysics
International Nuclear Information System (INIS)
Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W.
2013-01-01
As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided. (authors)
Massive hybrid parallelism for fully implicit multiphysics
Energy Technology Data Exchange (ETDEWEB)
Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W. [Idaho National Laboratory, 2525 N. Fremont Ave., Idaho Falls, ID 83415 (United States)
2013-07-01
As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided. (authors)
MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS
Energy Technology Data Exchange (ETDEWEB)
Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston
2013-05-01
As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided.
A parallel algorithm for 3D particle tracking and Lagrangian trajectory reconstruction
International Nuclear Information System (INIS)
Barker, Douglas; Zhang, Yuanhui; Lifflander, Jonathan; Arya, Anshu
2012-01-01
Particle-tracking methods are widely used in fluid mechanics and multi-target tracking research because of their unique ability to reconstruct long trajectories with high spatial and temporal resolution. Researchers have recently demonstrated 3D tracking of several objects in real time, but as the number of objects is increased, real-time tracking becomes impossible due to data transfer and processing bottlenecks. This problem may be solved by using parallel processing. In this paper, a parallel-processing framework has been developed based on frame decomposition and is programmed using the asynchronous object-oriented Charm++ paradigm. This framework can be a key step in achieving a scalable Lagrangian measurement system for particle-tracking velocimetry and may lead to real-time measurement capabilities. The parallel tracking algorithm was evaluated with three data sets including the particle image velocimetry standard 3D images data set #352, a uniform data set for optimal parallel performance and a computational-fluid-dynamics-generated non-uniform data set to test trajectory reconstruction accuracy, consistency with the sequential version and scalability to more than 500 processors. The algorithm showed strong scaling up to 512 processors and no inherent limits of scalability were seen. Ultimately, up to a 200-fold speedup is observed compared to the serial algorithm when 256 processors were used. The parallel algorithm is adaptable and could be easily modified to use any sequential tracking algorithm, which inputs frames of 3D particle location data and outputs particle trajectories
User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code
International Nuclear Information System (INIS)
Earth Sciences Division; Zhang, Keni; Zhang, Keni; Wu, Yu-Shu; Pruess, Karsten
2008-01-01
TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used
Massively parallel Monte Carlo for many-particle simulations on GPUs
Energy Technology Data Exchange (ETDEWEB)
Anderson, Joshua A.; Jankowski, Eric [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Grubb, Thomas L. [Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Engel, Michael [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Glotzer, Sharon C., E-mail: sglotzer@umich.edu [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)
2013-12-01
Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
A massively parallel corpus: the Bible in 100 languages.
Christodouloupoulos, Christos; Steedman, Mark
We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Finally we present a statistical analysis of the corpora collected and a detailed comparison between the English translation and other English corpora.
Scientific programming on massively parallel processor CP-PACS
International Nuclear Information System (INIS)
Boku, Taisuke
1998-01-01
The massively parallel processor CP-PACS takes various problems of calculation physics as the object, and it has been designed so that its architecture has been devised to do various numerical processings. In this report, the outline of the CP-PACS and the example of programming in the Kernel CG benchmark in NAS Parallel Benchmarks, version 1, are shown, and the pseudo vector processing mechanism and the parallel processing tuning of scientific and technical computation utilizing the three-dimensional hyper crossbar net, which are two great features of the architecture of the CP-PACS are described. As for the CP-PACS, the PUs based on RISC processor and added with pseudo vector processor are used. Pseudo vector processing is realized as the loop processing by scalar command. The features of the connection net of PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The part that takes the time for processing most in the main loop is the product of matrix and vector (matvec), and the parallel processing of the matvec is explained. The time for the computation by the CPU is determined. As the evaluation of the performance, the evaluation of the time for execution, the short vector processing of pseudo vector processor based on slide window, and the comparison with other parallel computers are reported. (K.I.)
Engineering-Based Thermal CFD Simulations on Massive Parallel Systems
Frisch, Jérôme
2015-05-22
The development of parallel Computational Fluid Dynamics (CFD) codes is a challenging task that entails efficient parallelization concepts and strategies in order to achieve good scalability values when running those codes on modern supercomputers with several thousands to millions of cores. In this paper, we present a hierarchical data structure for massive parallel computations that supports the coupling of a Navier–Stokes-based fluid flow code with the Boussinesq approximation in order to address complex thermal scenarios for energy-related assessments. The newly designed data structure is specifically designed with the idea of interactive data exploration and visualization during runtime of the simulation code; a major shortcoming of traditional high-performance computing (HPC) simulation codes. We further show and discuss speed-up values obtained on one of Germany’s top-ranked supercomputers with up to 140,000 processes and present simulation results for different engineering-based thermal problems.
Shared memory parallelism for 3D cartesian discrete ordinates solver
International Nuclear Information System (INIS)
Moustafa, S.; Dutka-Malen, I.; Plagne, L.; Poncot, A.; Ramet, P.
2013-01-01
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multi-core + SIMD - Single Instruction on Multiple Data) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46*10 6 spatial cells and 1*10 12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool. (authors)
Development of parallel 3D discrete ordinates transport program on JASMIN framework
International Nuclear Information System (INIS)
Cheng, T.; Wei, J.; Shen, H.; Zhong, B.; Deng, L.
2015-01-01
A parallel 3D discrete ordinates radiation transport code JSNT-S is developed, aiming at simulating real-world radiation shielding and reactor physics applications in a reasonable time. Through the patch-based domain partition algorithm, the memory requirement is shared among processors and a space-angle parallel sweeping algorithm is developed based on data-driven algorithm. Acceleration methods such as partial current rebalance are implemented. The correctness is proved through the VENUS-3 and other benchmark models. In the radiation shielding calculation of the Qinshan-II reactor pressure vessel model with 24.3 billion DoF, only 88 seconds is required and the overall parallel efficiency of 44% is achieved on 1536 CPU cores. (author)
Energy Technology Data Exchange (ETDEWEB)
Gentzsch, W.; Ferstl, F.; Paap, H.G.; Riedel, E.
1998-03-20
In the ARTS project, system software has been developed to support smog and fluid dynamic applications on massively parallel systems. The aim is to implement and test specific software structures within an adaptive run-time system to separate the parallel core algorithms of the applications from the platform independent runtime aspects. Only slight modifications is existing Fortran and C code are necessary to integrate the application code into the new object oriented parallel integrated ARTS framework. The OO-design offers easy control, re-use and adaptation of the system services, resulting in a dramatic decrease in development time of the application and in ease of maintainability of the application software in the future. (orig.) [Deutsch] Im Projekt ARTS wird Basissoftware zur Unterstuetzung von Anwendungen aus den Bereichen Smoganalyse und Stroemungsmechanik auf massiv parallelen Systemen entwickelt und optimiert. Im Vordergrund steht die Erprobung geeigneter Strukturen, um systemnahe Funktionalitaeten in einer Laufzeitumgebung anzusiedeln und dadurch die parallelen Kernalgorithmen der Anwendungsprogramme von den plattformunabhaengigen Laufzeitaspekten zu trennen. Es handelt sich dabei um herkoemmlich strukturierten Fortran-Code, der unter minimalen Aenderungen auch weiterhin nutzbar sein muss, sowie um objektbasiert entworfenen C-Code, der die volle Funktionalitaet der ARTS-Plattform ausnutzen kann. Ein objektorientiertes Design erlaubt eine einfache Kontrolle, Wiederverwendung und Adaption der vom System vorgegebenen Basisdienste. Daraus resultiert ein deutlich reduzierter Entwicklungs- und Laufzeitaufwand fuer die Anwendung. ARTS schafft eine integrierende Plattform, die moderne Technologien aus dem Bereich objektorientierter Laufzeitsysteme mit praxisrelevanten Anforderungen aus dem Bereich des wissenschaftlichen Hoechstleistungsrechnens kombiniert. (orig.)
Climate models on massively parallel computers
International Nuclear Information System (INIS)
Vitart, F.; Rouvillois, P.
1993-01-01
First results got on massively parallel computers (Multiple Instruction Multiple Data and Simple Instruction Multiple Data) allow to consider building of coupled models with high resolutions. This would make possible simulation of thermoaline circulation and other interaction phenomena between atmosphere and ocean. The increasing of computers powers, and then the improvement of resolution will go us to revise our approximations. Then hydrostatic approximation (in ocean circulation) will not be valid when the grid mesh will be of a dimension lower than a few kilometers: We shall have to find other models. The expert appraisement got in numerical analysis at the Center of Limeil-Valenton (CEL-V) will be used again to imagine global models taking in account atmosphere, ocean, ice floe and biosphere, allowing climate simulation until a regional scale
Simulating streamer discharges in 3D with the parallel adaptive Afivo framework
H.J. Teunissen (Jannis); U. M. Ebert (Ute)
2017-01-01
htmlabstractWe present an open-source plasma fluid code for 2D, cylindrical and 3D simulations of streamer discharges, based on the Afivo framework that features adaptive mesh refinement, geometric multigrid methods for Poisson's equation, and OpenMP parallelism. We describe the numerical
Micro-mechanical Simulations of Soils using Massively Parallel Supercomputers
Directory of Open Access Journals (Sweden)
David W. Washington
2004-06-01
Full Text Available In this research a computer program, Trubal version 1.51, based on the Discrete Element Method was converted to run on a Connection Machine (CM-5,a massively parallel supercomputer with 512 nodes, to expedite the computational times of simulating Geotechnical boundary value problems. The dynamic memory algorithm in Trubal program did not perform efficiently in CM-2 machine with the Single Instruction Multiple Data (SIMD architecture. This was due to the communication overhead involving global array reductions, global array broadcast and random data movement. Therefore, a dynamic memory algorithm in Trubal program was converted to a static memory arrangement and Trubal program was successfully converted to run on CM-5 machines. The converted program was called "TRUBAL for Parallel Machines (TPM." Simulating two physical triaxial experiments and comparing simulation results with Trubal simulations validated the TPM program. With a 512 nodes CM-5 machine TPM produced a nine-fold speedup demonstrating the inherent parallelism within algorithms based on the Discrete Element Method.
3-D electromagnetic plasma particle simulations on the Intel Delta parallel computer
International Nuclear Information System (INIS)
Wang, J.; Liewer, P.C.
1994-01-01
A three-dimensional electromagnetic PIC code has been developed on the 512 node Intel Touchstone Delta MIMD parallel computer. This code is based on the General Concurrent PIC algorithm which uses a domain decomposition to divide the computation among the processors. The 3D simulation domain can be partitioned into 1-, 2-, or 3-dimensional sub-domains. Particles must be exchanged between processors as they move among the subdomains. The Intel Delta allows one to use this code for very-large-scale simulations (i.e. over 10 8 particles and 10 6 grid cells). The parallel efficiency of this code is measured, and the overall code performance on the Delta is compared with that on Cray supercomputers. It is shown that their code runs with a high parallel efficiency of ≥ 95% for large size problems. The particle push time achieved is 115 nsecs/particle/time step for 162 million particles on 512 nodes. Comparing with the performance on a single processor Cray C90, this represents a factor of 58 speedup. The code uses a finite-difference leap frog method for field solve which is significantly more efficient than fast fourier transforms on parallel computers. The performance of this code on the 128 node Cray T3D will also be discussed
Directory of Open Access Journals (Sweden)
Cronn Richard
2009-12-01
Full Text Available Abstract Background Molecular evolutionary studies share the common goal of elucidating historical relationships, and the common challenge of adequately sampling taxa and characters. Particularly at low taxonomic levels, recent divergence, rapid radiations, and conservative genome evolution yield limited sequence variation, and dense taxon sampling is often desirable. Recent advances in massively parallel sequencing make it possible to rapidly obtain large amounts of sequence data, and multiplexing makes extensive sampling of megabase sequences feasible. Is it possible to efficiently apply massively parallel sequencing to increase phylogenetic resolution at low taxonomic levels? Results We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome generated using multiplexed massively parallel sequencing. 30/33 ingroup nodes resolved with ≥ 95% bootstrap support; this is a substantial improvement relative to prior studies, and shows massively parallel sequencing-based strategies can produce sufficient high quality sequence to reach support levels originally proposed for the phylogenetic bootstrap. Resampling simulations show that at least the entire plastome is necessary to fully resolve Pinus, particularly in rapidly radiating clades. Meta-analysis of 99 published infrageneric phylogenies shows that whole plastome analysis should provide similar gains across a range of plant genera. A disproportionate amount of phylogenetic information resides in two loci (ycf1, ycf2, highlighting their unusual evolutionary properties. Conclusion Plastome sequencing is now an efficient option for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses. With continuing improvements in sequencing capacity, the strategies herein should revolutionize efforts requiring dense taxon and character sampling
Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh
2015-07-01
This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for current class of multicore architecture. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism however, when it comes to 3D data migration, as the data size increases the resource requirement of the algorithm also increases. This challenges its practical implementation even on current generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute intensive part of Kirchhoff depth migration algorithm is the calculation of traveltime tables due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series of supercomputers. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.
Reduced complexity and latency for a massive MIMO system using a parallel detection algorithm
Directory of Open Access Journals (Sweden)
Shoichi Higuchi
2017-09-01
Full Text Available In recent years, massive MIMO systems have been widely researched to realize high-speed data transmission. Since massive MIMO systems use a large number of antennas, these systems require huge complexity to detect the signal. In this paper, we propose a novel detection method for massive MIMO using parallel detection with maximum likelihood detection with QR decomposition and M-algorithm (QRM-MLD to reduce the complexity and latency. The proposed scheme obtains an R matrix after permutation of an H matrix and QR decomposition. The R matrix is also eliminated using a Gauss–Jordan elimination method. By using a modified R matrix, the proposed method can detect the transmitted signal using parallel detection. From the simulation results, the proposed scheme can achieve a reduced complexity and latency with a little degradation of the bit error rate (BER performance compared with the conventional method.
Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo
2017-08-01
We present an application of massively parallel processing of quantitative flow measurements data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150 fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on users parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1,8 s for one B-scan (150 × faster in comparison to the CPU
DEFF Research Database (Denmark)
Romero, N. A.; Glinsvad, Christian; Larsen, Ask Hjorth
2013-01-01
Density function theory (DFT) is the most widely employed electronic structure method because of its favorable scaling with system size and accuracy for a broad range of molecular and condensed-phase systems. The advent of massively parallel supercomputers has enhanced the scientific community...
Quantitative analysis of pulmonary perfusion using time-resolved parallel 3D MRI - initial results
International Nuclear Information System (INIS)
Fink, C.; Buhmann, R.; Plathow, C.; Puderbach, M.; Kauczor, H.U.; Risse, F.; Ley, S.; Meyer, F.J.
2004-01-01
Purpose: to assess the use of time-resolved parallel 3D MRI for a quantitative analysis of pulmonary perfusion in patients with cardiopulmonary disease. Materials and methods: eight patients with pulmonary embolism or pulmonary hypertension were examined with a time-resolved 3D gradient echo pulse sequence with parallel imaging techniques (FLASH 3D, TE/TR: 0.8/1.9 ms; flip angle: 40 ; GRAPPA). A quantitative perfusion analysis based on indicator dilution theory was performed using a dedicated software. Results: patients with pulmonary embolism or chronic thromboembolic pulmonary hypertension revealed characteristic wedge-shaped perfusion defects at perfusion MRI. They were characterized by a decreased pulmonary blood flow (PBF) and pulmonary blood volume (PBV) and increased mean transit time (MTT). Patients with primary pulmonary hypertension or eisenmenger syndrome showed a more homogeneous perfusion pattern. The mean MTT of all patients was 3.3 - 4.7 s. The mean PBF and PBV showed a broader interindividual variation (PBF: 104-322 ml/100 ml/min; PBV: 8 - 21 ml/100 ml). Conclusion: time-resolved parallel 3D MRI allows at least a semi-quantitative assessment of lung perfusion. Future studies will have to assess the clinical value of this quantitative information for the diagnosis and management of cardiopulmonary disease. (orig.) [de
Master actions for massive spin-3 particles in D = 2 + 1
Energy Technology Data Exchange (ETDEWEB)
Leite Mendonca, Elias; Dalmazi, Denis [UNESP, Campus de Guaratingueta, DFQ, Guaratingueta, SP (Brazil)
2016-04-15
We present here a relationship between massive self-dual models for spin-3 particles in D = 2 + 1 via the master action procedure. Starting with a first-order model (in the derivatives) S{sub SD(1)} we have constructed a master action which interpolates between a sequence of four self-dual models S{sub SD(i)} where i = 1, 2, 3, 4. By analyzing the particle content of the mixing terms, we give additional arguments that explain why it is apparently impossible to jump from the fourth-order model to a higher-order model. We have also analyzed similarities and differences between the fourth-order K-term in the spin-2 case and the analogous fourth-order term in the spin-3 context. (orig.)
3D Massive MIMO Systems: Modeling and Performance Analysis
Nadeem, Qurrat-Ul-Ain
2015-07-30
Multiple-input-multiple-output (MIMO) systems of current LTE releases are capable of adaptation in the azimuth only. Recently, the trend is to enhance system performance by exploiting the channel’s degrees of freedom in the elevation, which necessitates the characterization of 3D channels. We present an information-theoretic channel model for MIMO systems that supports the elevation dimension. The model is based on the principle of maximum entropy, which enables us to determine the distribution of the channel matrix consistent with the prior information on the angles. Based on this model, we provide analytical expression for the cumulative density function (CDF) of the mutual information (MI) for systems with a single receive and finite number of transmit antennas in the general signalto- interference-plus-noise-ratio (SINR) regime. The result is extended to systems with finite receive antennas in the low SINR regime. A Gaussian approximation to the asymptotic behavior of MI distribution is derived for the large number of transmit antennas and paths regime. We corroborate our analysis with simulations that study the performance gains realizable through meticulous selection of the transmit antenna downtilt angles, confirming the potential of elevation beamforming to enhance system performance. The results are directly applicable to the analysis of 5G 3D-Massive MIMO-systems.
Image processing with massively parallel computer Quadrics Q1
International Nuclear Information System (INIS)
Della Rocca, A.B.; La Porta, L.; Ferriani, S.
1995-05-01
Aimed to evaluate the image processing capabilities of the massively parallel computer Quadrics Q1, a convolution algorithm that has been implemented is described in this report. At first the discrete convolution mathematical definition is recalled together with the main Q1 h/w and s/w features. Then the different codification forms of the algorythm are described and the Q1 performances are compared with those obtained by different computers. Finally, the conclusions report on main results and suggestions
A novel two-level dynamic parallel data scheme for large 3-D SN calculations
International Nuclear Information System (INIS)
Sjoden, G.E.; Shedlock, D.; Haghighat, A.; Yi, C.
2005-01-01
We introduce a new dynamic parallel memory optimization scheme for executing large scale 3-D discrete ordinates (Sn) simulations on distributed memory parallel computers. In order for parallel transport codes to be truly scalable, they must use parallel data storage, where only the variables that are locally computed are locally stored. Even with parallel data storage for the angular variables, cumulative storage requirements for large discrete ordinates calculations can be prohibitive. To address this problem, Memory Tuning has been implemented into the PENTRAN 3-D parallel discrete ordinates code as an optimized, two-level ('large' array, 'small' array) parallel data storage scheme. Memory Tuning can be described as the process of parallel data memory optimization. Memory Tuning dynamically minimizes the amount of required parallel data in allocated memory on each processor using a statistical sampling algorithm. This algorithm is based on the integral average and standard deviation of the number of fine meshes contained in each coarse mesh in the global problem. Because PENTRAN only stores the locally computed problem phase space, optimal two-level memory assignments can be unique on each node, depending upon the parallel decomposition used (hybrid combinations of angular, energy, or spatial). As demonstrated in the two large discrete ordinates models presented (a storage cask and an OECD MOX Benchmark), Memory Tuning can save a substantial amount of memory per parallel processor, allowing one to accomplish very large scale Sn computations. (authors)
International Nuclear Information System (INIS)
Schultz, Anthony; Caspar, Thibault; Schaeffer, Mickael; Labani, Aissam; Jeung, Mi-Young; El Ghannudi, Soraya; Roy, Catherine; Ohana, Mickael
2016-01-01
To qualitatively and quantitatively compare different late gadolinium enhancement (LGE) sequences acquired at 3T with a parallel RF transmission technique. One hundred and sixty participants prospectively enrolled underwent a 3T cardiac MRI with 3 different LGE sequences: 3D Phase-Sensitive Inversion-Recovery (3D-PSIR) acquired 5 minutes after injection, 3D Inversion-Recovery (3D-IR) at 9 minutes and 3D-PSIR at 13 minutes. All LGE-positive patients were qualitatively evaluated both independently and blindly by two radiologists using a 4-level scale, and quantitatively assessed with measurement of contrast-to-noise ratio and LGE maximal surface. Statistical analyses were calculated under a Bayesian paradigm using MCMC methods. Fifty patients (70 % men, 56yo ± 19) exhibited LGE (62 % were post-ischemic, 30 % related to cardiomyopathy and 8 % post-myocarditis). Early and late 3D-PSIR were superior to 3D-IR sequences (global quality, estimated coefficient IR > early-PSIR: -2.37 CI = [-3.46; -1.38], prob(coef > 0) = 0 % and late-PSIR > IR: 3.12 CI = [0.62; 4.41], prob(coef > 0) = 100 %), LGE surface estimated coefficient IR > early-PSIR: -0.09 CI = [-1.11; -0.74], prob(coef > 0) = 0 % and late-PSIR > IR: 0.96 CI = [0.77; 1.15], prob(coef > 0) = 100 %. Probabilities for late PSIR being superior to early PSIR concerning global quality and CNR were over 90 %, regardless of the aetiological subgroup. In 3T cardiac MRI acquired with parallel RF transmission technique, 3D-PSIR is qualitatively and quantitatively superior to 3D-IR. (orig.)
Diversity and Multiplexing Technologies by 3D Beams in Polarized Massive MIMO Systems
Directory of Open Access Journals (Sweden)
Xin Su
2016-01-01
Full Text Available Massive multiple input, multiple output (M-MIMO technologies have been proposed to scale up data rates reaching gigabits per second in the forthcoming 5G mobile communications systems. However, one of crucial constraints is a dimension in space to implement the M-MIMO. To cope with the space constraint and to utilize more flexibility in 3D beamforming (3D-BF, we propose antenna polarization in M-MIMO systems. In this paper, we design a polarized M-MIMO (PM-MIMO system associated with 3D-BF applications, where the system architectures for diversity and multiplexing technologies achieved by polarized 3D beams are provided. Different from the conventional 3D-BF achieved by planar M-MIMO technology to control the downtilted beam in a vertical domain, the proposed PM-MIMO realizes 3D-BF via the linear combination of polarized beams. In addition, an effective array selection scheme is proposed to optimize the beam-width and to enhance system performance by the exploration of diversity and multiplexing gains; and a blind channel estimation (BCE approach is also proposed to avoid pilot contamination in PM-MIMO. Based on the Long Term Evolution-Advanced (LTE-A specification, the simulation results finally confirm the validity of our proposals.
Implementation of a Monte Carlo algorithm for neutron transport on a massively parallel SIMD machine
International Nuclear Information System (INIS)
Baker, R.S.
1992-01-01
We present some results from the recent adaptation of a vectorized Monte Carlo algorithm to a massively parallel architecture. The performance of the algorithm on a single processor Cray Y-MP and a Thinking Machine Corporations CM-2 and CM-200 is compared for several test problems. The results show that significant speedups are obtainable for vectorized Monte Carlo algorithms on massively parallel machines, even when the algorithms are applied to realistic problems which require extensive variance reduction. However, the architecture of the Connection Machine does place some limitations on the regime in which the Monte Carlo algorithm may be expected to perform well
Implementation of a Monte Carlo algorithm for neutron transport on a massively parallel SIMD machine
International Nuclear Information System (INIS)
Baker, R.S.
1993-01-01
We present some results from the recent adaptation of a vectorized Monte Carlo algorithm to a massively parallel architecture. The performance of the algorithm on a single processor Cray Y-MP and a Thinking Machine Corporations CM-2 and CM-200 is compared for several test problems. The results show that significant speedups are obtainable for vectorized Monte Carlo algorithms on massively parallel machines, even when the algorithms are applied to realistic problems which require extensive variance reduction. However, the architecture of the Connection Machine does place some limitations on the regime in which the Monte Carlo algorithm may be expected to perform well. (orig.)
cellGPU: Massively parallel simulations of dynamic vertex models
Sussman, Daniel M.
2017-10-01
Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations. Program Files doi:http://dx.doi.org/10.17632/6j2cj29t3r.1 Licensing provisions: MIT Programming language: CUDA/C++ Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate. Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU. Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation
High performance 3D neutron transport on peta scale and hybrid architectures within APOLLO3 code
International Nuclear Information System (INIS)
Jamelot, E.; Dubois, J.; Lautard, J-J.; Calvin, C.; Baudron, A-M.
2011-01-01
APOLLO3 code is a common project of CEA, AREVA and EDF for the development of a new generation system for core physics analysis. We present here the parallelization of two deterministic transport solvers of APOLLO3: MINOS, a simplified 3D transport solver on structured Cartesian and hexagonal grids, and MINARET, a transport solver based on triangular meshes on 2D and prismatic ones in 3D. We used two different techniques to accelerate MINOS: a domain decomposition method, combined with an accelerated algorithm using GPU. The domain decomposition is based on the Schwarz iterative algorithm, with Robin boundary conditions to exchange information. The Robin parameters influence the convergence and we detail how we optimized the choice of these parameters. MINARET parallelization is based on angular directions calculation using explicit message passing. Fine grain parallelization is also available for each angular direction using shared memory multithreaded acceleration. Many performance results are presented on massively parallel architectures using more than 103 cores and on hybrid architectures using some tens of GPUs. This work contributes to the HPC development in reactor physics at the CEA Nuclear Energy Division. (author)
Implementation of QR up- and downdating on a massively parallel |computer
DEFF Research Database (Denmark)
Bendtsen, Claus; Hansen, Per Christian; Madsen, Kaj
1995-01-01
We describe an implementation of QR up- and downdating on a massively parallel computer (the Connection Machine CM-200) and show that the algorithm maps well onto the computer. In particular, we show how the use of corrected semi-normal equations for downdating can be efficiently implemented. We...... also illustrate the use of our algorithms in a new LP algorithm....
Fast implementations of 3D PET reconstruction using vector and parallel programming techniques
International Nuclear Information System (INIS)
Guerrero, T.M.; Cherry, S.R.; Dahlbom, M.; Ricci, A.R.; Hoffman, E.J.
1993-01-01
Computationally intensive techniques that offer potential clinical use have arisen in nuclear medicine. Examples include iterative reconstruction, 3D PET data acquisition and reconstruction, and 3D image volume manipulation including image registration. One obstacle in achieving clinical acceptance of these techniques is the computational time required. This study focuses on methods to reduce the computation time for 3D PET reconstruction through the use of fast computer hardware, vector and parallel programming techniques, and algorithm optimization. The strengths and weaknesses of i860 microprocessor based workstation accelerator boards are investigated in implementations of 3D PET reconstruction
Spin-3 topologically massive gravity
Energy Technology Data Exchange (ETDEWEB)
Chen Bin, E-mail: bchen01@pku.edu.cn [Department of Physics, and State Key Laboratory of Nuclear Physics and Technology, Peking University, Beijing 100871 (China); Center for High Energy Physics, Peking University, Beijing 100871 (China); Long Jiang, E-mail: longjiang0301@gmail.com [Department of Physics, and State Key Laboratory of Nuclear Physics and Technology, Peking University, Beijing 100871 (China); Wu Junbao, E-mail: wujb@ihep.ac.cn [Institute of High Energy Physics, and Theoretical Physics Center for Science Facilities, Chinese Academy of Sciences, Beijing 100049 (China)
2011-11-24
In this Letter, we study the spin-3 topologically massive gravity (TMG), paying special attention to its properties at the chiral point. We propose an action describing the higher spin fields coupled to TMG. We discuss the traceless spin-3 fluctuations around the AdS{sub 3} vacuum and find that there is an extra local massive mode, besides the left-moving and right-moving boundary massless modes. At the chiral point, such extra mode becomes massless and degenerates with the left-moving mode. We show that at the chiral point the only degrees of freedom in the theory are the boundary right-moving graviton and spin-3 field. We conjecture that spin-3 chiral gravity with generalized Brown-Henneaux boundary condition is holographically dual to 2D chiral CFT with classical W{sub 3} algebra and central charge c{sub R}=3l/G.
On the mutual information of 3D massive MIMO systems: An asymptotic approach
Nadeem, Qurrat-Ul-Ain
2015-10-01
Motivated by the recent interest in 3D beamforming to enhance system performance, we present an information-theoretic channel model for multiple-input multiple-output (MIMO) systems, that can support the elevation dimension. The principle of maximum entropy is used to determine the distribution of the channel matrix consistent with the prior angular information. We provide an explicit expression for the cumulative density function (CDF) of the mutual information in the large number of transmit antennas and paths regime. The derived Gaussian approximation is quite accurate even for realistic system dimensions. The simulation results study the achievable performance through the meticulous selection of the transmit antenna downtilt angles. The results are directly applicable to the analysis of 5G 3D massive MIMO systems. © 2015 IEEE.
Revealing the Physics of Galactic Winds Through Massively-Parallel Hydrodynamics Simulations
Schneider, Evan Elizabeth
This thesis documents the hydrodynamics code Cholla and a numerical study of multiphase galactic winds. Cholla is a massively-parallel, GPU-based code designed for astrophysical simulations that is freely available to the astrophysics community. A static-mesh Eulerian code, Cholla is ideally suited to carrying out massive simulations (> 20483 cells) that require very high resolution. The code incorporates state-of-the-art hydrodynamics algorithms including third-order spatial reconstruction, exact and linearized Riemann solvers, and unsplit integration algorithms that account for transverse fluxes on multidimensional grids. Operator-split radiative cooling and a dual-energy formalism for high mach number flows are also included. An extensive test suite demonstrates Cholla's superior ability to model shocks and discontinuities, while the GPU-native design makes the code extremely computationally efficient - speeds of 5-10 million cell updates per GPU-second are typical on current hardware for 3D simulations with all of the aforementioned physics. The latter half of this work comprises a comprehensive study of the mixing between a hot, supernova-driven wind and cooler clouds representative of those observed in multiphase galactic winds. Both adiabatic and radiatively-cooling clouds are investigated. The analytic theory of cloud-crushing is applied to the problem, and adiabatic turbulent clouds are found to be mixed with the hot wind on similar timescales as the classic spherical case (4-5 t cc) with an appropriate rescaling of the cloud-crushing time. Radiatively cooling clouds survive considerably longer, and the differences in evolution between turbulent and spherical clouds cannot be reconciled with a simple rescaling. The rapid incorporation of low-density material into the hot wind implies efficient mass-loading of hot phases of galactic winds. At the same time, the extreme compression of high-density cloud material leads to long-lived but slow-moving clumps
Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes
Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak
2004-01-01
High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mark Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation is demonstrated by the reduction of the the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel
Massively Parallel Single-Molecule Manipulation Using Centrifugal Force
Wong, Wesley; Halvorsen, Ken
2011-03-01
Precise manipulation of single molecules has led to remarkable insights in physics, chemistry, biology, and medicine. However, two issues that have impeded the widespread adoption of these techniques are equipment cost and the laborious nature of making measurements one molecule at a time. To meet these challenges, we have developed an approach that enables massively parallel single- molecule force measurements using centrifugal force. This approach is realized in the centrifuge force microscope, an instrument in which objects in an orbiting sample are subjected to a calibration-free, macroscopically uniform force- field while their micro-to-nanoscopic motions are observed. We demonstrate high- throughput single-molecule force spectroscopy with this technique by performing thousands of rupture experiments in parallel, characterizing force-dependent unbinding kinetics of an antibody-antigen pair in minutes rather than days. Currently, we are taking steps to integrate high-resolution detection, fluorescence, temperature control and a greater dynamic range in force. With significant benefits in efficiency, cost, simplicity, and versatility, single-molecule centrifugation has the potential to expand single-molecule experimentation to a wider range of researchers and experimental systems.
MADmap: A Massively Parallel Maximum-Likelihood Cosmic Microwave Background Map-Maker
Energy Technology Data Exchange (ETDEWEB)
Cantalupo, Christopher; Borrill, Julian; Jaffe, Andrew; Kisner, Theodore; Stompor, Radoslaw
2009-06-09
MADmap is a software application used to produce maximum-likelihood images of the sky from time-ordered data which include correlated noise, such as those gathered by Cosmic Microwave Background (CMB) experiments. It works efficiently on platforms ranging from small workstations to the most massively parallel supercomputers. Map-making is a critical step in the analysis of all CMB data sets, and the maximum-likelihood approach is the most accurate and widely applicable algorithm; however, it is a computationally challenging task. This challenge will only increase with the next generation of ground-based, balloon-borne and satellite CMB polarization experiments. The faintness of the B-mode signal that these experiments seek to measure requires them to gather enormous data sets. MADmap is already being run on up to O(1011) time samples, O(108) pixels and O(104) cores, with ongoing work to scale to the next generation of data sets and supercomputers. We describe MADmap's algorithm based around a preconditioned conjugate gradient solver, fast Fourier transforms and sparse matrix operations. We highlight MADmap's ability to address problems typically encountered in the analysis of realistic CMB data sets and describe its application to simulations of the Planck and EBEX experiments. The massively parallel and distributed implementation is detailed and scaling complexities are given for the resources required. MADmap is capable of analysing the largest data sets now being collected on computing resources currently available, and we argue that, given Moore's Law, MADmap will be capable of reducing the most massive projected data sets.
Directory of Open Access Journals (Sweden)
Christopher D. Dharmaraj
2009-01-01
Full Text Available Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23×23×23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet. The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.
Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C
2009-01-01
Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.
Development of parallel Fokker-Planck code ALLAp
International Nuclear Information System (INIS)
Batishcheva, A.A.; Sigmar, D.J.; Koniges, A.E.
1996-01-01
We report on our ongoing development of the 3D Fokker-Planck code ALLA for a highly collisional scrape-off-layer (SOL) plasma. A SOL with strong gradients of density and temperature in the spatial dimension is modeled. Our method is based on a 3-D adaptive grid (in space, magnitude of the velocity, and cosine of the pitch angle) and a second order conservative scheme. Note that the grid size is typically 100 x 257 x 65 nodes. It was shown in our previous work that only these capabilities make it possible to benchmark a 3D code against a spatially-dependent self-similar solution of a kinetic equation with the Landau collision term. In the present work we show results of a more precise benchmarking against the exact solutions of the kinetic equation using a new parallel code ALLAp with an improved method of parallelization and a modified boundary condition at the plasma edge. We also report first results from the code parallelization using Message Passing Interface for a Massively Parallel CRI T3D platform. We evaluate the ALLAp code performance versus the number of T3D processors used and compare its efficiency against a Work/Data Sharing parallelization scheme and a workstation version
Massively Parallel Interrogation of Aptamer Sequence, Structure and Function
Energy Technology Data Exchange (ETDEWEB)
Fischer, N O; Tok, J B; Tarasow, T M
2008-02-08
Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single stranded oligonucleotides affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings. High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and interchip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.
Massively parallel interrogation of aptamer sequence, structure and function.
Directory of Open Access Journals (Sweden)
Nicholas O Fischer
Full Text Available BACKGROUND: Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single stranded oligonucleotides affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. METHODOLOGY/PRINCIPAL FINDINGS: High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. CONCLUSION AND SIGNIFICANCE: The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.
Massively parallel performance of neutron transport response matrix algorithms
International Nuclear Information System (INIS)
Hanebutte, U.R.; Lewis, E.E.
1993-01-01
Massively parallel red/black response matrix algorithms for the solution of within-group neutron transport problems are implemented on the Connection Machines-2, 200 and 5. The response matrices are dericed from the diamond-differences and linear-linear nodal discrete ordinate and variational nodal P 3 approximations. The unaccelerated performance of the iterative procedure is examined relative to the maximum rated performances of the machines. The effects of processor partitions size, of virtual processor ratio and of problems size are examined in detail. For the red/black algorithm, the ratio of inter-node communication to computing times is found to be quite small, normally of the order of ten percent or less. Performance increases with problems size and with virtual processor ratio, within the memeory per physical processor limitation. Algorithm adaptation to courser grain machines is straight-forward, with total computing time being virtually inversely proportional to the number of physical processors. (orig.)
Computational chaos in massively parallel neural networks
Barhen, Jacob; Gulati, Sandeep
1989-01-01
A fundamental issue which directly impacts the scalability of current theoretical neural network models to massively parallel embodiments, in both software as well as hardware, is the inherent and unavoidable concurrent asynchronicity of emerging fine-grained computational ensembles and the possible emergence of chaotic manifestations. Previous analyses attributed dynamical instability to the topology of the interconnection matrix, to parasitic components or to propagation delays. However, researchers have observed the existence of emergent computational chaos in a concurrently asynchronous framework, independent of the network topology. Researcher present a methodology enabling the effective asynchronous operation of large-scale neural networks. Necessary and sufficient conditions guaranteeing concurrent asynchronous convergence are established in terms of contracting operators. Lyapunov exponents are computed formally to characterize the underlying nonlinear dynamics. Simulation results are presented to illustrate network convergence to the correct results, even in the presence of large delays.
3D-radiative transfer in terrestrial atmosphere: An efficient parallel numerical procedure
Bass, L. P.; Germogenova, T. A.; Nikolaeva, O. V.; Kokhanovsky, A. A.; Kuznetsov, V. S.
2003-04-01
Light propagation and scattering in terrestrial atmosphere is usually studied in the framework of the 1D radiative transfer theory [1]. However, in reality particles (e.g., ice crystals, solid and liquid aerosols, cloud droplets) are randomly distributed in 3D space. In particular, their concentrations vary both in vertical and horizontal directions. Therefore, 3D effects influence modern cloud and aerosol retrieval procedures, which are currently based on the 1D radiative transfer theory. It should be pointed out that the standard radiative transfer equation allows to study these more complex situations as well [2]. In recent year the parallel version of the 2D and 3D RADUGA code has been developed. This version is successfully used in gammas and neutrons transport problems [3]. Applications of this code to radiative transfer in atmosphere problems are contained in [4]. Possibilities of code RADUGA are presented in [5]. The RADUGA code system is an universal solver of radiative transfer problems for complicated models, including 2D and 3D aerosol and cloud fields with arbitrary scattering anisotropy, light absorption, inhomogeneous underlying surface and topography. Both delta type and distributed light sources can be accounted for in the framework of the algorithm developed. The accurate numerical procedure is based on the new discrete ordinate SWDD scheme [6]. The algorithm is specifically designed for parallel supercomputers. The version RADUGA 5.1(P) can run on MBC1000M [7] (768 processors with 10 Gb of hard disc memory for each processor). The peak productivity is equal 1 Tfl. Corresponding scalar version RADUGA 5.1 is working on PC. As a first example of application of the algorithm developed, we have studied the shadowing effects of clouds on neighboring cloudless atmosphere, depending on the cloud optical thickness, surface albedo, and illumination conditions. This is of importance for modern satellite aerosol retrieval algorithms development. [1] Sobolev
Massively Parallel and Scalable Implicit Time Integration Algorithms for Structural Dynamics
Farhat, Charbel
1997-01-01
Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because of the following additional facts: (a) explicit schemes are easier to parallelize than implicit ones, and (b) explicit schemes induce short range interprocessor communications that are relatively inexpensive, while the factorization methods used in most implicit schemes induce long range interprocessor communications that often ruin the sought-after speed-up. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet be offset by the speed of the currently available parallel hardware. Therefore, it is essential to develop efficient alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating the low-frequency dynamics of aerospace structures.
A parallel 3D particle-in-cell code with dynamic load balancing
International Nuclear Information System (INIS)
Wolfheimer, Felix; Gjonaj, Erion; Weiland, Thomas
2006-01-01
A parallel 3D electrostatic Particle-In-Cell (PIC) code including an algorithm for modelling Space Charge Limited (SCL) emission [E. Gjonaj, T. Weiland, 3D-modeling of space-charge-limited electron emission. A charge conserving algorithm, Proceedings of the 11th Biennial IEEE Conference on Electromagnetic Field Computation, 2004] is presented. A domain decomposition technique based on orthogonal recursive bisection is used to parallelize the computation on a distributed memory environment of clustered workstations. For problems with a highly nonuniform and time dependent distribution of particles, e.g., bunch dynamics, a dynamic load balancing between the processes is needed to preserve the parallel performance. The algorithm for the detection of a load imbalance and the redistribution of the tasks among the processes is based on a weight function criterion, where the weight of a cell measures the computational load associated with it. The algorithm is studied with two examples. In the first example, multiple electron bunches as occurring in the S-DALINAC [A. Richter, Operational experience at the S-DALINAC, Proceedings of the Fifth European Particle Accelerator Conference, 1996] accelerator are simulated in the absence of space charge fields. In the second example, the SCL emission and electron trajectories in an electron gun are simulated
A parallel 3D particle-in-cell code with dynamic load balancing
Energy Technology Data Exchange (ETDEWEB)
Wolfheimer, Felix [Technische Universitaet Darmstadt, Institut fuer Theorie Elektromagnetischer Felder, Schlossgartenstr.8, 64283 Darmstadt (Germany)]. E-mail: wolfheimer@temf.de; Gjonaj, Erion [Technische Universitaet Darmstadt, Institut fuer Theorie Elektromagnetischer Felder, Schlossgartenstr.8, 64283 Darmstadt (Germany); Weiland, Thomas [Technische Universitaet Darmstadt, Institut fuer Theorie Elektromagnetischer Felder, Schlossgartenstr.8, 64283 Darmstadt (Germany)
2006-03-01
A parallel 3D electrostatic Particle-In-Cell (PIC) code including an algorithm for modelling Space Charge Limited (SCL) emission [E. Gjonaj, T. Weiland, 3D-modeling of space-charge-limited electron emission. A charge conserving algorithm, Proceedings of the 11th Biennial IEEE Conference on Electromagnetic Field Computation, 2004] is presented. A domain decomposition technique based on orthogonal recursive bisection is used to parallelize the computation on a distributed memory environment of clustered workstations. For problems with a highly nonuniform and time dependent distribution of particles, e.g., bunch dynamics, a dynamic load balancing between the processes is needed to preserve the parallel performance. The algorithm for the detection of a load imbalance and the redistribution of the tasks among the processes is based on a weight function criterion, where the weight of a cell measures the computational load associated with it. The algorithm is studied with two examples. In the first example, multiple electron bunches as occurring in the S-DALINAC [A. Richter, Operational experience at the S-DALINAC, Proceedings of the Fifth European Particle Accelerator Conference, 1996] accelerator are simulated in the absence of space charge fields. In the second example, the SCL emission and electron trajectories in an electron gun are simulated.
3D Hyperpolarized C-13 EPI with Calibrationless Parallel Imaging
DEFF Research Database (Denmark)
Gordon, Jeremy W.; Hansen, Rie Beck; Shin, Peter J.
2018-01-01
With the translation of metabolic MRI with hyperpolarized 13C agents into the clinic, imaging approaches will require large volumetric FOVs to support clinical applications. Parallel imaging techniques will be crucial to increasing volumetric scan coverage while minimizing RF requirements and tem...... strategies to accelerate and undersample hyperpolarized 13C data using 3D blipped EPI acquisitions and multichannel receive coils, and demonstrated its application in a human study of [1-13C]pyruvate metabolism....
Parallel computers and three-dimensional computational electromagnetics
International Nuclear Information System (INIS)
Madsen, N.K.
1994-01-01
The authors have continued to enhance their ability to use new massively parallel processing computers to solve time-domain electromagnetic problems. New vectorization techniques have improved the performance of their code DSI3D by factors of 5 to 15, depending on the computer used. New radiation boundary conditions and far-field transformations now allow the computation of radar cross-section values for complex objects. A new parallel-data extraction code has been developed that allows the extraction of data subsets from large problems, which have been run on parallel computers, for subsequent post-processing on workstations with enhanced graphics capabilities. A new charged-particle-pushing version of DSI3D is under development. Finally, DSI3D has become a focal point for several new Cooperative Research and Development Agreement activities with industrial companies such as Lockheed Advanced Development Company, Varian, Hughes Electron Dynamics Division, General Atomic, and Cray
A dynamical theory for linearized massive superspin 3/2
International Nuclear Information System (INIS)
Gates, James S. Jr.; Koutrolikos, Konstantinos
2014-01-01
We present a new theory of free massive superspin Y=3/2 irreducible representation of the 4D, N=1 Super-Poincaré group, which has linearized non-minimal supergravity (superhelicity Y=3/2) as it’s massless limit. The new results will illuminate the underlying structure of auxiliary superfields required for the description of higher massive superspin systems
Routing performance analysis and optimization within a massively parallel computer
Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen
2013-04-16
An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.
DEFF Research Database (Denmark)
Kielpinski, Lukasz J; Boyd, Mette; Sandelin, Albin
2013-01-01
Detection of reverse transcriptase termination sites is important in many different applications, such as structural probing of RNAs, rapid amplification of cDNA 5' ends (5' RACE), cap analysis of gene expression, and detection of RNA modifications and protein-RNA cross-links. The throughput...... of these methods can be increased by applying massive parallel sequencing technologies.Here, we describe a versatile method for detection of reverse transcriptase termination sites based on ligation of an adapter to the 3' end of cDNA with bacteriophage TS2126 RNA ligase (CircLigase™). In the following PCR...
FIRST RESULTS FROM THE 3D-HST SURVEY: THE STRIKING DIVERSITY OF MASSIVE GALAXIES AT z > 1
International Nuclear Information System (INIS)
Van Dokkum, Pieter G.; Nelson, Erica; Skelton, Rosalind E.; Bezanson, Rachel; Lundgren, Britt; Brammer, Gabriel; Fumagalli, Mattia; Franx, Marijn; Patel, Shannon; Labbé, Ivo; Rix, Hans-Walter; Schmidt, Kasper B.; Da Cunha, Elisabete; Kriek, Mariska; Bian Fuyan; Fan Xiaohui; Erb, Dawn K.; Förster Schreiber, Natascha; Illingworth, Garth D.; Magee, Dan
2011-01-01
We present first results from the 3D-HST program, a near-IR spectroscopic survey performed with the Wide Field Camera 3 (WFC3) on the HST. We have used 3D-HST spectra to measure redshifts and Hα equivalent widths (EW Hα ) for a complete, stellar mass-limited sample of 34 galaxies at 1 star > 10 11 M ☉ in the COSMOS, GOODS, and AEGIS fields. We find that a substantial fraction of massive galaxies at this epoch are forming stars at a high rate: the fraction of galaxies with EW Hα >10 Å is 59%, compared to 10% among Sloan Digital Sky Survey galaxies of similar masses at z = 0.1. Galaxies with weak Hα emission show absorption lines typical of 2-4 Gyr old stellar populations. The structural parameters of the galaxies, derived from the associated WFC3 F140W imaging data, correlate with the presence of Hα; quiescent galaxies are compact with high Sérsic index and high inferred velocity dispersion, whereas star-forming galaxies are typically large two-armed spiral galaxies, with low Sérsic index. Some of these star-forming galaxies might be progenitors of the most massive S0 and Sa galaxies. Our results challenge the idea that galaxies at fixed mass form a homogeneous population with small scatter in their properties. Instead, we find that massive galaxies form a highly diverse population at z > 1, in marked contrast to the local universe.
Representing and computing regular languages on massively parallel networks
Energy Technology Data Exchange (ETDEWEB)
Miller, M.I.; O' Sullivan, J.A. (Electronic Systems and Research Lab., of Electrical Engineering, Washington Univ., St. Louis, MO (US)); Boysam, B. (Dept. of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Inst., Troy, NY (US)); Smith, K.R. (Dept. of Electrical Engineering, Southern Illinois Univ., Edwardsville, IL (US))
1991-01-01
This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first established the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum entropy probabilistic view leads to Gibb's representations with potentials which have their number of minima growing at precisely the exponential rate that the language of deterministically constrained sequences grow. These representations are coupled to stochastic diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs' probability law. The coupling to stochastic search methods yields the all-important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determines the necessary connection structures of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively-parallel processor consisting of 1024 mesh-connected bit-serial processing elements for performing automated segmentation of electron-micrograph images.
Intelligent trigger by massively parallel processors for high energy physics experiments
International Nuclear Information System (INIS)
Rohrbach, F.; Vesztergombi, G.
1992-01-01
The CERN-MPPC collaboration concentrates its effort on the development of machines based on massive parallelism with thousands of integrated processing elements, arranged in a string. Seven applications are under detailed studies within the collaboration: three for LHC, one for SSC, two for fixed target high energy physics at CERN and one for HDTV. Preliminary results are presented. They show that the objectives should be reached with the use of the ASP architecture. (author)
3D hydrodynamic simulations of carbon burning in massive stars
Cristini, A.; Meakin, C.; Hirschi, R.; Arnett, D.; Georgy, C.; Viallet, M.; Walkington, I.
2017-10-01
We present the first detailed 3D hydrodynamic implicit large eddy simulations of turbulent convection of carbon burning in massive stars. Simulations begin with radial profiles mapped from a carbon-burning shell within a 15 M⊙ 1D stellar evolution model. We consider models with 1283, 2563, 5123, and 10243 zones. The turbulent flow properties of these carbon-burning simulations are very similar to the oxygen-burning case. We performed a mean field analysis of the kinetic energy budgets within the Reynolds-averaged Navier-Stokes framework. For the upper convective boundary region, we find that the numerical dissipation is insensitive to resolution for linear mesh resolutions above 512 grid points. For the stiffer, more stratified lower boundary, our highest resolution model still shows signs of decreasing sub-grid dissipation suggesting it is not yet numerically converged. We find that the widths of the upper and lower boundaries are roughly 30 per cent and 10 per cent of the local pressure scaleheights, respectively. The shape of the boundaries is significantly different from those used in stellar evolution models. As in past oxygen-shell-burning simulations, we observe entrainment at both boundaries in our carbon-shell-burning simulations. In the large Péclet number regime found in the advanced phases, the entrainment rate is roughly inversely proportional to the bulk Richardson number, RiB (∝RiB-α, 0.5 ≲ α ≲ 1.0). We thus suggest the use of RiB as a means to take into account the results of 3D hydrodynamics simulations in new 1D prescriptions of convective boundary mixing.
Massively parallel fabrication of repetitive nanostructures: nanolithography for nanoarrays
International Nuclear Information System (INIS)
Luttge, Regina
2009-01-01
This topical review provides an overview of nanolithographic techniques for nanoarrays. Using patterning techniques such as lithography, normally we aim for a higher order architecture similarly to functional systems in nature. Inspired by the wealth of complexity in nature, these architectures are translated into technical devices, for example, found in integrated circuitry or other systems in which structural elements work as discrete building blocks in microdevices. Ordered artificial nanostructures (arrays of pillars, holes and wires) have shown particular properties and bring about the opportunity to modify and tune the device operation. Moreover, these nanostructures deliver new applications, for example, the nanoscale control of spin direction within a nanomagnet. Subsequently, we can look for applications where this unique property of the smallest manufactured element is repetitively used such as, for example with respect to spin, in nanopatterned magnetic media for data storage. These nanostructures are generally called nanoarrays. Most of these applications require massively parallel produced nanopatterns which can be directly realized by laser interference (areas up to 4 cm 2 are easily achieved with a Lloyd's mirror set-up). In this topical review we will further highlight the application of laser interference as a tool for nanofabrication, its limitations and ultimate advantages towards a variety of devices including nanostructuring for photonic crystal devices, high resolution patterned media and surface modifications of medical implants. The unique properties of nanostructured surfaces have also found applications in biomedical nanoarrays used either for diagnostic or functional assays including catalytic reactions on chip. Bio-inspired templated nanoarrays will be presented in perspective to other massively parallel nanolithography techniques currently discussed in the scientific literature. (topical review)
Parallelization of 2-D lattice Boltzmann codes
International Nuclear Information System (INIS)
Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo.
1996-03-01
Lattice Boltzmann (LB) codes to simulate two dimensional fluid flow are developed on vector parallel computer Fujitsu VPP500 and scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code to be vectorized along with the axis perpendicular to the direction of the decomposition. High parallel efficiency of 95.1% by the vector parallel calculation on 16 processors with 1152x1152 grid and 88.6% by the scalar parallel calculation on 100 processors with 800x800 grid are obtained. The performance models are developed to analyze the performance of the LB codes. It is shown by our performance models that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors up to 100 processors. We also analyze the scalability in keeping the available memory size of one processor element at maximum. Our performance model predicts that the execution time of the vector parallel code increases about 3% on 500 processors. Although the 1-D domain decomposition method has in general a drawback in the interprocessor communication, the vector parallel LB code is still suitable for the large scale and/or high resolution simulations. (author)
Parallelization of 2-D lattice Boltzmann codes
Energy Technology Data Exchange (ETDEWEB)
Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo
1996-03-01
Lattice Boltzmann (LB) codes to simulate two dimensional fluid flow are developed on vector parallel computer Fujitsu VPP500 and scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code to be vectorized along with the axis perpendicular to the direction of the decomposition. High parallel efficiency of 95.1% by the vector parallel calculation on 16 processors with 1152x1152 grid and 88.6% by the scalar parallel calculation on 100 processors with 800x800 grid are obtained. The performance models are developed to analyze the performance of the LB codes. It is shown by our performance models that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors up to 100 processors. We also analyze the scalability in keeping the available memory size of one processor element at maximum. Our performance model predicts that the execution time of the vector parallel code increases about 3% on 500 processors. Although the 1-D domain decomposition method has in general a drawback in the interprocessor communication, the vector parallel LB code is still suitable for the large scale and/or high resolution simulations. (author).
Chiang, Mao-Hsiung; Lin, Hao-Ting
2011-01-01
This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts for developing a 3D parallel mechanism robot. In the mechanical system, a 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism are designed and analyzed first for realizing a 3D motion in the X-Y-Z coordinate system of the robot's end-effector. The inverse kinematics and the forward kinematics of the parallel mechanism robot are investigated by using the Denavit-Hartenberg notation (D-H notation) coordinate system. The pneumatic actuators in the three vertical motion axes are modeled. In the control system, the Fourier series-based adaptive sliding-mode controller with H(∞) tracking performance is used to design the path tracking controllers of the three vertical servo pneumatic actuators for realizing 3D path tracking control of the end-effector. Three optical linear scales are used to measure the position of the three pneumatic actuators. The 3D position of the end-effector is then calculated from the measuring position of the three pneumatic actuators by means of the kinematics. However, the calculated 3D position of the end-effector cannot consider the manufacturing and assembly tolerance of the joints and the parallel mechanism so that errors between the actual position and the calculated 3D position of the end-effector exist. In order to improve this situation, sensor collaboration is developed in this paper. A stereo vision system is used to collaborate with the three position sensors of the pneumatic actuators. The stereo vision system combining two CCD serves to measure the actual 3D position of the end-effector and calibrate the error between the actual and the calculated 3D position of the end-effector. Furthermore, to
Massively parallel DNA sequencing facilitates diagnosis of patients with Usher syndrome type 1.
Directory of Open Access Journals (Sweden)
Hidekane Yoshimura
Full Text Available Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, and having three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, the conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using the massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in the 16 of them (94.1% who carried at least one mutation. Seven patients had the MYO7A mutation (41.2%, which is the most common type in Japanese. Most of the mutations were detected by only the massively parallel DNA sequencing. We report here four patients, who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing and the frequency of Usher syndrome type 1 genes in Japanese. Mutation screening using this technique has the power to quickly identify mutations of many causative genes while maintaining cost-benefit performance. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or of digenic inheritance.
Massively parallel DNA sequencing facilitates diagnosis of patients with Usher syndrome type 1.
Yoshimura, Hidekane; Iwasaki, Satoshi; Nishio, Shin-Ya; Kumakawa, Kozo; Tono, Tetsuya; Kobayashi, Yumiko; Sato, Hiroaki; Nagai, Kyoko; Ishikawa, Kotaro; Ikezono, Tetsuo; Naito, Yasushi; Fukushima, Kunihiro; Oshikawa, Chie; Kimitsuki, Takashi; Nakanishi, Hiroshi; Usami, Shin-Ichi
2014-01-01
Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, and having three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, the conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using the massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in the 16 of them (94.1%) who carried at least one mutation. Seven patients had the MYO7A mutation (41.2%), which is the most common type in Japanese. Most of the mutations were detected by only the massively parallel DNA sequencing. We report here four patients, who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing and the frequency of Usher syndrome type 1 genes in Japanese. Mutation screening using this technique has the power to quickly identify mutations of many causative genes while maintaining cost-benefit performance. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or of digenic inheritance.
Butterfly effect in 3D gravity
Qaemmaqami, Mohammad M.
2017-11-01
We study the butterfly effect by considering shock wave solutions near the horizon of the anti-de Sitter black hole in some three-dimensional gravity models including 3D Einstein gravity, minimal massive 3D gravity, new massive gravity, generalized massive gravity, Born-Infeld 3D gravity, and new bigravity. We calculate the butterfly velocities of these models and also we consider the critical points and different limits in some of these models. By studying the butterfly effect in the generalized massive gravity, we observe a correspondence between the butterfly velocities and right-left moving degrees of freedom or the central charges of the dual 2D conformal field theories.
D2-D8 system with massive strings and the Lifshitz spacetimes
Energy Technology Data Exchange (ETDEWEB)
Singh, Harvendra [Theory Division, Saha Institute of Nuclear Physics,1/AF, Bidhannagar, Kolkata 700064 (India); Homi Bhabha National Institute,Anushakti Nagar, Mumbai 400094 (India)
2017-04-04
The Romans’ type IIA supergravity allows fundamental strings to have explicit mass term at the tree level. We show that there exists a (F1,D2,D8) brane configuration which gives rise to Lif{sub 4}{sup (2)}×R{sup 1}×S{sup 5} vacua supported by the massive strings. The presence of D8-branes naturally excites massive fundamental strings. A compactification on circle relates these Lifshitz massive type-IIA background with the axion-flux Lif{sub 4}{sup (2)}×S{sup 1}×S{sup 5} vacua in ordinary type-IIB theory. The massive T-duality in eight dimensions further relates them to yet another (Lif)-tilde {sub 4}{sup (2)}×S{sup 1}×S{sup 5} vacua constituted by (F1,D0,D6) system in ordinary type IIA theory. The latter vacua after compactification to four dimensions generate two ‘distinct’ electric charges and a constant magnetic field, all living over 2-dimensional plane. This somewhat reminds us of a similar set up in quantum Hall systems.
A parallel implementation of 3D Zernike moment analysis
Berjón, Daniel; Arnaldo, Sergio; Morán, Francisco
2011-01-01
Zernike polynomials are a well known set of functions that find many applications in image or pattern characterization because they allow to construct shape descriptors that are invariant against translations, rotations or scale changes. The concepts behind them can be extended to higher dimension spaces, making them also fit to describe volumetric data. They have been less used than their properties might suggest due to their high computational cost. We present a parallel implementation of 3D Zernike moments analysis, written in C with CUDA extensions, which makes it practical to employ Zernike descriptors in interactive applications, yielding a performance of several frames per second in voxel datasets about 2003 in size. In our contribution, we describe the challenges of implementing 3D Zernike analysis in a general-purpose GPU. These include how to deal with numerical inaccuracies, due to the high precision demands of the algorithm, or how to deal with the high volume of input data so that it does not become a bottleneck for the system.
A massively parallel strategy for STR marker development, capture, and genotyping.
Kistler, Logan; Johnson, Stephen M; Irwin, Mitchell T; Louis, Edward E; Ratan, Aakrosh; Perry, George H
2017-09-06
Short tandem repeat (STR) variants are highly polymorphic markers that facilitate powerful population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic fluctuations. Massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery. Here, we present a pipeline for developing STR markers directly from high-throughput shotgun sequencing data without a reference genome, and an approach for highly parallel target STR recovery. We employed our approach to capture a panel of 5000 STRs from a test group of diademed sifakas (Propithecus diadema, n = 3), endangered Malagasy rainforest lemurs, and we report extremely efficient recovery of targeted loci-97.3-99.6% of STRs characterized with ≥10x non-redundant sequence coverage. We then tested our STR capture strategy on P. diadema fecal DNA, and report robust initial results and suggestions for future implementations. In addition to STR targets, this approach also generates large, genome-wide single nucleotide polymorphism (SNP) panels from flanking regions. Our method provides a cost-effective and scalable solution for rapid recovery of large STR and SNP datasets in any species without needing a reference genome, and can be used even with suboptimal DNA more easily acquired in conservation and ecological studies. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
Energy Technology Data Exchange (ETDEWEB)
1991-10-23
An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
Enhanced memory architecture for massively parallel vision chip
Chen, Zhe; Yang, Jie; Liu, Liyuan; Wu, Nanjian
2015-04-01
Local memory architecture plays an important role in high performance massively parallel vision chip. In this paper, we propose an enhanced memory architecture with compact circuit area designed in a full-custom flow. The memory consists of separate master-stage static latches and shared slave-stage dynamic latches. We use split transmission transistors on the input data path to enhance tolerance for charge sharing and to achieve random read/write capabilities. The memory is designed in a 0.18 μm CMOS process. The area overhead of the memory achieves 16.6 μm2/bit. Simulation results show that the maximum operating frequency reaches 410 MHz and the corresponding peak dynamic power consumption for a 64-bit memory unit is 190 μW under 1.8 V supply voltage.
PUMA: An Operating System for Massively Parallel Systems
Directory of Open Access Journals (Sweden)
Stephen R. Wheat
1994-01-01
Full Text Available This article presents an overview of PUMA (Performance-oriented, User-managed Messaging Architecture, a message-passing kernel for massively parallel systems. Message passing in PUMA is based on portals – an opening in the address space of an application process. Once an application process has established a portal, other processes can write values into the portal using a simple send operation. Because messages are written directly into the address space of the receiving process, there is no need to buffer messages in the PUMA kernel and later copy them into the applications address space. PUMA consists of two components: the quintessential kernel (Q-Kernel and the process control thread (PCT. Although the PCT provides management decisions, the Q-Kernel controls access and implements the policies specified by the PCT.
Parallel Computer System for 3D Visualization Stereo on GPU
Al-Oraiqat, Anas M.; Zori, Sergii A.
2018-03-01
This paper proposes the organization of a parallel computer system based on Graphic Processors Unit (GPU) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing rays intersections with scene objects. The system allows significant increase in the productivity for the 3D stereo synthesis of photorealistic quality. The generalized procedure of 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions by GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of choosing the size and configuration of the computational Compute Unified Device Archi-tecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as optimized configuration of the computing CUDA network.
Wang, Jin; Zhang, Cao; Katz, Joseph
2016-11-01
A PIV based method to reconstruct the volumetric pressure field by direct integration of the 3D material acceleration directions has been developed. Extending the 2D virtual-boundary omni-directional method (Omni2D, Liu & Katz, 2013), the new 3D parallel-line omni-directional method (Omni3D) integrates the material acceleration along parallel lines aligned in multiple directions. Their angles are set by a spherical virtual grid. The integration is parallelized on a Tesla K40c GPU, which reduced the computing time from three hours to one minute for a single realization. To validate its performance, this method is utilized to calculate the 3D pressure fields in isotropic turbulence and channel flow using the JHU DNS Databases (http://turbulence.pha.jhu.edu). Both integration of the DNS acceleration as well as acceleration from synthetic 3D particles are tested. Results are compared to other method, e.g. solution to the Pressure Poisson Equation (e.g. PPE, Ghaemi et al., 2012) with Bernoulli based Dirichlet boundary conditions, and the Omni2D method. The error in Omni3D prediction is uniformly low, and its sensitivity to acceleration errors is local. It agrees with the PPE/Bernoulli prediction away from the Dirichlet boundary. The Omni3D method is also applied to experimental data obtained using tomographic PIV, and results are correlated with deformation of a compliant wall. ONR.
Incorporation of parallel electrospun fibers for improved topographical guidance in 3D nerve guides
International Nuclear Information System (INIS)
Jeffries, Eric M; Wang Yadong
2013-01-01
Three dimensional (3D) conduits facilitate nerve regeneration. Parallel microfibers have been shown to guide axon extension and Schwann cell migration on flat sheets via topographical cues. However, incorporation of aligned microfibers into 3D conduits to accelerate nerve regeneration has proven challenging. We report an electrospinning technique to incorporate parallel microfibers into 3D constructs at high surface areas while retaining an open architecture. The nerve guide consists of many microchannels lined with a thin layer of longitudinally-aligned microfibers. This design aims to maximize benefits of topographical cues without inhibiting cellular infiltration. We support this hypothesis by demonstrating efficient cell infiltration in vitro. Additionally, this new technique reduces wall thickness compared to our previous design, providing a greater total area for tissue growth. This approach results in an architecture that very closely mimics the structure of decellularized nerve but with larger microchannel diameters to encourage cell infiltration. We believe that reproducing the native architecture is the first step toward matching autograph efficacy. Furthermore, this design can be combined with other biochemical cues to promote nerve regeneration. (paper)
Matthew Parks; Richard Cronn; Aaron Liston
2009-01-01
We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. We found that 30/33 ingroup nodes resolved wlth > 95-percent bootstrap support; this is a substantial improvement relative...
International Nuclear Information System (INIS)
1994-08-01
This is the first annual report of the MPP pilot project 93MPR05. In this pilot project four research groups with different, complementary backgrounds collaborate with the aim to develop new algorithms and codes to simulate the magnetohydrodynamics of thermonuclear and astrophysical plasmas on massively parallel machines. The expected speed-up is required to simulate the dynamics of the hot plasmas of interest which are characterized by very large magnetic Reynolds numbers and, hence, require high spatial and temporal resolutions (for details see section 1). The four research groups that collaborated to produce the results reported here are: The MHD group of Prof. Dr. J.P. Goedbloed at the FOM-Institute for Plasma Physics 'Rijnhuizen' in Nieuwegein, the group of Prof. Dr. H. van der Vorst at the Mathematics Institute of Utrecht University, the group of Prof. Dr. A.G. Hearn at the Astronomical Institute of Utrecht University, and the group of Dr. Ir. H.J.J. te Riele at the CWI in Amsterdam. The full project team met frequently during this first project year to discuss progress reports, current problems, etc. (see section 2). The main results of the first project year are: - Proof of the scalability of typical linear and nonlinear MHD codes - development and testing of a parallel version of the Arnoldi algorithm - development and testing of alternative methods for solving large non-Hermitian eigenvalue problems - porting of the 3D nonlinear semi-implicit time evolution code HERA to an MPP system. The steps that were scheduled to reach these intended results are given in section 3. (orig./WL)
Energy Technology Data Exchange (ETDEWEB)
NONE
1994-08-01
This is the first annual report of the MPP pilot project 93MPR05. In this pilot project four research groups with different, complementary backgrounds collaborate with the aim to develop new algorithms and codes to simulate the magnetohydrodynamics of thermonuclear and astrophysical plasmas on massively parallel machines. The expected speed-up is required to simulate the dynamics of the hot plasmas of interest which are characterized by very large magnetic Reynolds numbers and, hence, require high spatial and temporal resolutions (for details see section 1). The four research groups that collaborated to produce the results reported here are: The MHD group of Prof. Dr. J.P. Goedbloed at the FOM-Institute for Plasma Physics `Rijnhuizen` in Nieuwegein, the group of Prof. Dr. H. van der Vorst at the Mathematics Institute of Utrecht University, the group of Prof. Dr. A.G. Hearn at the Astronomical Institute of Utrecht University, and the group of Dr. Ir. H.J.J. te Riele at the CWI in Amsterdam. The full project team met frequently during this first project year to discuss progress reports, current problems, etc. (see section 2). The main results of the first project year are: - Proof of the scalability of typical linear and nonlinear MHD codes - development and testing of a parallel version of the Arnoldi algorithm - development and testing of alternative methods for solving large non-Hermitian eigenvalue problems - porting of the 3D nonlinear semi-implicit time evolution code HERA to an MPP system. The steps that were scheduled to reach these intended results are given in section 3. (orig./WL).
Research in Parallel Algorithms and Software for Computational Aerosciences
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
MCBooster: a tool for MC generation for massively parallel platforms
Alves Junior, Antonio Augusto
2016-01-01
MCBooster is a header-only, C++11-compliant library for the generation of large samples of phase-space Monte Carlo events on massively parallel platforms. It was released on GitHub in the spring of 2016. The library core algorithms implement the Raubold-Lynch method; they are able to generate the full kinematics of decays with up to nine particles in the final state. The library supports the generation of sequential decays as well as the parallel evaluation of arbitrary functions over the generated events. The output of MCBooster completely accords with popular and well-tested software packages such as GENBOD (W515 from CERNLIB) and TGenPhaseSpace from the ROOT framework. MCBooster is developed on top of the Thrust library and runs on Linux systems. It deploys transparently on NVidia CUDA-enabled GPUs as well as multicore CPUs. This contribution summarizes the main features of MCBooster. A basic description of the user interface and some examples of applications are provided, along with measurements of perfor...
A highly scalable massively parallel fast marching method for the Eikonal equation
Yang, Jianming; Stern, Frederick
2017-03-01
The fast marching method is a widely used numerical method for solving the Eikonal equation arising from a variety of scientific and engineering fields. It is long deemed inherently sequential and an efficient parallel algorithm applicable to large-scale practical applications is not available in the literature. In this study, we present a highly scalable massively parallel implementation of the fast marching method using a domain decomposition approach. Central to this algorithm is a novel restarted narrow band approach that coordinates the frequency of communications and the amount of computations extra to a sequential run for achieving an unprecedented parallel performance. Within each restart, the narrow band fast marching method is executed; simple synchronous local exchanges and global reductions are adopted for communicating updated data in the overlapping regions between neighboring subdomains and getting the latest front status, respectively. The independence of front characteristics is exploited through special data structures and augmented status tags to extract the masked parallelism within the fast marching method. The efficiency, flexibility, and applicability of the parallel algorithm are demonstrated through several examples. These problems are extensively tested on six grids with up to 1 billion points using different numbers of processes ranging from 1 to 65536. Remarkable parallel speedups are achieved using tens of thousands of processes. Detailed pseudo-codes for both the sequential and parallel algorithms are provided to illustrate the simplicity of the parallel implementation and its similarity to the sequential narrow band fast marching algorithm.
Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko
2017-07-12
Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.
Luke, Edward Allen
1993-01-01
Two algorithms capable of computing a transonic 3-D inviscid flow field about rotating machines are considered for parallel implementation. During the study of these algorithms, a significant new method of measuring the performance of parallel algorithms is developed. The theory that supports this new method creates an empirical definition of scalable parallel algorithms that is used to produce quantifiable evidence that a scalable parallel application was developed. The implementation of the parallel application and an automated domain decomposition tool are also discussed.
Block iterative restoration of astronomical images with the massively parallel processor
International Nuclear Information System (INIS)
Heap, S.R.; Lindler, D.J.
1987-01-01
A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images
Advanced quadratures and periodic boundary conditions in parallel 3D Sn transport
International Nuclear Information System (INIS)
Manalo, K.; Yi, C.; Huang, M.; Sjoden, G.
2013-01-01
Significant updates in numerical quadratures have warranted investigation with 3D Sn discrete ordinates transport. We show new applications of quadrature departing from level symmetric ( 2 o) and Pn-Tn (>S 2 o). investigating 3 recently developed quadratures: Even-Odd (EO), Linear-Discontinuous Finite Element - Surface Area (LDFE-SA), and the non-symmetric Icosahedral Quadrature (IC). We discuss implementation changes to 3D Sn codes (applied to Hybrid MOC-Sn TITAN and 3D parallel PENTRAN) that can be performed to accommodate Icosahedral Quadrature, as this quadrature is not 90-degree rotation invariant. In particular, as demonstrated using PENTRAN, the properties of Icosahedral Quadrature are suitable for trivial application using periodic BCs versus that of reflective BCs. In addition to implementing periodic BCs for 3D Sn PENTRAN, we implemented a technique termed 'angular re-sweep' which properly conditions periodic BCs for outer eigenvalue iterative loop convergence. As demonstrated by two simple transport problems (3-group fixed source and 3-group reflected/periodic eigenvalue pin cell), we remark that all of the quadratures we investigated are generally superior to level symmetric quadrature, with Icosahedral Quadrature performing the most efficiently for problems tested. (authors)
Directory of Open Access Journals (Sweden)
Jiayi Wu
Full Text Available Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM. We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong
2017-01-01
Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
Gauge invariant Lagrangian formulation of massive higher spin fields in (A)dS3 space
International Nuclear Information System (INIS)
Buchbinder, I.L.; Snegirev, T.V.; Zinoviev, Yu.M.
2012-01-01
We develop the frame-like formulation of massive bosonic higher spin fields in the case of three-dimensional (A)dS space with the arbitrary cosmological constant. The formulation is based on gauge invariant description by involving the Stueckelberg auxiliary fields. The explicit form of the Lagrangians and the gauge transformation laws are found. The theory can be written in terms of gauge invariant objects similar to the massless theories, thus allowing us to hope to use the same methods for investigation of interactions. In the massive spin 3 field example we are able to rewrite the Lagrangian using the new the so-called separated variables, so that the study of Lagrangian formulation reduces to finding the Lagrangian containing only half of the fields. The same construction takes places for arbitrary integer spin field as well. Further working in terms of separated variables, we build Lagrangian for arbitrary integer spin and write it in terms of gauge invariant objects. Also, we demonstrate how to restore the full set of variables, thus receiving Lagrangian for the massive fields of arbitrary spin in the terms of initial fields.
GPU-accelerated 3-D model-based tracking
International Nuclear Information System (INIS)
Brown, J Anthony; Capson, David W
2010-01-01
Model-based approaches to tracking the pose of a 3-D object in video are effective but computationally demanding. While statistical estimation techniques, such as the particle filter, are often employed to minimize the search space, real-time performance remains unachievable on current generation CPUs. Recent advances in graphics processing units (GPUs) have brought massively parallel computational power to the desktop environment and powerful developer tools, such as NVIDIA Compute Unified Device Architecture (CUDA), have provided programmers with a mechanism to exploit it. NVIDIA GPUs' single-instruction multiple-thread (SIMT) programming model is well-suited to many computer vision tasks, particularly model-based tracking, which requires several hundred 3-D model poses to be dynamically configured, rendered, and evaluated against each frame in the video sequence. Using 6 degree-of-freedom (DOF) rigid hand tracking as an example application, this work harnesses consumer-grade GPUs to achieve real-time, 3-D model-based, markerless object tracking in monocular video.
Roever, Stefan
2012-01-01
A massively parallel, low cost molecular analysis platform will dramatically change the nature of protein, molecular and genomics research, DNA sequencing, and ultimately, molecular diagnostics. An integrated circuit (IC) with 264 sensors was fabricated using standard CMOS semiconductor processing technology. Each of these sensors is individually controlled with precision analog circuitry and is capable of single molecule measurements. Under electronic and software control, the IC was used to demonstrate the feasibility of creating and detecting lipid bilayers and biological nanopores using wild type α-hemolysin. The ability to dynamically create bilayers over each of the sensors will greatly accelerate pore development and pore mutation analysis. In addition, the noise performance of the IC was measured to be 30fA(rms). With this noise performance, single base detection of DNA was demonstrated using α-hemolysin. The data shows that a single molecule, electrical detection platform using biological nanopores can be operationalized and can ultimately scale to millions of sensors. Such a massively parallel platform will revolutionize molecular analysis and will completely change the field of molecular diagnostics in the future.
Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems
Martina-Cezara Albutiu, Alfons Kemper, Thomas Neumann
2012-01-01
Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research ...
A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer
Jespersen, Dennis C.; Levit, Creon
1989-01-01
The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine ran achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significant reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
Directory of Open Access Journals (Sweden)
Matthew O'keefe
1995-01-01
Full Text Available Massively parallel processors (MPPs hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.
Verbeek, F. J.; de Groot, M. M.; Huijsmans, D. P.; Lamers, W. H.; Young, I. T.
1993-01-01
In this paper we discuss a geometrical data base that includes three different geometrical representations of one and the same reconstructed 3D shape: the contour-pile, the voxel enumeration, and the triangulation of a surface. The data base is tailored for 3D shapes obtained from plan-parallel
Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units
International Nuclear Information System (INIS)
Mburu, Joe Mwangi; Hah, Chang Joo Hah
2014-01-01
Within the past decade, research has been done on utilizing GPU massive parallelization in core simulation with impressive results but unfortunately, not much commercial application has been done in the nuclear field especially in reactor core simulation. The purpose of this paper is to give an introductory concept on the topic and illustrate the potential of exploiting the massive parallel nature of GPU computing on a simple monte-carlo simulation with very minimal hardware specifications. To do a comparative analysis, a simple two dimension monte-carlo simulation is implemented for both the CPU and GPU in order to evaluate performance gain based on the computing devices. The heterogeneous platform utilized in this analysis is done on a slow notebook with only 1GHz processor. The end results are quite surprising whereby high speedups obtained are almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach in applying potential high arithmetic intensive calculation. By applying a complex monte-carlo simulation on GPU platform, we have speed up the computational process by almost a factor of 10 based on one million neutrons. This shows how easy, cheap and efficient it is in using GPU in accelerating scientific computing and the results should encourage in exploring further this avenue especially in nuclear reactor physics simulation where deterministic and stochastic calculations are quite favourable in parallelization
Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units
Energy Technology Data Exchange (ETDEWEB)
Mburu, Joe Mwangi; Hah, Chang Joo Hah [KEPCO International Nuclear Graduate School, Ulsan (Korea, Republic of)
2014-05-15
Within the past decade, research has been done on utilizing GPU massive parallelization in core simulation with impressive results but unfortunately, not much commercial application has been done in the nuclear field especially in reactor core simulation. The purpose of this paper is to give an introductory concept on the topic and illustrate the potential of exploiting the massive parallel nature of GPU computing on a simple monte-carlo simulation with very minimal hardware specifications. To do a comparative analysis, a simple two dimension monte-carlo simulation is implemented for both the CPU and GPU in order to evaluate performance gain based on the computing devices. The heterogeneous platform utilized in this analysis is done on a slow notebook with only 1GHz processor. The end results are quite surprising whereby high speedups obtained are almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach in applying potential high arithmetic intensive calculation. By applying a complex monte-carlo simulation on GPU platform, we have speed up the computational process by almost a factor of 10 based on one million neutrons. This shows how easy, cheap and efficient it is in using GPU in accelerating scientific computing and the results should encourage in exploring further this avenue especially in nuclear reactor physics simulation where deterministic and stochastic calculations are quite favourable in parallelization.
Wiens, Curtis N; Artz, Nathan S; Jang, Hyungseok; McMillan, Alan B; Reeder, Scott B
2017-06-01
To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. Magn Reson Med 77:2303-2309, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
Zaidi, H; Morel, Christian
1998-01-01
This paper describes the implementation of the Eidolon Monte Carlo program designed to simulate fully three-dimensional (3D) cylindrical positron tomographs on a MIMD parallel architecture. The original code was written in Objective-C and developed under the NeXTSTEP development environment. Different steps involved in porting the software on a parallel architecture based on PowerPC 604 processors running under AIX 4.1 are presented. Basic aspects and strategies of running Monte Carlo calculations on parallel computers are described. A linear decrease of the computing time was achieved with the number of computing nodes. The improved time performances resulting from parallelisation of the Monte Carlo calculations makes it an attractive tool for modelling photon transport in 3D positron tomography. The parallelisation paradigm used in this work is independent from the chosen parallel architecture
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L
2012-06-13
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.
2012-06-01
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
International Nuclear Information System (INIS)
Andrade, Xavier; Aspuru-Guzik, Alán; Alberdi-Rodriguez, Joseba; Rubio, Angel; Strubbe, David A; Louie, Steven G; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Marques, Miguel A L
2012-01-01
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures. (topical review)
Tolerating correlated failures in Massively Parallel Stream Processing Engines
DEFF Research Database (Denmark)
Su, L.; Zhou, Y.
2016-01-01
Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint. On the o......Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint....... On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE...
EUV blank defect and particle inspection with high throughput immersion AFM with 1nm 3D resolution
Es, M.H. van; Sadeghian Marnani, H.
2016-01-01
Inspection of EUV mask substrates and blanks is demanding. We envision this is a good target application for massively parallel Atomic Force Microscopy (AFM). We envision to do a full surface characterization of EUV masks with AFM enabling 1nm true 3D resolution over the entire surface. The limiting
Schmieschek, S.; Shamardin, L.; Frijters, S.; Krüger, T.; Schiller, U.D.; Harting, J.; Coveney, P.V.
2017-01-01
We introduce the lattice-Boltzmann code LB3D, version 7.1. Building on a parallel program and supporting tools which have enabled research utilising high performance computing resources for nearly two decades, LB3D version 7 provides a subset of the research code functionality as an open source
ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains
Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz
2016-01-01
With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734
Simulating Growth Kinetics in a Data-Parallel 3D Lattice Photobioreactor
Directory of Open Access Journals (Sweden)
A. V. Husselmann
2013-01-01
Full Text Available Though there have been many attempts to address growth kinetics in algal photobioreactors, surprisingly little have attempted an agent-based modelling (ABM approach. ABM has been heralded as a method of practical scientific inquiry into systems of a complex nature and has been applied liberally in a range of disciplines including ecology, physics, social science, and microbiology with special emphasis on pathogenic bacterial growth. We bring together agent-based simulation with the Photosynthetic Factory (PSF model, as well as certain key bioreactor characteristics in a visual 3D, parallel computing fashion. Despite being at small scale, the simulation gives excellent visual cues on the dynamics of such a reactor, and we further investigate the model in a variety of ways. Our parallel implementation on graphical processing units of the simulation provides key advantages, which we also briefly discuss. We also provide some performance data, along with particular effort in visualisation, using volumetric and isosurface rendering.
A visualization of null geodesics for the bonnor massive dipole
Directory of Open Access Journals (Sweden)
G. Andree Oliva Mercado
2015-08-01
Full Text Available In this work we simulate null geodesics for the Bonnor massive dipole metric by implementing a symbolic-numerical algorithm in Sage and Python. This program is also capable of visualizing in 3D, in principle, the geodesics for any given metric. Geodesics are launched from a common point, collectively forming a cone of light beams, simulating a solid-angle section of a point source in front of a massive object with a magnetic field. Parallel light beams also were considered, and their bending due to the curvature of the space-time was simulated.
Directory of Open Access Journals (Sweden)
Ivone U. S. Leong
2014-06-01
Full Text Available Sudden cardiac death in people between the ages of 1–40 years is a devastating event and is frequently caused by several heritable cardiac disorders. These disorders include cardiac ion channelopathies, such as long QT syndrome, catecholaminergic polymorphic ventricular tachycardia and Brugada syndrome and cardiomyopathies, such as hypertrophic cardiomyopathy and arrhythmogenic right ventricular cardiomyopathy. Through careful molecular genetic evaluation of DNA from sudden death victims, the causative gene mutation can be uncovered, and the rest of the family can be screened and preventative measures implemented in at-risk individuals. The current screening approach in most diagnostic laboratories uses Sanger-based sequencing; however, this method is time consuming and labour intensive. The development of massively parallel sequencing has made it possible to produce millions of sequence reads simultaneously and is potentially an ideal approach to screen for mutations in genes that are associated with sudden cardiac death. This approach offers mutation screening at reduced cost and turnaround time. Here, we will review the current commercially available enrichment kits, massively parallel sequencing (MPS platforms, downstream data analysis and its application to sudden cardiac death in a diagnostic environment.
CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION
International Nuclear Information System (INIS)
Schneider, Evan E.; Robertson, Brant E.
2015-01-01
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256 3 ) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density
CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION
Energy Technology Data Exchange (ETDEWEB)
Schneider, Evan E.; Robertson, Brant E. [Steward Observatory, University of Arizona, 933 North Cherry Avenue, Tucson, AZ 85721 (United States)
2015-04-15
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256{sup 3}) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
LiNbO3: A photovoltaic substrate for massive parallel manipulation and patterning of nano-objects
International Nuclear Information System (INIS)
Carrascosa, M.; García-Cabañes, A.; Jubera, M.; Ramiro, J. B.; Agulló-López, F.
2015-01-01
The application of evanescent photovoltaic (PV) fields, generated by visible illumination of Fe:LiNbO 3 substrates, for parallel massive trapping and manipulation of micro- and nano-objects is critically reviewed. The technique has been often referred to as photovoltaic or photorefractive tweezers. The main advantage of the new method is that the involved electrophoretic and/or dielectrophoretic forces do not require any electrodes and large scale manipulation of nano-objects can be easily achieved using the patterning capabilities of light. The paper describes the experimental techniques for particle trapping and the main reported experimental results obtained with a variety of micro- and nano-particles (dielectric and conductive) and different illumination configurations (single beam, holographic geometry, and spatial light modulator projection). The report also pays attention to the physical basis of the method, namely, the coupling of the evanescent photorefractive fields to the dielectric response of the nano-particles. The role of a number of physical parameters such as the contrast and spatial periodicities of the illumination pattern or the particle deposition method is discussed. Moreover, the main properties of the obtained particle patterns in relation to potential applications are summarized, and first demonstrations reviewed. Finally, the PV method is discussed in comparison to other patterning strategies, such as those based on the pyroelectric response and the electric fields associated to domain poling of ferroelectric materials
Alishahiha, Mohsen; Naseh, Ali; Shirzad, Ahmad
2014-12-03
We study linearized equations of motion of the newly proposed three dimensional gravity, known as minimal massive gravity, using its metric formulation. We observe that the resultant linearized equations are exactly the same as that of TMG by making use of a redefinition of the parameters of the model. In particular the model admits logarithmic modes at the critical points. We also study several vacuum solutions of the model, specially at a certain limit where the contribution of Chern-Simons term vanishes.
From Massively Parallel Algorithms and Fluctuating Time Horizons to Nonequilibrium Surface Growth
International Nuclear Information System (INIS)
Korniss, G.; Toroczkai, Z.; Novotny, M. A.; Rikvold, P. A.
2000-01-01
We study the asymptotic scaling properties of a massively parallel algorithm for discrete-event simulations where the discrete events are Poisson arrivals. The evolution of the simulated time horizon is analogous to a nonequilibrium surface. Monte Carlo simulations and a coarse-grained approximation indicate that the macroscopic landscape in the steady state is governed by the Edwards-Wilkinson Hamiltonian. Since the efficiency of the algorithm corresponds to the density of local minima in the associated surface, our results imply that the algorithm is asymptotically scalable. (c) 2000 The American Physical Society
Passive and partially active fault tolerance for massively parallel stream processing engines
DEFF Research Database (Denmark)
Su, Li; Zhou, Yongluan
2018-01-01
. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE...... also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness...
Hidiroglou, M; Knipfel, J E
1984-01-01
Plasma levels of vitamin D3 or 25-hydroxyvitamin D3 in ewes after administration of a single massive intravenous dose of vitamin D3 (2 X 10(6) IU) or 25-hydroxy vitamin D3 (5 mg) were determined at zero, one, two, three, five, ten and 20 days postinjection. In six ewes injected with vitamin D3 conversion of vitamin D3 to 25-hydroxy vitamin D3 resulted in a six-fold increase in the plasma 25-hydroxy vitamin D3 level within one day. Elevated levels were maintained until day 10 but by day 20 a s...
Dual descriptions of massive spin-2 particles in D=3+1
International Nuclear Information System (INIS)
Dalmazi, Denis
2013-01-01
Full text: Since the sixties (last century) one speculates on the effects of a possible (tiny) mass for the graviton. One expects a decrease in the gravitational interaction at large distances which comes handy regarding the experimental data of the last 15 years on the accelerated expansion of the universe. There has been a growing interest in massive quantum gravity in the last years. Almost all recent works are built up on the top of a free (quadratic) action for a massive spin-2 particle known as massive Fierz-Pauli (FP) theory which has first appeared in 1939. In this theory the basic field is a symmetric rank-2 tensor. It is a common belief in the massive gravity community that the massive FP theory is the unique self-consistent (ghost free, Poincare covariant, correct number of degrees of freedom) description of massive spin-2 particles in terms of a rank-2 tensor. We have shown recently that there are other possibilities if we start with a general (non-symmetric) rank-2 tensor. Here we show how our previous work is related with the well known massive FP theory via the introduction of spectators fields of rank-0 (scalar) and rank-1 (vector). We comment on the introduction of interacting vertices and how they affect the free duality with the massive FP theory (author)
A theoretical concept for a thermal-hydraulic 3D parallel channel core model
International Nuclear Information System (INIS)
Hoeld, A.
2004-01-01
A detailed description of the theoretical concept of the 3D thermal-hydraulic single- and two-phase flow phenomena is presented. The theoretical concept is based on important development lines such as separate treatment of the mass and energy from the momentum balance eqs. The other line is the establishment of a procedure for the calculation of the mass flow distributions into different parallel channels based on the fact that the sum of pressure decrease terms over a closed loop must stay, despite of un-symmetric perturbations, zero. The concept is realized in the experimental code HERO-X3D, concentrating in a first step on an artificial BWR or PWR core which may consist of a central channel, four quadrants, and a bypass channel. (authors)
A parallel code named NEPTUNE for 3D fully electromagnetic and pic simulations
International Nuclear Information System (INIS)
Dong Ye; Yang Wenyuan; Chen Jun; Zhao Qiang; Xia Fang; Ma Yan; Xiao Li; Sun Huifang; Chen Hong; Zhou Haijing; Mao Zeyao; Dong Zhiwei
2010-01-01
A parallel code named NEPTUNE for 3D fully electromagnetic and particle-in-cell (PIC) simulations is introduced, which could run on the Linux system with hundreds to thousand CPUs. NEPTUNE is suitable to simulate entire 3D HPM devices; many HPM devices are simulated and designed by using it. In NEPTUNE code, the electromagnetic fields are updated by using the finite-difference in time domain (FDTD) method of solving Maxwell equations and the particles are moved by using Buneman-Boris advance method of solving relativistic Newton-Lorentz equation. Electromagnetic fields and particles are coupled by using liner weighing interpolation PIC method, and the electric filed components are corrected by using Boris method of solve Poisson equation in order to ensure charge-conservation. NEPTUNE code could construct many complicated geometric structures, such as arbitrary axial-symmetric structures, plane transforming structures, slow-wave-structures, coupling holes, foils, and so on. The boundary conditions used in NEPTUNE code are introduced in brief, including perfectly electric conductor boundary, external wave boundary, and particle boundary. Finally, some typical HPM devices are simulated and test by using NEPTUNE code, including MILO, RBWO, VCO, and RKA. The simulation results are with correct and credible physical images, and the parallel efficiencies are also given. (authors)
Recent progress in modelling 3D lithospheric deformation
Kaus, B. J. P.; Popov, A.; May, D. A.
2012-04-01
Modelling 3D lithospheric deformation remains a challenging task, predominantly because the variations in rock types, as well as nonlinearities due to for example plastic deformation result in sharp and very large jumps in effective viscosity contrast. As a result, there are only a limited number of 3D codes available, most of which are using direct solvers which are computationally and memory-wise very demanding. As a result, the resolutions for typical model runs are quite modest, despite the use of hundreds of processors (and using much larger computers is unlikely to bring much improvement in this situation). For this reason we recently developed a new 3D deformation code,called LaMEM: Lithosphere and Mantle Evolution Model. LaMEM is written on top of PETSc, and as a result it runs on massive parallel machines and we have a large number of iterative solvers available (including geometric and algebraic multigrid methods). As it remains unclear which solver combinations work best under which conditions, we have implemented most currently suggested methods (such as schur complement reduction or Fully coupled iterations). In addition, we can use either a finite element discretization (with Q1P0, stabilized Q1Q1 or Q2P-1 elements) or a staggered finite difference discretization for the same input geometry, which is based on a marker and cell technique). This gives us he flexibility to test various solver methodologies on the same model setup, in terms of accuracy, speed, memory usage etc. Here, we will report on some features of LaMEM, on recent code additions, as well as on some lessons we learned which are important for modelling 3D lithospheric deformation. Specifically we will discuss: 1) How we combine a particle-and-cell method to make it work with both a finite difference and a (lagrangian, eulerian or ALE) finite element formulation, with only minor code modifications code 2) How finite difference and finite element discretizations compare in terms of
Directory of Open Access Journals (Sweden)
Chien-Lun Hou
2011-02-01
Full Text Available In this paper, a stereo vision 3D position measurement system for a three-axial pneumatic parallel mechanism robot arm is presented. The stereo vision 3D position measurement system aims to measure the 3D trajectories of the end-effector of the robot arm. To track the end-effector of the robot arm, the circle detection algorithm is used to detect the desired target and the SAD algorithm is used to track the moving target and to search the corresponding target location along the conjugate epipolar line in the stereo pair. After camera calibration, both intrinsic and extrinsic parameters of the stereo rig can be obtained, so images can be rectified according to the camera parameters. Thus, through the epipolar rectification, the stereo matching process is reduced to a horizontal search along the conjugate epipolar line. Finally, 3D trajectories of the end-effector are computed by stereo triangulation. The experimental results show that the stereo vision 3D position measurement system proposed in this paper can successfully track and measure the fifth-order polynomial trajectory and sinusoidal trajectory of the end-effector of the three- axial pneumatic parallel mechanism robot arm.
International Nuclear Information System (INIS)
Fevotte, F.; Lathuiliere, B.
2013-01-01
The large increase in computing power over the past few years now makes it possible to consider developing 3D full-core heterogeneous deterministic neutron transport solvers for reference calculations. Among all approaches presented in the literature, the method first introduced in [1] seems very promising. It consists in iterating over resolutions of 2D and ID MOC problems by taking advantage of prismatic geometries without introducing approximations of a low order operator such as diffusion. However, before developing a solver with all industrial options at EDF, several points needed to be clarified. In this work, we first prove the convergence of this iterative process, under some assumptions. We then present our high-performance, parallel implementation of this algorithm in the MICADO solver. Benchmarking the solver against the Takeda case shows that the 2D-1D coupling algorithm does not seem to affect the spatial convergence order of the MOC solver. As for performance issues, our study shows that even though the data distribution is suited to the 2D solver part, the efficiency of the ID part is sufficient to ensure a good parallel efficiency of the global algorithm. After this study, the main remaining difficulty implementation-wise is about the memory requirement of a vector used for initialization. An efficient acceleration operator will also need to be developed. (authors)
Multiplexed microsatellite recovery using massively parallel sequencing
Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.
2011-01-01
Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).
GPAW - massively parallel electronic structure calculations with Python-based software
DEFF Research Database (Denmark)
Enkovaara, Jussi; Romero, Nichols A.; Shende, Sameer
2011-01-01
of the productivity enhancing features together with a good numerical performance. We have used this approach in implementing an electronic structure simulation software GPAW using the combination of Python and C programming languages. While the chosen approach works well in standard workstations and Unix...... popular choice. While dynamic, interpreted languages, such as Python, can increase the effciency of programmer, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most...... environments, massively parallel supercomputing systems can present some challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges...
Massively Parallel Finite Element Programming
Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang
2010-01-01
Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
Massively Parallel Finite Element Programming
Heister, Timo
2010-01-01
Today\\'s large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
High Performance Programming Using Explicit Shared Memory Model on Cray T3D1
Simon, Horst D.; Saini, Subhash; Grassi, Charles
1994-01-01
The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under Cray Research adaptive Fortran (CRAFT) model four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI s PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in explicit shared memory model on Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than obtained by using explicit shared memory model. This degradation in performance is also seen on CM-5 where the performance of applications using native message-passing library CMMD on CM-5 is also about 4 to 5 times less than using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in explicit shared memory model are discussed. Comparative performance of NPB using explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.
Parallel Implicit Algorithms for CFD
Keyes, David E.
1998-01-01
The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD.) "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in Massively Parallel Integration (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSC library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSC during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSC framework.
Massively parallel de novo protein design for targeted therapeutics
Chevalier, Aaron
2017-09-26
De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Massively parallel de novo protein design for targeted therapeutics
Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Ferná ndez-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David
2017-01-01
De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Massively parallel de novo protein design for targeted therapeutics
Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Fernández-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David
2018-01-01
De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37–43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing. PMID:28953867
International Nuclear Information System (INIS)
Godoy, William F.; Liu Xu
2012-01-01
The present study introduces a parallel Jacobian-free Newton Krylov (JFNK) general minimal residual (GMRES) solution for the discretized radiative transfer equation (RTE) in 3D, absorbing, emitting and scattering media. For the angular and spatial discretization of the RTE, the discrete ordinates method (DOM) and the finite volume method (FVM) including flux limiters are employed, respectively. Instead of forming and storing a large Jacobian matrix, JFNK methods allow for large memory savings as the required Jacobian-vector products are rather approximated by semiexact and numerical formulations, for which convergence and computational times are presented. Parallelization of the GMRES solution is introduced in a combined memory-shared/memory-distributed formulation that takes advantage of the fact that only large vector arrays remain in the JFNK process. Results are presented for 3D test cases including a simple homogeneous, isotropic medium and a more complex non-homogeneous, non-isothermal, absorbing–emitting and anisotropic scattering medium with collimated intensities. Additionally, convergence and stability of Gram–Schmidt and Householder orthogonalizations for the Arnoldi process in the parallel GMRES algorithms are discussed and analyzed. Overall, the introduction of JFNK methods results in a parallel, yet scalable to the tested 2048 processors, and memory affordable solution to 3D radiative transfer problems without compromising the accuracy and convergence of a Newton-like solution.
Fully 3D GPU PET reconstruction
Energy Technology Data Exchange (ETDEWEB)
Herraiz, J.L., E-mail: joaquin@nuclear.fis.ucm.es [Grupo de Fisica Nuclear, Departmento Fisica Atomica, Molecular y Nuclear, Universidad Complutense de Madrid (Spain); Espana, S. [Department of Radiation Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Cal-Gonzalez, J. [Grupo de Fisica Nuclear, Departmento Fisica Atomica, Molecular y Nuclear, Universidad Complutense de Madrid (Spain); Vaquero, J.J. [Departmento de Bioingenieria e Ingenieria Espacial, Universidad Carlos III, Madrid (Spain); Desco, M. [Departmento de Bioingenieria e Ingenieria Espacial, Universidad Carlos III, Madrid (Spain); Unidad de Medicina y Cirugia Experimental, Hospital General Universitario Gregorio Maranon, Madrid (Spain); Udias, J.M. [Grupo de Fisica Nuclear, Departmento Fisica Atomica, Molecular y Nuclear, Universidad Complutense de Madrid (Spain)
2011-08-21
Fully 3D iterative tomographic image reconstruction is computationally very demanding. Graphics Processing Unit (GPU) has been proposed for many years as potential accelerators in complex scientific problems, but it has not been used until the recent advances in the programmability of GPUs that the best available reconstruction codes have started to be implemented to be run on GPUs. This work presents a GPU-based fully 3D PET iterative reconstruction software. This new code may reconstruct sinogram data from several commercially available PET scanners. The most important and time-consuming parts of the code, the forward and backward projection operations, are based on an accurate model of the scanner obtained with the Monte Carlo code PeneloPET and they have been massively parallelized on the GPU. For the PET scanners considered, the GPU-based code is more than 70 times faster than a similar code running on a single core of a fast CPU, obtaining in both cases the same images. The code has been designed to be easily adapted to reconstruct sinograms from any other PET scanner, including scanner prototypes.
Fully 3D GPU PET reconstruction
International Nuclear Information System (INIS)
Herraiz, J.L.; Espana, S.; Cal-Gonzalez, J.; Vaquero, J.J.; Desco, M.; Udias, J.M.
2011-01-01
Fully 3D iterative tomographic image reconstruction is computationally very demanding. Graphics Processing Unit (GPU) has been proposed for many years as potential accelerators in complex scientific problems, but it has not been used until the recent advances in the programmability of GPUs that the best available reconstruction codes have started to be implemented to be run on GPUs. This work presents a GPU-based fully 3D PET iterative reconstruction software. This new code may reconstruct sinogram data from several commercially available PET scanners. The most important and time-consuming parts of the code, the forward and backward projection operations, are based on an accurate model of the scanner obtained with the Monte Carlo code PeneloPET and they have been massively parallelized on the GPU. For the PET scanners considered, the GPU-based code is more than 70 times faster than a similar code running on a single core of a fast CPU, obtaining in both cases the same images. The code has been designed to be easily adapted to reconstruct sinograms from any other PET scanner, including scanner prototypes.
Quantification of massively parallel sequencing libraries - a comparative study of eight methods
DEFF Research Database (Denmark)
Hussing, Christian; Kampmann, Marie-Louise; Mogensen, Helle Smidt
2018-01-01
Quantification of massively parallel sequencing libraries is important for acquisition of monoclonal beads or clusters prior to clonal amplification and to avoid large variations in library coverage when multiple samples are included in one sequencing analysis. No gold standard for quantification...... estimates followed by Qubit and electrophoresis-based instruments (Bioanalyzer, TapeStation, GX Touch, and Fragment Analyzer), while SYBR Green and TaqMan based qPCR assays gave the lowest estimates. qPCR gave more accurate predictions of sequencing coverage than Qubit and TapeStation did. Costs, time......-consumption, workflow simplicity, and ability to quantify multiple samples are discussed. Technical specifications, advantages, and disadvantages of the various methods are pointed out....
Massive parallel sequencing in sarcoma pathobiology: state of the art and perspectives.
Brenca, Monica; Maestro, Roberta
2015-01-01
Sarcomas are an aggressive and highly heterogeneous group of mesenchymal malignancies with different morphologies and clinical behavior. Current therapeutic strategies remain unsatisfactory. Cytogenetic and molecular characterization of these tumors is resulting in the breakdown of the classical histopathological categories into molecular subgroups that better define sarcoma pathobiology and pave the way to more precise diagnostic criteria and novel therapeutic opportunities. The purpose of this short review is to summarize the state-of-the-art on the exploitation of massive parallel sequencing technologies, also known as next generation sequencing, in the elucidation of sarcoma pathobiology and to discuss how these applications may impact on diagnosis, prognosis and therapy of these tumors.
Directory of Open Access Journals (Sweden)
Abdeslam Mehenni
2017-03-01
Full Text Available As our populations grow in a world of limited resources enterprise seek ways to lighten our load on the planet. The idea of modifying consumer behavior appears as a foundation for smart grids. Enterprise demonstrates the value available from deep analysis of electricity consummation histories, consumers’ messages, and outage alerts, etc. Enterprise mines massive structured and unstructured data. In a nutshell, smart grids result in a flood of data that needs to be analyzed, for better adjust to demand and give customers more ability to delve into their power consumption. Simply put, smart grids will increasingly have a flexible data warehouse attached to them. The key driver for the adoption of data management strategies is clearly the need to handle and analyze the large amounts of information utilities are now faced with. New approaches to data integration are nauseating moment; Hadoop is in fact now being used by the utility to help manage the huge growth in data whilst maintaining coherence of the Data Warehouse. In this paper we define a new Meter Data Management System Architecture repository that differ with three leaders MDMS, where we use MapReduce programming model for ETL and Parallel DBMS in Query statements(Massive Parallel Processing MPP.
Massively parallel simulator of optical coherence tomography of inhomogeneous turbid media.
Malektaji, Siavash; Lima, Ivan T; Escobar I, Mauricio R; Sherif, Sherif S
2017-10-01
An accurate and practical simulator for Optical Coherence Tomography (OCT) could be an important tool to study the underlying physical phenomena in OCT such as multiple light scattering. Recently, many researchers have investigated simulation of OCT of turbid media, e.g., tissue, using Monte Carlo methods. The main drawback of these earlier simulators is the long computational time required to produce accurate results. We developed a massively parallel simulator of OCT of inhomogeneous turbid media that obtains both Class I diffusive reflectivity, due to ballistic and quasi-ballistic scattered photons, and Class II diffusive reflectivity due to multiply scattered photons. This Monte Carlo-based simulator is implemented on graphic processing units (GPUs), using the Compute Unified Device Architecture (CUDA) platform and programming model, to exploit the parallel nature of propagation of photons in tissue. It models an arbitrary shaped sample medium as a tetrahedron-based mesh and uses an advanced importance sampling scheme. This new simulator speeds up simulations of OCT of inhomogeneous turbid media by about two orders of magnitude. To demonstrate this result, we have compared the computation times of our new parallel simulator and its serial counterpart using two samples of inhomogeneous turbid media. We have shown that our parallel implementation reduced simulation time of OCT of the first sample medium from 407 min to 92 min by using a single GPU card, to 12 min by using 8 GPU cards and to 7 min by using 16 GPU cards. For the second sample medium, the OCT simulation time was reduced from 209 h to 35.6 h by using a single GPU card, and to 4.65 h by using 8 GPU cards, and to only 2 h by using 16 GPU cards. Therefore our new parallel simulator is considerably more practical to use than its central processing unit (CPU)-based counterpart. Our new parallel OCT simulator could be a practical tool to study the different physical phenomena underlying OCT
Directory of Open Access Journals (Sweden)
Paolo Cazzaniga
2014-01-01
high-performance computing solutions is motivated by the need of performing large numbers of in silico analysis to study the behavior of biological systems in different conditions, which necessitate a computing power that usually overtakes the capability of standard desktop computers. In this work we present coagSODA, a CUDA-powered computational tool that was purposely developed for the analysis of a large mechanistic model of the blood coagulation cascade (BCC, defined according to both mass-action kinetics and Hill functions. coagSODA allows the execution of parallel simulations of the dynamics of the BCC by automatically deriving the system of ordinary differential equations and then exploiting the numerical integration algorithm LSODA. We present the biological results achieved with a massive exploration of perturbed conditions of the BCC, carried out with one-dimensional and bi-dimensional parameter sweep analysis, and show that GPU-accelerated parallel simulations of this model can increase the computational performances up to a 181× speedup compared to the corresponding sequential simulations.
Sakr, Rita A; Schizas, Michail; Carniello, Jose V Scarpa; Ng, Charlotte K Y; Piscuoglio, Salvatore; Giri, Dilip; Andrade, Victor P; De Brot, Marina; Lim, Raymond S; Towers, Russell; Weigelt, Britta; Reis-Filho, Jorge S; King, Tari A
2016-02-01
Lobular carcinoma in situ (LCIS) has been proposed as a non-obligate precursor of invasive lobular carcinoma (ILC). Here we sought to define the repertoire of somatic genetic alterations in pure LCIS and in synchronous LCIS and ILC using targeted massively parallel sequencing. DNA samples extracted from microdissected LCIS, ILC and matched normal breast tissue or peripheral blood from 30 patients were subjected to massively parallel sequencing targeting all exons of 273 genes, including the genes most frequently mutated in breast cancer and DNA repair-related genes. Single nucleotide variants and insertions and deletions were identified using state-of-the-art bioinformatics approaches. The constellation of somatic mutations found in LCIS (n = 34) and ILC (n = 21) were similar, with the most frequently mutated genes being CDH1 (56% and 66%, respectively), PIK3CA (41% and 52%, respectively) and CBFB (12% and 19%, respectively). Among 19 LCIS and ILC synchronous pairs, 14 (74%) had at least one identical mutation in common, including identical PIK3CA and CDH1 mutations. Paired analysis of independent foci of LCIS from 3 breasts revealed at least one common mutation in each of the 3 pairs (CDH1, PIK3CA, CBFB and PKHD1L1). LCIS and ILC have a similar repertoire of somatic mutations, with PIK3CA and CDH1 being the most frequently mutated genes. The presence of identical mutations between LCIS-LCIS and LCIS-ILC pairs demonstrates that LCIS is a clonal neoplastic lesion, and provides additional evidence that at least some LCIS are non-obligate precursors of ILC. Copyright © 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Beam dynamics calculations and particle tracking using massively parallel processors
International Nuclear Information System (INIS)
Ryne, R.D.; Habib, S.
1995-01-01
During the past decade massively parallel processors (MPPs) have slowly gained acceptance within the scientific community. At present these machines typically contain a few hundred to one thousand off-the-shelf microprocessors and a total memory of up to 32 GBytes. The potential performance of these machines is illustrated by the fact that a month long job on a high end workstation might require only a few hours on an MPP. The acceptance of MPPs has been slow for a variety of reasons. For example, some algorithms are not easily parallelizable. Also, in the past these machines were difficult to program. But in recent years the development of Fortran-like languages such as CM Fortran and High Performance Fortran have made MPPs much easier to use. In the following we will describe how MPPs can be used for beam dynamics calculations and long term particle tracking
International Nuclear Information System (INIS)
Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.
2015-01-01
This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000 ® problems. These benchmark and scaling studies show promising results
Guermond, Jean-Luc; Minev, Peter D.; Salgado, Abner J.
2012-01-01
We provide a convergence analysis for a new fractional timestepping technique for the incompressible Navier-Stokes equations based on direction splitting. This new technique is of linear complexity, unconditionally stable and convergent, and suitable for massive parallelization. © 2012 American Mathematical Society.
DEFF Research Database (Denmark)
Eduardoff, M; Gross, T E; Santos, C
2016-01-01
Seq™ PCR primers was designed for the Global AIM-SNPs to perform massively parallel sequencing using the Ion PGM™ system. This study assessed individual SNP genotyping precision using the Ion PGM™, the forensic sensitivity of the multiplex using dilution series, degraded DNA plus simple mixtures...
Energy Technology Data Exchange (ETDEWEB)
Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarria-Miranda, Daniel
2009-02-15
We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
Baumgärtel, M.; Ghanem, K.; Kiani, A.; Koch, E.; Pavarini, E.; Sims, H.; Zhang, G.
2017-07-01
We discuss the efficient implementation of general impurity solvers for dynamical mean-field theory. We show that both Lanczos and quantum Monte Carlo in different flavors (Hirsch-Fye, continuous-time hybridization- and interaction-expansion) exhibit excellent scaling on massively parallel supercomputers. We apply these algorithms to simulate realistic model Hamiltonians including the full Coulomb vertex, crystal-field splitting, and spin-orbit interaction. We discuss how to remove the sign problem in the presence of non-diagonal crystal-field and hybridization matrices. We show how to extract the physically observable quantities from imaginary time data, in particular correlation functions and susceptibilities. Finally, we present benchmarks and applications for representative correlated systems.
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array
Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul
2008-04-01
This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy.
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli
2014-03-19
One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. For this purpose, we previously developed a software platform for high-performance 3D medical image processing, called HPC 3D-MIP platform, which employs increasingly available and affordable commodity computing systems such as the multicore, cluster, and cloud computing systems. To achieve scalable high-performance computing, the platform employed size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D-MIP algorithms, supported task scheduling for efficient load distribution and balancing, and consisted of a layered parallel software libraries that allow image processing applications to share the common functionalities. We evaluated the performance of the HPC 3D-MIP platform by applying it to computationally intensive processes in virtual colonoscopy. Experimental results showed a 12-fold performance improvement on a workstation with 12-core CPUs over the original sequential implementation of the processes, indicating the efficiency of the platform. Analysis of performance scalability based on the Amdahl's law for symmetric multicore chips showed the potential of a high performance scalability of the HPC 3D-MIP platform when a larger number of cores is available.
Search for charged massive long-lived particles at D0
Xie, Yunhe
2009-05-01
We report on a new search for charged massive long-lived particles (CMLLP) by the D0 Experiment at Fermilab's Teva- tron. CMLLP are predicted in many theories beyond Standard Model. Time-of-flight information was used in the search for pair-produced CMLLPs, based on the signature of two particles, reconstructed as muons, with speed and invariant mass inconsistent with beam-produced muons. The analysis was done with the data taken by D0 detector in Run II cor- responding to an integrated luminosity of 3 fb-1. Limits on the pair production of CMLLPs are presented quasi-model independently.
Grid-optimized Web 3D applications on wide area network
Wang, Frank; Helian, Na; Meng, Lingkui; Wu, Sining; Zhang, Wen; Guo, Yike; Parker, Michael Andrew
2008-08-01
Geographical information system has come into the Web Service times now. In this paper, Web3D applications have been developed based on our developed Gridjet platform, which provides a more effective solution for massive 3D geo-dataset sharing in distributed environments. Web3D services enabling web users could access the services as 3D scenes, virtual geographical environment and so on. However, Web3D services should be shared by thousands of essential users that inherently distributed on different geography locations. Large 3D geo-datasets need to be transferred to distributed clients via conventional HTTP, NFS and FTP protocols, which often encounters long waits and frustration in distributed wide area network environments. GridJet was used as the underlying engine between the Web 3D application node and geo-data server that utilizes a wide range of technologies including the one of paralleling the remote file access, which is a WAN/Grid-optimized protocol and provides "local-like" accesses to remote 3D geo-datasets. No change in the way of using software is required since the multi-streamed GridJet protocol remains fully compatible with existing IP infrastructures. Our recent progress includes a real-world test that Web3D applications as Google Earth over the GridJet protocol beats those over the classic ones by a factor of 2-7 where the transfer distance is over 10,000 km.
High-Fidelity RF Gun Simulations with the Parallel 3D Finite Element Particle-In-Cell Code Pic3P
Energy Technology Data Exchange (ETDEWEB)
Candel, A; Kabel, A.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Schussman, G.; Ko, K.; /SLAC
2009-06-19
SLAC's Advanced Computations Department (ACD) has developed the first parallel Finite Element 3D Particle-In-Cell (PIC) code, Pic3P, for simulations of RF guns and other space-charge dominated beam-cavity interactions. Pic3P solves the complete set of Maxwell-Lorentz equations and thus includes space charge, retardation and wakefield effects from first principles. Pic3P uses higher-order Finite Elementmethods on unstructured conformal meshes. A novel scheme for causal adaptive refinement and dynamic load balancing enable unprecedented simulation accuracy, aiding the design and operation of the next generation of accelerator facilities. Application to the Linac Coherent Light Source (LCLS) RF gun is presented.
Differences Between Distributed and Parallel Systems
Energy Technology Data Exchange (ETDEWEB)
Brightwell, R.; Maccabe, A.B.; Rissen, R.
1998-10-01
Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.
Phase space simulation of collisionless stellar systems on the massively parallel processor
International Nuclear Information System (INIS)
White, R.L.
1987-01-01
A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem
Hybrid shared/distributed parallelism for 3D characteristics transport solvers
International Nuclear Information System (INIS)
Dahmani, M.; Roy, R.
2005-01-01
In this paper, we will present a new hybrid parallel model for solving large-scale 3-dimensional neutron transport problems used in nuclear reactor simulations. Large heterogeneous reactor problems, like the ones that occurs when simulating Candu cores, have remained computationally intensive and impractical for routine applications on single-node or even vector computers. Based on the characteristics method, this new model is designed to solve the transport equation after distributing the calculation load on a network of shared memory multi-processors. The tracks are either generated on the fly at each characteristics sweep or stored in sequential files. The load balancing is taken into account by estimating the calculation load of tracks and by distributing batches of uniform load on each node of the network. Moreover, the communication overhead can be predicted after benchmarking the latency and bandwidth using appropriate network test suite. These models are useful for predicting the performance of the parallel applications and to analyze the scalability of the parallel systems. (authors)
Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David
1987-01-01
The capability was developed of rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets via the implementation of computer graphics modeling techniques on the Massively Parallel Processor (MPP) by employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.
International Nuclear Information System (INIS)
Wang Shi; Kang Kejun; Wang Jingjin
1996-01-01
Computerized Tomography (CT) is expected to become an inevitable diagnostic technique in the future. However, the long time required to reconstruct an image has been one of the major drawbacks associated with this technique. Parallel process is one of the best way to solve this problem. This paper gives the architecture, hardware and software design of PIRS-4 (4-processor Parallel Image Reconstruction System), which is a parallel processing system for fast 3D-CT image reconstruction by circular shifting float memory architecture. It includes the structure and components of the system, the design of crossbar switch and details of control model, the description of RPBP image reconstruction, the choice of OS (Operate System) and language, the principle of imitating EMS, direct memory R/W of float and programming in the protect model. Finally, the test results are given
Seismic processing using Parallel 3D FMM
Borlaug, Idar
2007-01-01
This thesis develops and tests 3D Fast Marching Method (FMM) algorithm and apply these to seismic simulations. The FMM is a general method for monotonically advancing fronts, originally developed by Sethian. It calculates the first arrival time for an advancing front or wave. FMM methods are used for a variety of applications including, fatigue cracks in materials, lymph node segmentation in CT images, computing skeletons and centerlines in 3D objects and for finding salt formations in seismi...
Keppenne, Christian L.; Rienecker, Michele; Borovikov, Anna Y.; Suarez, Max
1999-01-01
A massively parallel ensemble Kalman filter (EnKF)is used to assimilate temperature data from the TOGA/TAO array and altimetry from TOPEX/POSEIDON into a Pacific basin version of the NASA Seasonal to Interannual Prediction Project (NSIPP)ls quasi-isopycnal ocean general circulation model. The EnKF is an approximate Kalman filter in which the error-covariance propagation step is modeled by the integration of multiple instances of a numerical model. An estimate of the true error covariances is then inferred from the distribution of the ensemble of model state vectors. This inplementation of the filter takes advantage of the inherent parallelism in the EnKF algorithm by running all the model instances concurrently. The Kalman filter update step also occurs in parallel by having each processor process the observations that occur in the region of physical space for which it is responsible. The massively parallel data assimilation system is validated by withholding some of the data and then quantifying the extent to which the withheld information can be inferred from the assimilation of the remaining data. The distributions of the forecast and analysis error covariances predicted by the ENKF are also examined.
3-D Hybrid Simulation of Quasi-Parallel Bow Shock and Its Effects on the Magnetosphere
International Nuclear Information System (INIS)
Lin, Y.; Wang, X.Y.
2005-01-01
A three-dimensional (3-D) global-scale hybrid simulation is carried out for the structure of the quasi-parallel bow shock, in particular the foreshock waves and pressure pulses. The wave evolution and interaction with the dayside magnetosphere are discussed. It is shown that diamagnetic cavities are generated in the turbulent foreshock due to the ion beam plasma interaction, and these compressional pulses lead to strong surface perturbations at the magnetopause and Alfven waves/field line resonance in the magnetosphere
Advanced 3-D Ultrasound Imaging
DEFF Research Database (Denmark)
Rasmussen, Morten Fischer
The main purpose of the PhD project was to develop methods that increase the 3-D ultrasound imaging quality available for the medical personnel in the clinic. Acquiring a 3-D volume gives the medical doctor the freedom to investigate the measured anatomy in any slice desirable after the scan has...... been completed. This allows for precise measurements of organs dimensions and makes the scan more operator independent. Real-time 3-D ultrasound imaging is still not as widespread in use in the clinics as 2-D imaging. A limiting factor has traditionally been the low image quality achievable using...... a channel limited 2-D transducer array and the conventional 3-D beamforming technique, Parallel Beamforming. The first part of the scientific contributions demonstrate that 3-D synthetic aperture imaging achieves a better image quality than the Parallel Beamforming technique. Data were obtained using both...
Directory of Open Access Journals (Sweden)
Sonja Hall-Mendelin
Full Text Available Human disease incidence attributed to arbovirus infection is increasing throughout the world, with effective control interventions limited by issues of sustainability, insecticide resistance and the lack of effective vaccines. Several promising control strategies are currently under development, such as the release of mosquitoes trans-infected with virus-blocking Wolbachia bacteria. Implementation of any control program is dependent on effective virus surveillance and a thorough understanding of virus-vector interactions. Massively parallel sequencing has enormous potential for providing comprehensive genomic information that can be used to assess many aspects of arbovirus ecology, as well as to evaluate novel control strategies. To demonstrate proof-of-principle, we analyzed Aedes aegypti or Aedes albopictus experimentally infected with dengue, yellow fever or chikungunya viruses. Random amplification was used to prepare sufficient template for sequencing on the Personal Genome Machine. Viral sequences were present in all infected mosquitoes. In addition, in most cases, we were also able to identify the mosquito species and mosquito micro-organisms, including the bacterial endosymbiont Wolbachia. Importantly, naturally occurring Wolbachia strains could be differentiated from strains that had been trans-infected into the mosquito. The method allowed us to assemble near full-length viral genomes and detect other micro-organisms without prior sequence knowledge, in a single reaction. This is a step toward the application of massively parallel sequencing as an arbovirus surveillance tool. It has the potential to provide insight into virus transmission dynamics, and has applicability to the post-release monitoring of Wolbachia in mosquito populations.
Computations on the massively parallel processor at the Goddard Space Flight Center
Strong, James P.
1991-01-01
Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
An FPGA-Based Massively Parallel Neuromorphic Cortex Simulator.
Wang, Runchun M; Thakur, Chetan S; van Schaik, André
2018-01-01
This paper presents a massively parallel and scalable neuromorphic cortex simulator designed for simulating large and structurally connected spiking neural networks, such as complex models of various areas of the cortex. The main novelty of this work is the abstraction of a neuromorphic architecture into clusters represented by minicolumns and hypercolumns, analogously to the fundamental structural units observed in neurobiology. Without this approach, simulating large-scale fully connected networks needs prohibitively large memory to store look-up tables for point-to-point connections. Instead, we use a novel architecture, based on the structural connectivity in the neocortex, such that all the required parameters and connections can be stored in on-chip memory. The cortex simulator can be easily reconfigured for simulating different neural networks without any change in hardware structure by programming the memory. A hierarchical communication scheme allows one neuron to have a fan-out of up to 200 k neurons. As a proof-of-concept, an implementation on one Altera Stratix V FPGA was able to simulate 20 million to 2.6 billion leaky-integrate-and-fire (LIF) neurons in real time. We verified the system by emulating a simplified auditory cortex (with 100 million neurons). This cortex simulator achieved a low power dissipation of 1.62 μW per neuron. With the advent of commercially available FPGA boards, our system offers an accessible and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks.
An FPGA-Based Massively Parallel Neuromorphic Cortex Simulator
Directory of Open Access Journals (Sweden)
Runchun M. Wang
2018-04-01
Full Text Available This paper presents a massively parallel and scalable neuromorphic cortex simulator designed for simulating large and structurally connected spiking neural networks, such as complex models of various areas of the cortex. The main novelty of this work is the abstraction of a neuromorphic architecture into clusters represented by minicolumns and hypercolumns, analogously to the fundamental structural units observed in neurobiology. Without this approach, simulating large-scale fully connected networks needs prohibitively large memory to store look-up tables for point-to-point connections. Instead, we use a novel architecture, based on the structural connectivity in the neocortex, such that all the required parameters and connections can be stored in on-chip memory. The cortex simulator can be easily reconfigured for simulating different neural networks without any change in hardware structure by programming the memory. A hierarchical communication scheme allows one neuron to have a fan-out of up to 200 k neurons. As a proof-of-concept, an implementation on one Altera Stratix V FPGA was able to simulate 20 million to 2.6 billion leaky-integrate-and-fire (LIF neurons in real time. We verified the system by emulating a simplified auditory cortex (with 100 million neurons. This cortex simulator achieved a low power dissipation of 1.62 μW per neuron. With the advent of commercially available FPGA boards, our system offers an accessible and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks.
Lagardère, Louis; Jolly, Luc-Henri; Lipparini, Filippo; Aviat, Félix; Stamm, Benjamin; Jing, Zhifeng F; Harger, Matthew; Torabifard, Hedieh; Cisneros, G Andrés; Schnieders, Michael J; Gresh, Nohad; Maday, Yvon; Ren, Pengyu Y; Ponder, Jay W; Piquemal, Jean-Philip
2018-01-28
We present Tinker-HP, a massively MPI parallel package dedicated to classical molecular dynamics (MD) and to multiscale simulations, using advanced polarizable force fields (PFF) encompassing distributed multipoles electrostatics. Tinker-HP is an evolution of the popular Tinker package code that conserves its simplicity of use and its reference double precision implementation for CPUs. Grounded on interdisciplinary efforts with applied mathematics, Tinker-HP allows for long polarizable MD simulations on large systems up to millions of atoms. We detail in the paper the newly developed extension of massively parallel 3D spatial decomposition to point dipole polarizable models as well as their coupling to efficient Krylov iterative and non-iterative polarization solvers. The design of the code allows the use of various computer systems ranging from laboratory workstations to modern petascale supercomputers with thousands of cores. Tinker-HP proposes therefore the first high-performance scalable CPU computing environment for the development of next generation point dipole PFFs and for production simulations. Strategies linking Tinker-HP to Quantum Mechanics (QM) in the framework of multiscale polarizable self-consistent QM/MD simulations are also provided. The possibilities, performances and scalability of the software are demonstrated via benchmarks calculations using the polarizable AMOEBA force field on systems ranging from large water boxes of increasing size and ionic liquids to (very) large biosystems encompassing several proteins as well as the complete satellite tobacco mosaic virus and ribosome structures. For small systems, Tinker-HP appears to be competitive with the Tinker-OpenMM GPU implementation of Tinker. As the system size grows, Tinker-HP remains operational thanks to its access to distributed memory and takes advantage of its new algorithmic enabling for stable long timescale polarizable simulations. Overall, a several thousand-fold acceleration over
Hegde, Ganapathi; Vaya, Pukhraj
2013-10-01
This article presents a parallel architecture for 3-D discrete wavelet transform (3-DDWT). The proposed design is based on the 1-D pipelined lifting scheme. The architecture is fully scalable beyond the present coherent Daubechies filter bank (9, 7). This 3-DDWT architecture has advantages such as no group of pictures restriction and reduced memory referencing. It offers low power consumption, low latency and high throughput. The computing technique is based on the concept that lifting scheme minimises the storage requirement. The application specific integrated circuit implementation of the proposed architecture is done by synthesising it using 65 nm Taiwan Semiconductor Manufacturing Company standard cell library. It offers a speed of 486 MHz with a power consumption of 2.56 mW. This architecture is suitable for real-time video compression even with large frame dimensions.
Calafiura, Paolo; The ATLAS collaboration; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter
2015-01-01
AthenaMP is a multi-process version of the ATLAS reconstruction and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows the sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain confugurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows to run AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the...
Calafiura, Paolo; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter
2015-01-01
AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows to run AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of Ath...
Flexbar 3.0 - SIMD and multicore parallelization.
Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut
2017-09-15
High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing is done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It employs now twofold parallelism: multi-threading and additionally SIMD vectorization. Both types of parallelism are used to speed-up the computation of pair-wise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar based on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. https://github.com/seqan/flexbar. johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Tryton Supercomputer Capabilities for Analysis of Massive Data Streams
Directory of Open Access Journals (Sweden)
Krawczyk Henryk
2015-09-01
Full Text Available The recently deployed supercomputer Tryton, located in the Academic Computer Center of Gdansk University of Technology, provides great means for massive parallel processing. Moreover, the status of the Center as one of the main network nodes in the PIONIER network enables the fast and reliable transfer of data produced by miscellaneous devices scattered in the area of the whole country. The typical examples of such data are streams containing radio-telescope and satellite observations. Their analysis, especially with real-time constraints, can be challenging and requires the usage of dedicated software components. We propose a solution for such parallel analysis using the supercomputer, supervised by the KASKADA platform, which with the conjunction with immerse 3D visualization techniques can be used to solve problems such as pulsar detection and chronometric or oil-spill simulation on the sea surface.
Cloud identification using genetic algorithms and massively parallel computation
Buckles, Bill P.; Petry, Frederick E.
1996-01-01
As a Guest Computational Investigator under the NASA administered component of the High Performance Computing and Communication Program, we implemented a massively parallel genetic algorithm on the MasPar SIMD computer. Experiments were conducted using Earth Science data in the domains of meteorology and oceanography. Results obtained in these domains are competitive with, and in most cases better than, similar problems solved using other methods. In the meteorological domain, we chose to identify clouds using AVHRR spectral data. Four cloud speciations were used although most researchers settle for three. Results were remarkedly consistent across all tests (91% accuracy). Refinements of this method may lead to more timely and complete information for Global Circulation Models (GCMS) that are prevalent in weather forecasting and global environment studies. In the oceanographic domain, we chose to identify ocean currents from a spectrometer having similar characteristics to AVHRR. Here the results were mixed (60% to 80% accuracy). Given that one is willing to run the experiment several times (say 10), then it is acceptable to claim the higher accuracy rating. This problem has never been successfully automated. Therefore, these results are encouraging even though less impressive than the cloud experiment. Successful conclusion of an automated ocean current detection system would impact coastal fishing, naval tactics, and the study of micro-climates. Finally we contributed to the basic knowledge of GA (genetic algorithm) behavior in parallel environments. We developed better knowledge of the use of subpopulations in the context of shared breeding pools and the migration of individuals. Rigorous experiments were conducted based on quantifiable performance criteria. While much of the work confirmed current wisdom, for the first time we were able to submit conclusive evidence. The software developed under this grant was placed in the public domain. An extensive user
Energy Technology Data Exchange (ETDEWEB)
Liu, F. S. [College of Physical Science and Technology, Shenyang Normal University, Shenyang 110034 (China); Guo Yicheng; Koo, David C.; Trump, Jonathan R.; Barro, Guillermo; Yesuf, Hassen; Faber, S. M.; Cheung, Edmond [UCO/Lick Observatory, Department of Astronomy and Astrophysics, University of California, Santa Cruz, CA 95064 (United States); Giavalisco, M. [Department of Astronomy, University of Massachusetts, Amherst, MA 01003 (United States); Cassata, P. [Aix Marseille Universite, CNRS, LAM-Laboratoire d' Astrophysique de Marseille, F-13388 Marseille (France); Koekemoer, A. M.; Grogin, Norman A. [Space Telescope Science Institute, 3700 San Martin Boulevard, Baltimore, MD 21218 (United States); Pentericci, L.; Castellano, M. [INAF Osservatorio Astronomico di Roma, Via Frascati 33, I-00040 Monteporzio (RM) (Italy); Mao, Shude [National Astronomical Observatories, Chinese Academy of Sciences, A20 Datun Road, Beijing 100012 (China); Xia, X. Y. [Tianjin Astrophysics Center, Tianjin Normal University, Tianjin 300387 (China); Hathi, Nimish P. [Observatories of the Carnegie Institution for Science, Pasadena, CA 91101 (United States); Huang, Kuang-Han [Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218 (United States); Kocevski, Dale [Department of Physics and Astronomy, University of Kentucky, Lexington, KY 40506-0055 (United States); McGrath, Elizabeth J., E-mail: fengshan@ucolick.org [Department of Physics and Astronomy, Colby College, Mayflower Hill Drive, Waterville, ME 0490 (United States); and others
2013-06-01
We have made a serendipitous discovery of a massive ({approx}5 Multiplication-Sign 10{sup 11} M{sub Sun }) cD galaxy at z = 1.096 in a candidate-rich cluster in the Hubble Ultra Deep Field (HUDF) area of GOODS-South. This brightest cluster galaxy (BCG) is the most distant cD galaxy confirmed to date. Ultra-deep HST/WFC3 images reveal an extended envelope starting from {approx}10 kpc and reaching {approx}70 kpc in radius along the semimajor axis. The spectral energy distributions indicate that both its inner component and outer envelope are composed of an old, passively evolving (specific star formation rate <10{sup -4} Gyr{sup -1}) stellar population. The cD galaxy lies on the same mass-size relation as the bulk of quiescent galaxies at similar redshifts. The cD galaxy has a higher stellar mass surface density ({approx}M{sub *}/R{sub 50}{sup 2}) but a similar velocity dispersion ({approx}{radical}(M{sub *}/R{sub 50})) to those of more massive, nearby cDs. If the cD galaxy is one of the progenitors of today's more massive cDs, its size (R{sub 50}) and stellar mass have had to increase on average by factors of 3.4 {+-} 1.1 and 3.3 {+-} 1.3 over the past {approx}8 Gyr, respectively. Such increases in size and stellar mass without being accompanied by significant increases in velocity dispersion are consistent with evolutionary scenarios driven by both major and minor dissipationless (dry) mergers. If such cD envelopes originate from dry mergers, our discovery of even one example proves that some BCGs entered the dry merger phase at epochs earlier than z = 1. Our data match theoretical models which predict that the continuance of dry mergers at z < 1 can result in structures similar to those of massive cD galaxies seen today. Moreover, our discovery is a surprise given that the extreme depth of the HUDF is essential to reveal such an extended cD envelope at z > 1 and, yet, the HUDF covers only a minuscule region of sky ({approx}3.1 Multiplication-Sign 10{sup -8
Application of parallel computing techniques to a large-scale reservoir simulation
International Nuclear Information System (INIS)
Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten
2001-01-01
Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from intensive computational requirement for detailed modeling investigations of real-world reservoirs. This paper presents the application of a massive parallel-computing version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance
A parallel implementation of 3D Zernike moment analysis
Berjón Díez, Daniel; Arnaldo Duart, Sergio; Morán Burgos, Francisco
2011-01-01
Zernike polynomials are a well known set of functions that find many applications in image or pattern characterization because they allow to construct shape descriptors that are invariant against translations, rotations or scale changes. The concepts behind them can be extended to higher dimension spaces, making them also fit to describe volumetric data. They have been less used than their properties might suggest due to their high computational cost. We present a parallel implementation of 3...
Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María
2014-06-01
We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the
A Parallel Sweeping Preconditioner for Heterogeneous 3D Helmholtz Equations
Poulson, Jack
2013-05-02
A parallelization of a sweeping preconditioner for three-dimensional Helmholtz equations without large cavities is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be O(γ2N4/3) and O(γN logN), where γ(ω) denotes the modestly frequency-dependent number of grid points per perfectly matched layer. Several computational and memory improvements are introduced relative to using black-box sparse-direct solvers for the auxiliary problems, and competitive runtimes and iteration counts are reported for high-frequency problems distributed over thousands of cores. Two open-source packages are released along with this paper: Parallel Sweeping Preconditioner (PSP) and the underlying distributed multifrontal solver, Clique. © 2013 Society for Industrial and Applied Mathematics.
Morse, H Stephen
1994-01-01
Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi
Directory of Open Access Journals (Sweden)
Fang Huang
2016-06-01
Full Text Available In some digital Earth engineering applications, spatial interpolation algorithms are required to process and analyze large amounts of data. Due to its powerful computing capacity, heterogeneous computing has been used in many applications for data processing in various fields. In this study, we explore the design and implementation of a parallel universal kriging spatial interpolation algorithm using the OpenCL programming model on heterogeneous computing platforms for massive Geo-spatial data processing. This study focuses primarily on transforming the hotspots in serial algorithms, i.e., the universal kriging interpolation function, into the corresponding kernel function in OpenCL. We also employ parallelization and optimization techniques in our implementation to improve the code performance. Finally, based on the results of experiments performed on two different high performance heterogeneous platforms, i.e., an NVIDIA graphics processing unit system and an Intel Xeon Phi system (MIC, we show that the parallel universal kriging algorithm can achieve the highest speedup of up to 40× with a single computing device and the highest speedup of up to 80× with multiple devices.
Scalable Multi-Platform Distribution of Spatial 3d Contents
Klimke, J.; Hagedorn, B.; Döllner, J.
2013-09-01
Virtual 3D city models provide powerful user interfaces for communication of 2D and 3D geoinformation. Providing high quality visualization of massive 3D geoinformation in a scalable, fast, and cost efficient manner is still a challenging task. Especially for mobile and web-based system environments, software and hardware configurations of target systems differ significantly. This makes it hard to provide fast, visually appealing renderings of 3D data throughout a variety of platforms and devices. Current mobile or web-based solutions for 3D visualization usually require raw 3D scene data such as triangle meshes together with textures delivered from server to client, what makes them strongly limited in terms of size and complexity of the models they can handle. In this paper, we introduce a new approach for provisioning of massive, virtual 3D city models on different platforms namely web browsers, smartphones or tablets, by means of an interactive map assembled from artificial oblique image tiles. The key concept is to synthesize such images of a virtual 3D city model by a 3D rendering service in a preprocessing step. This service encapsulates model handling and 3D rendering techniques for high quality visualization of massive 3D models. By generating image tiles using this service, the 3D rendering process is shifted from the client side, which provides major advantages: (a) The complexity of the 3D city model data is decoupled from data transfer complexity (b) the implementation of client applications is simplified significantly as 3D rendering is encapsulated on server side (c) 3D city models can be easily deployed for and used by a large number of concurrent users, leading to a high degree of scalability of the overall approach. All core 3D rendering techniques are performed on a dedicated 3D rendering server, and thin-client applications can be compactly implemented for various devices and platforms.
Hierarchical Image Segmentation of Remotely Sensed Data using Massively Parallel GNU-LINUX Software
Tilton, James C.
2003-01-01
A hierarchical set of image segmentations is a set of several image segmentations of the same image at different levels of detail in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. In [1], Tilton, et a1 describes an approach for producing hierarchical segmentations (called HSEG) and gave a progress report on exploiting these hierarchical segmentations for image information mining. The HSEG algorithm is a hybrid of region growing and constrained spectral clustering that produces a hierarchical set of image segmentations based on detected convergence points. In the main, HSEG employs the hierarchical stepwise optimization (HSWO) approach to region growing, which was described as early as 1989 by Beaulieu and Goldberg. The HSWO approach seeks to produce segmentations that are more optimized than those produced by more classic approaches to region growing (e.g. Horowitz and T. Pavlidis, [3]). In addition, HSEG optionally interjects between HSWO region growing iterations, merges between spatially non-adjacent regions (i.e., spectrally based merging or clustering) constrained by a threshold derived from the previous HSWO region growing iteration. While the addition of constrained spectral clustering improves the utility of the segmentation results, especially for larger images, it also significantly increases HSEG s computational requirements. To counteract this, a computationally efficient recursive, divide-and-conquer, implementation of HSEG (RHSEG) was devised, which includes special code to avoid processing artifacts caused by RHSEG s recursive subdivision of the image data. The recursive nature of RHSEG makes for a straightforward parallel implementation. This paper describes the HSEG algorithm, its recursive formulation (referred to as RHSEG), and the implementation of RHSEG using massively parallel GNU-LINUX software. Results with Landsat TM data are included comparing RHSEG with classic
Parallel thermal radiation transport in two dimensions
International Nuclear Information System (INIS)
Smedley-Stevenson, R.P.; Ball, S.R.
2003-01-01
This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)
Parallel thermal radiation transport in two dimensions
Energy Technology Data Exchange (ETDEWEB)
Smedley-Stevenson, R.P.; Ball, S.R. [AWE Aldermaston (United Kingdom)
2003-07-01
This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)
SWAMP+: multiple subsequence alignment using associative massive parallelism
Energy Technology Data Exchange (ETDEWEB)
Steinfadt, Shannon Irene [Los Alamos National Laboratory; Baker, Johnnie W [KENT STATE UNIV.
2010-10-18
A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H
2013-08-01
Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.
Fox, Geoffrey C; Messina, Guiseppe C
2014-01-01
A clear illustration of how parallel computers can be successfully appliedto large-scale scientific computations. This book demonstrates how avariety of applications in physics, biology, mathematics and other scienceswere implemented on real parallel computers to produce new scientificresults. It investigates issues of fine-grained parallelism relevant forfuture supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configuredifferent massively parallel machines, design and implement basic systemsoftware, and develop
Design for scalability in 3D computer graphics architectures
DEFF Research Database (Denmark)
Holten-Lund, Hans Erik
2002-01-01
This thesis describes useful methods and techniques for designing scalable hybrid parallel rendering architectures for 3D computer graphics. Various techniques for utilizing parallelism in a pipelines system are analyzed. During the Ph.D study a prototype 3D graphics architecture named Hybris has...
Energy Technology Data Exchange (ETDEWEB)
Lichtner, Peter C. [OFM Research, Redmond, WA (United States); Hammond, Glenn E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lu, Chuan [Idaho National Lab. (INL), Idaho Falls, ID (United States); Karra, Satish [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bisht, Gautam [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Andre, Benjamin [National Center for Atmospheric Research, Boulder, CO (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Mills, Richard [Intel Corporation, Portland, OR (United States); Univ. of Tennessee, Knoxville, TN (United States); Kumar, Jitendra [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
2015-01-20
PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 218 processor cores with problem sizes up to 2 billion degrees of freedom. Written in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2^{32} = 4; 294; 967; 296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.
Statistical method to compare massive parallel sequencing pipelines.
Elsensohn, M H; Leblay, N; Dimassi, S; Campan-Fournier, A; Labalme, A; Roucher-Boulez, F; Sanlaville, D; Lesca, G; Bardel, C; Roy, P
2017-03-01
Today, sequencing is frequently carried out by Massive Parallel Sequencing (MPS) that cuts drastically sequencing time and expenses. Nevertheless, Sanger sequencing remains the main validation method to confirm the presence of variants. The analysis of MPS data involves the development of several bioinformatic tools, academic or commercial. We present here a statistical method to compare MPS pipelines and test it in a comparison between an academic (BWA-GATK) and a commercial pipeline (TMAP-NextGENe®), with and without reference to a gold standard (here, Sanger sequencing), on a panel of 41 genes in 43 epileptic patients. This method used the number of variants to fit log-linear models for pairwise agreements between pipelines. To assess the heterogeneity of the margins and the odds ratios of agreement, four log-linear models were used: a full model, a homogeneous-margin model, a model with single odds ratio for all patients, and a model with single intercept. Then a log-linear mixed model was fitted considering the biological variability as a random effect. Among the 390,339 base-pairs sequenced, TMAP-NextGENe® and BWA-GATK found, on average, 2253.49 and 1857.14 variants (single nucleotide variants and indels), respectively. Against the gold standard, the pipelines had similar sensitivities (63.47% vs. 63.42%) and close but significantly different specificities (99.57% vs. 99.65%; p < 0.001). Same-trend results were obtained when only single nucleotide variants were considered (99.98% specificity and 76.81% sensitivity for both pipelines). The method allows thus pipeline comparison and selection. It is generalizable to all types of MPS data and all pipelines.
Black string in dRGT massive gravity
Energy Technology Data Exchange (ETDEWEB)
Tannukij, Lunchakorn [Mahidol University, Department of Physics, Faculty of Science, Bangkok (Thailand); Hanyang University, Department of Physics, Seoul (Korea, Republic of); Naresuan University, The Institute for Fundamental Study, Phitsanulok (Thailand); Wongjun, Pitayuth [Naresuan University, The Institute for Fundamental Study, Phitsanulok (Thailand); Ministry of Education, Thailand Center of Excellence in Physics, Bangkok (Thailand); Ghosh, Suchant G. [Jamia Millia Islamia, Centre of Theoretical Physics, New Delhi (India); University of Kwazulu-Natal, Astrophysics and Cosmology Research Unit, School of Mathematical Sciences, Durban (South Africa)
2017-12-15
We present a cylindrically symmetric solution, both charged and uncharged, which is known as a black string solution to the nonlinear ghost-free massive gravity found by de Rham, Gabadadze, and Tolley (dRGT). This ''dRGT black string'' can be thought of as a generalization of the black string solution found by Lemos. Moreover, the dRGT black string solution includes other classes of black string solution such as the monopole-black string ones since the graviton mass contributes to the global monopole term as well as the cosmological-constant term. To investigate the solution, we compute mass, temperature, and entropy of the dRGT black string. We found that the existence of the graviton mass drastically affects the thermodynamics of the black string. Furthermore, the Hawking-Page phase transition is found to be possible for the dRGT black string as well as the charged dRGT black string. The dRGT black string solution is thermodynamically stable for r > r{sub c} with negative thermodynamical potential and positive heat capacity while it is unstable for r < r{sub c} where the potential is positive. (orig.)
3D Body Scanning Measurement System Associated with RF Imaging, Zero-padding and Parallel Processing
Directory of Open Access Journals (Sweden)
Kim Hyung Tae
2016-04-01
Full Text Available This work presents a novel signal processing method for high-speed 3D body measurements using millimeter waves with a general processing unit (GPU and zero-padding fast Fourier transform (ZPFFT. The proposed measurement system consists of a radio-frequency (RF antenna array for a penetrable measurement, a high-speed analog-to-digital converter (ADC for significant data acquisition, and a general processing unit for fast signal processing. The RF waves of the transmitter and the receiver are converted to real and imaginary signals that are sampled by a high-speed ADC and synchronized with the kinematic positions of the scanner. Because the distance between the surface and the antenna is related to the peak frequency of the conjugate signals, a fast Fourier transform (FFT is applied to the signal processing after the sampling. The sampling time is finite owing to a short scanning time, and the physical resolution needs to be increased; further, zero-padding is applied to interpolate the spectra of the sampled signals to consider a 1/m floating point frequency. The GPU and parallel algorithm are applied to accelerate the speed of the ZPFFT because of the large number of additional mathematical operations of the ZPFFT. 3D body images are finally obtained by spectrograms that are the arrangement of the ZPFFT in a 3D space.
Crossover from 2d to 3d in anisotropic Kondo lattices
International Nuclear Information System (INIS)
Reyes, D.; Continentino, M.A.
2008-01-01
We study the crossover from two to three dimensions in Kondo lattices (KLM) using the Kondo necklace model (KNM). In order to diagonalize the KNM, we use a representation for the localized and conduction electron spins in terms of bond operators and a decoupling for the relevant Green's functions. Both models have a quantum critical point at a finite value of the ratio (J/t) between the Kondo coupling (J) and the hopping (t). In 2d there is no line of finite temperature antiferromagnetic (AF) transitions while for d≥3 this line is given by, T N ∝|g| 1/(d-1) [D. Reyes, M.A. Continentino, Phys. Rev. B 76 (2007) 075114]. The crossover from 2d to 3d is investigated by turning on the electronic hopping (t -perpendicular ) of conduction electrons between different planes. The phase diagram as a function of temperature T, J/t -parallel and ξ=t -perpendicular /t -parallel , where t -parallel is the hopping within the planes is calculated
Chu, Chunlei
2009-01-01
The major performance bottleneck of the parallel Fourier method on distributed memory systems is the network communication cost. In this study, we investigate the potential of using non‐blocking all‐to‐all communications to solve this problem by overlapping computation and communication. We present the runtime comparison of a 3D seismic modeling problem with the Fourier method using non‐blocking and blocking calls, respectively, on a Linux cluster. The data demonstrate that a performance improvement of up to 40% can be achieved by simply changing blocking all‐to‐all communication calls to non‐blocking ones to introduce the overlapping capability. A 3D reverse‐time migration result is also presented as an extension to the modeling work based on non‐blocking collective communications.
Directory of Open Access Journals (Sweden)
Asker Noomi
2009-07-01
Full Text Available Abstract Background The teleost Zoarces viviparus (eelpout lives along the coasts of Northern Europe and has long been an established model organism for marine ecology and environmental monitoring. The scarce information about this species genome has however restrained the use of efficient molecular-level assays, such as gene expression microarrays. Results In the present study we present the first comprehensive characterization of the Zoarces viviparus liver transcriptome. From 400,000 reads generated by massively parallel pyrosequencing, more than 50,000 pieces of putative transcripts were assembled, annotated and functionally classified. The data was estimated to cover roughly 40% of the total transcriptome and homologues for about half of the genes of Gasterosteus aculeatus (stickleback were identified. The sequence data was consequently used to design an oligonucleotide microarray for large-scale gene expression analysis. Conclusion Our results show that one run using a Genome Sequencer FLX from 454 Life Science/Roche generates enough genomic information for adequate de novo assembly of a large number of genes in a higher vertebrate. The generated sequence data, including the validated microarray probes, are publicly available to promote genome-wide research in Zoarces viviparus.
Parallel 3D Simulation of Seismic Wave Propagation in the Structure of Nobi Plain, Central Japan
Kotani, A.; Furumura, T.; Hirahara, K.
2003-12-01
We performed large-scale parallel simulations of the seismic wave propagation to understand the complex wave behavior in the 3D basin structure of the Nobi Plain, which is one of the high population cities in central Japan. In this area, many large earthquakes occurred in the past, such as the 1891 Nobi earthquake (M8.0), the 1944 Tonankai earthquake (M7.9) and the 1945 Mikawa earthquake (M6.8). In order to mitigate the potential disasters for future earthquakes, 3D subsurface structure of Nobi Plain has recently been investigated by local governments. We referred to this model together with bouguer anomaly data to construct a detail 3D basin structure model for Nobi plain, and conducted computer simulations of ground motions. We first evaluated the ground motions for two small earthquakes (M4~5); one occurred just beneath the basin edge at west, and the other occurred at south. The ground motions from these earthquakes were well recorded by the strong motion networks; K-net, Kik-net, and seismic intensity instruments operated by local governments. We compare the observed seismograms with simulations to validate the 3D model. For the 3D simulation we sliced the 3D model into a number of layers to assign to many processors for concurrent computing. The equation of motions are solved using a high order (32nd) staggered-grid FDM in horizontal directions, and a conventional (4th-order) FDM in vertical direction with the MPI inter-processor communications between neighbor region. The simulation model is 128km by 128km by 43km, which is discritized at variable grid size of 62.5-125m in horizontal directions and of 31.25-62.5m in vertical direction. We assigned a minimum shear wave velocity is Vs=0.4km/s, at the top of the sedimentary basin. The seismic sources for the small events are approximated by double-couple point source and we simulate the seismic wave propagation at maximum frequency of 2Hz. We used the Earth Simulator (JAMSTEC, Yokohama Inst) to conduct such
Eduardoff, M; Gross, T E; Santos, C; de la Puente, M; Ballard, D; Strobl, C; Børsting, C; Morling, N; Fusco, L; Hussing, C; Egyed, B; Souto, L; Uacyisrael, J; Syndercombe Court, D; Carracedo, Á; Lareu, M V; Schneider, P M; Parson, W; Phillips, C; Parson, W; Phillips, C
2016-07-01
The EUROFORGEN Global ancestry-informative SNP (AIM-SNPs) panel is a forensic multiplex of 128 markers designed to differentiate an individual's ancestry from amongst the five continental population groups of Africa, Europe, East Asia, Native America, and Oceania. A custom multiplex of AmpliSeq™ PCR primers was designed for the Global AIM-SNPs to perform massively parallel sequencing using the Ion PGM™ system. This study assessed individual SNP genotyping precision using the Ion PGM™, the forensic sensitivity of the multiplex using dilution series, degraded DNA plus simple mixtures, and the ancestry differentiation power of the final panel design, which required substitution of three original ancestry-informative SNPs with alternatives. Fourteen populations that had not been previously analyzed were genotyped using the custom multiplex and these studies allowed assessment of genotyping performance by comparison of data across five laboratories. Results indicate a low level of genotyping error can still occur from sequence misalignment caused by homopolymeric tracts close to the target SNP, despite careful scrutiny of candidate SNPs at the design stage. Such sequence misalignment required the exclusion of component SNP rs2080161 from the Global AIM-SNPs panel. However, the overall genotyping precision and sensitivity of this custom multiplex indicates the Ion PGM™ assay for the Global AIM-SNPs is highly suitable for forensic ancestry analysis with massively parallel sequencing. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Hong, You-Lee; Asakura, Tetsuo; Nishiyama, Yusuke
2018-05-08
β-sheet structure of oligo- and poly-peptides can be formed in anti-parallel (AP)- and parallel (P)-structure, which is the important feature to understand the structures. In principle, P- and AP-β-sheet structures can be identified by the presence (AP) and absence (P) of the interstrand 1HNH/1HNH correlations on a diagonal in 2D 1H double quantum (DQ)/1H single quantum (SQ) spectrum due to the different interstrand 1HNH/1HNH distances between these two arrangements. However, the 1HNH/1HNH peaks overlap to the 1HNH3+/1HNH3+ peaks, which always give cross peaks regardless of the β-sheet arrangement. The 1HNH3+/1HNH3+ peaks disturb the observation of the presence/absence of 1HNH/1HNH correlations and the assignment of 1HNH and 1HNH3+ is not always available. Here, 3D 14N/1H DQ/1H SQ correlation solid-state NMR experiments at fast magic angle spinning (70 kHz) are introduced to distinguish AP and P β-sheet structure. The 14N dimension allows the separate observation of 1HNH/1HNH peaks from 1HNH3+/1HNH3+ peaks with clear assignment of 1HNH and 1HNH3+. In addition, the high natural abundance of 1H and 14N enables 3D 14N/1H DQ/1H SQ experiments of oligo-alanines (Ala3-6) in four hours without any isotope labelling. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Frequency of Usher syndrome type 1 in deaf children by massively parallel DNA sequencing.
Yoshimura, Hidekane; Miyagawa, Maiko; Kumakawa, Kozo; Nishio, Shin-Ya; Usami, Shin-Ichi
2016-05-01
Usher syndrome type 1 (USH1) is the most severe of the three USH subtypes due to its profound hearing loss, absent vestibular response and retinitis pigmentosa appearing at a prepubescent age. Six causative genes have been identified for USH1, making early diagnosis and therapy possible through DNA testing. Targeted exon sequencing of selected genes using massively parallel DNA sequencing (MPS) technology enables clinicians to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using MPS along with direct sequence analysis, we screened 227 unrelated non-syndromic deaf children and detected recessive mutations in USH1 causative genes in five patients (2.2%): three patients harbored MYO7A mutations and one each carried CDH23 or PCDH15 mutations. As indicated by an earlier genotype-phenotype correlation study of the CDH23 and PCDH15 genes, we considered the latter two patients to have USH1. Based on clinical findings, it was also highly likely that one patient with MYO7A mutations possessed USH1 due to a late onset age of walking. This first report describing the frequency (1.3-2.2%) of USH1 among non-syndromic deaf children highlights the importance of comprehensive genetic testing for early disease diagnosis.
A massively parallel method of characteristic neutral particle transport code for GPUs
International Nuclear Information System (INIS)
Boyd, W. R.; Smith, K.; Forget, B.
2013-01-01
Over the past 20 years, parallel computing has enabled computers to grow ever larger and more powerful while scientific applications have advanced in sophistication and resolution. This trend is being challenged, however, as the power consumption for conventional parallel computing architectures has risen to unsustainable levels and memory limitations have come to dominate compute performance. Heterogeneous computing platforms, such as Graphics Processing Units (GPUs), are an increasingly popular paradigm for solving these issues. This paper explores the applicability of GPUs for deterministic neutron transport. A 2D method of characteristics (MOC) code - OpenMOC - has been developed with solvers for both shared memory multi-core platforms as well as GPUs. The multi-threading and memory locality methodologies for the GPU solver are presented. Performance results for the 2D C5G7 benchmark demonstrate 25-35 x speedup for MOC on the GPU. The lessons learned from this case study will provide the basis for further exploration of MOC on GPUs as well as design decisions for hardware vendors exploring technologies for the next generation of machines for scientific computing. (authors)
Directory of Open Access Journals (Sweden)
Gabriele Jost
2010-01-01
Full Text Available Today most systems in high-performance computing (HPC feature a hierarchical hardware design: shared-memory nodes with several multi-core CPUs are connected via a network infrastructure. When parallelizing an application for these architectures it seems natural to employ a hierarchical programming model such as combining MPI and OpenMP. Nevertheless, there is the general lore that pure MPI outperforms the hybrid MPI/OpenMP approach. In this paper, we describe the hybrid MPI/OpenMP parallelization of IR3D (Incompressible Realistic 3-D code, a full-scale real-world application, which simulates the environmental effects on the evolution of vortices trailing behind control surfaces of underwater vehicles. We discuss performance, scalability and limitations of the pure MPI version of the code on a variety of hardware platforms and show how the hybrid approach can help to overcome certain limitations.
International Nuclear Information System (INIS)
Liu, F. S.; Guo Yicheng; Koo, David C.; Trump, Jonathan R.; Barro, Guillermo; Yesuf, Hassen; Faber, S. M.; Cheung, Edmond; Giavalisco, M.; Cassata, P.; Koekemoer, A. M.; Grogin, Norman A.; Pentericci, L.; Castellano, M.; Mao, Shude; Xia, X. Y.; Hathi, Nimish P.; Huang, Kuang-Han; Kocevski, Dale; McGrath, Elizabeth J.
2013-01-01
We have made a serendipitous discovery of a massive (∼5 × 10 11 M ☉ ) cD galaxy at z = 1.096 in a candidate-rich cluster in the Hubble Ultra Deep Field (HUDF) area of GOODS-South. This brightest cluster galaxy (BCG) is the most distant cD galaxy confirmed to date. Ultra-deep HST/WFC3 images reveal an extended envelope starting from ∼10 kpc and reaching ∼70 kpc in radius along the semimajor axis. The spectral energy distributions indicate that both its inner component and outer envelope are composed of an old, passively evolving (specific star formation rate –4 Gyr –1 ) stellar population. The cD galaxy lies on the same mass-size relation as the bulk of quiescent galaxies at similar redshifts. The cD galaxy has a higher stellar mass surface density (∼M * /R 50 2 ) but a similar velocity dispersion (∼√(M * /R 50 )) to those of more massive, nearby cDs. If the cD galaxy is one of the progenitors of today's more massive cDs, its size (R 50 ) and stellar mass have had to increase on average by factors of 3.4 ± 1.1 and 3.3 ± 1.3 over the past ∼8 Gyr, respectively. Such increases in size and stellar mass without being accompanied by significant increases in velocity dispersion are consistent with evolutionary scenarios driven by both major and minor dissipationless (dry) mergers. If such cD envelopes originate from dry mergers, our discovery of even one example proves that some BCGs entered the dry merger phase at epochs earlier than z = 1. Our data match theoretical models which predict that the continuance of dry mergers at z 1 and, yet, the HUDF covers only a minuscule region of sky (∼3.1 × 10 –8 ). Adding that cDs are rare, our serendipitous discovery hints that such cDs may be more common than expected, perhaps even ubiquitous. Images reaching HUDF depths of more area (especially with cluster BCGs at z > 1) are needed to confirm this conjecture.
Introduction to massively-parallel computing in high-energy physics
AUTHOR|(CDS)2083520
1993-01-01
Ever since computers were first used for scientific and numerical work, there has existed an "arms race" between the technical development of faster computing hardware, and the desires of scientists to solve larger problems in shorter time-scales. However, the vast leaps in processor performance achieved through advances in semi-conductor science have reached a hiatus as the technology comes up against the physical limits of the speed of light and quantum effects. This has lead all high performance computer manufacturers to turn towards a parallel architecture for their new machines. In these lectures we will introduce the history and concepts behind parallel computing, and review the various parallel architectures and software environments currently available. We will then introduce programming methodologies that allow efficient exploitation of parallel machines, and present case studies of the parallelization of typical High Energy Physics codes for the two main classes of parallel computing architecture (S...
International Nuclear Information System (INIS)
Chen, C.M.; Lee, S.Y.
1995-01-01
The EM algorithm promises an estimated image with the maximal likelihood for 3D PET image reconstruction. However, due to its long computation time, the EM algorithm has not been widely used in practice. While several parallel implementations of the EM algorithm have been developed to make the EM algorithm feasible, they do not guarantee an optimal parallelization efficiency. In this paper, the authors propose a new parallel EM algorithm which maximizes the performance by optimizing data replication on a mesh-connected message-passing multiprocessor. To optimize data replication, the authors have formally derived the optimal allocation of shared data, group sizes, integration and broadcasting of replicated data as well as the scheduling of shared data accesses. The proposed parallel EM algorithm has been implemented on an iPSC/860 with 16 PEs. The experimental and theoretical results, which are consistent with each other, have shown that the proposed parallel EM algorithm could improve performance substantially over those using unoptimized data replication
Siretskiy, Alexey; Sundqvist, Tore; Voznesenskiy, Mikhail; Spjuth, Ola
2015-01-01
New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology. In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in numbers. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories. From our experiments we can conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources, we also conclude that Hadoop is an economically viable
Hanafy, Sherif M.
2014-01-01
Objective: Collect 3D seismic data at Qademah Fault location to 1. 3D traveltime tomography 2. 3D surface wave migration 3. 3D phase velocity 4. Possible reflection processing Acquisition Date: 26 – 28 September 2014 Acquisition Team: Sherif, Kai, Mrinal, Bowen, Ahmed Acquisition Layout: We used 288 receiver arranged in 12 parallel lines, each line has 24 receiver. Inline offset is 5 m and crossline offset is 10 m. One shot is fired at each receiver location. We use the 40 kgm weight drop as seismic source, with 8 to 15 stacks at each shot location.
3D Massive MIMO Systems: Modeling and Performance Analysis
Nadeem, Qurrat-Ul-Ain; Kammoun, Abla; Debbah, Merouane; Alouini, Mohamed-Slim
2015-01-01
necessitates the characterization of 3D channels. We present an information-theoretic channel model for MIMO systems that supports the elevation dimension. The model is based on the principle of maximum entropy, which enables us to determine the distribution
3D Laser Imprint Using a Smoother Ray-Traced Power Deposition Method
Schmitt, Andrew J.
2017-10-01
Imprinting of laser nonuniformities in directly-driven icf targets is a challenging problem to accurately simulate with large radiation-hydro codes. One of the most challenging aspects is the proper construction of the complex and rapidly changing laser interference structure driving the imprint using the reduced laser propagation models (usually ray-tracing) found in these codes. We have upgraded the modelling capability in our massively-parallel fastrad3d code by adding a more realistic EM-wave interference structure. This interference model adds an axial laser speckle to the previous transverse-only laser structure, and can be impressed on our improved smoothed 3D raytrace package. This latter package, which connects rays to form bundles and performs power deposition calculations on the bundles, is intended to decrease ray-trace noise (which can mask or add to imprint) while using fewer rays. We apply this improved model to 3D simulations of recent imprint experiments performed on the Omega-EP laser and the Nike laser that examined the reduction of imprinting due to very thin high-Z target coatings. We report on the conditions in which this new model makes a significant impact on the development of laser imprint. Supported by US DoE/NNSA.
High fidelity thermal-hydraulic analysis using CFD and massively parallel computers
International Nuclear Information System (INIS)
Weber, D.P.; Wei, T.Y.C.; Brewster, R.A.; Rock, Daniel T.; Rizwan-uddin
2000-01-01
Thermal-hydraulic analyses play an important role in design and reload analysis of nuclear power plants. These analyses have historically relied on early generation computational fluid dynamics capabilities, originally developed in the 1960s and 1970s. Over the last twenty years, however, dramatic improvements in both computational fluid dynamics codes in the commercial sector and in computing power have taken place. These developments offer the possibility of performing large scale, high fidelity, core thermal hydraulics analysis. Such analyses will allow a determination of the conservatism employed in traditional design approaches and possibly justify the operation of nuclear power systems at higher powers without compromising safety margins. The objective of this work is to demonstrate such a large scale analysis approach using a state of the art CFD code, STAR-CD, and the computing power of massively parallel computers, provided by IBM. A high fidelity representation of a current generation PWR was analyzed with the STAR-CD CFD code and the results were compared to traditional analyses based on the VIPRE code. Current design methodology typically involves a simplified representation of the assemblies, where a single average pin is used in each assembly to determine the hot assembly from a whole core analysis. After determining this assembly, increased refinement is used in the hot assembly, and possibly some of its neighbors, to refine the analysis for purposes of calculating DNBR. This latter calculation is performed with sub-channel codes such as VIPRE. The modeling simplifications that are used involve the approximate treatment of surrounding assemblies and coarse representation of the hot assembly, where the subchannel is the lowest level of discretization. In the high fidelity analysis performed in this study, both restrictions have been removed. Within the hot assembly, several hundred thousand to several million computational zones have been used, to
Parallel computing by Monte Carlo codes MVP/GMVP
International Nuclear Information System (INIS)
Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa
2001-01-01
General-purpose Monte Carlo codes MVP/GMVP are well-vectorized and thus enable us to perform high-speed Monte Carlo calculations. In order to achieve more speedups, we parallelized the codes on the different types of parallel computing platforms or by using a standard parallelization library MPI. The platforms used for benchmark calculations are a distributed-memory vector-parallel computer Fujitsu VPP500, a distributed-memory massively parallel computer Intel paragon and a distributed-memory scalar-parallel computer Hitachi SR2201, IBM SP2. As mentioned generally, linear speedup could be obtained for large-scale problems but parallelization efficiency decreased as the batch size per a processing element(PE) was smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% by the PWR full-core calculation with more than 10 million histories and it took about 1.5 hours by massively parallel computing. (author)
Rubus: A compiler for seamless and extensible parallelism
Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor
2017-01-01
Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been
Rubus: A compiler for seamless and extensible parallelism.
Directory of Open Access Journals (Sweden)
Muhammad Adnan
Full Text Available Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU, originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84
Massively parallel computing and the search for jets and black holes at the LHC
Energy Technology Data Exchange (ETDEWEB)
Halyo, V., E-mail: vhalyo@gmail.com; LeGresley, P.; Lujan, P.
2014-04-21
Massively parallel computing at the LHC could be the next leap necessary to reach an era of new discoveries at the LHC after the Higgs discovery. Scientific computing is a critical component of the LHC experiment, including operation, trigger, LHC computing GRID, simulation, and analysis. One way to improve the physics reach of the LHC is to take advantage of the flexibility of the trigger system by integrating coprocessors based on Graphics Processing Units (GPUs) or the Many Integrated Core (MIC) architecture into its server farm. This cutting edge technology provides not only the means to accelerate existing algorithms, but also the opportunity to develop new algorithms that select events in the trigger that previously would have evaded detection. In this paper we describe new algorithms that would allow us to select in the trigger new topological signatures that include non-prompt jet and black hole-like objects in the silicon tracker.
A task parallel implementation of fast multipole methods
Taura, Kenjiro; Nakashima, Jun; Yokota, Rio; Maruyama, Naoya
2012-01-01
This paper describes a task parallel implementation of ExaFMM, an open source implementation of fast multipole methods (FMM), using a lightweight task parallel library MassiveThreads. Although there have been many attempts on parallelizing FMM
On massive gravitons in 2+1 dimensions
Bergshoeff, Eric; Hohm, Olaf; Townsend, Paul; Lazkoz, R; Vera, R
2010-01-01
The Fierz-Pauli (FP) free field theory for massive spin-2 particles can be extended, in a spacetime of (1+2) dimensions (3D), to a generally covariant parity-preserving interacting field theory, in at least two ways. One is "new massive gravity" (NMG), with an action that involves curvature-squared
Engineering-Based Thermal CFD Simulations on Massive Parallel Systems
Frisch, Jé rô me; Mundani, Ralf-Peter; Rank, Ernst; van Treeck, Christoph
2015-01-01
The development of parallel Computational Fluid Dynamics (CFD) codes is a challenging task that entails efficient parallelization concepts and strategies in order to achieve good scalability values when running those codes on modern supercomputers
Reentrant phase transitions of higher-dimensional AdS black holes in dRGT massive gravity
International Nuclear Information System (INIS)
Zou, De-Cheng; Yue, Ruihong; Zhang, Ming
2017-01-01
We study the P-V criticality and phase transition in the extended phase space of anti-de Sitter (AdS) black holes in higher-dimensional de Rham, Gabadadze and Tolley (dRGT) massive gravity, treating the cosmological constant as pressure and the corresponding conjugate quantity is interpreted as thermodynamic volume. Besides the usual small/large black hole phase transitions, the interesting thermodynamic phenomena of reentrant phase transitions (RPTs) are observed for black holes in all d ≥ 6-dimensional spacetime when the coupling coefficients c_im"2 of massive potential satisfy some certain conditions. (orig.)
The spectrum of massive excitations of 3d 3-state Potts model and universality
International Nuclear Information System (INIS)
Falcone, R.; Fiore, R.; Gravina, M.; Papa, A.
2007-01-01
We consider the mass spectrum of the 3d 3-state Potts model in the broken phase (a) near the second order Ising critical point in the temperature-magnetic field plane and (b) near the weakly first order transition point at zero magnetic field. In the case (a), we compare the mass spectrum with the prediction from universality of mass ratios in the 3d Ising class; in the case (b), we determine a mass ratio to be compared with the corresponding one in the spectrum of screening masses of the (3+1)d SU(3) pure gauge theory at finite temperature in the deconfined phase near the transition. The agreement in the comparison in the case (a) would represent a non-trivial test of validity of the conjecture of spectrum universality. A positive answer to the comparison in the case (b) would suggest the possibility to extend this conjecture to weakly first order phase transitions
Kim, M.-H.; Cho, J. H.; Park, S.-J.; Eden, J. G.
2017-08-01
Plasmachemical systems based on the production of a specific molecule (O3) in literally thousands of microchannel plasmas simultaneously have been demonstrated, developed and engineered over the past seven years, and commercialized. At the heart of this new plasma technology is the plasma chip, a flat aluminum strip fabricated by photolithographic and wet chemical processes and comprising 24-48 channels, micromachined into nanoporous aluminum oxide, with embedded electrodes. By integrating 4-6 chips into a module, the mass output of an ozone microplasma system is scaled linearly with the number of modules operating in parallel. A 115 g/hr (2.7 kg/day) ozone system, for example, is realized by the combined output of 18 modules comprising 72 chips and 1,800 microchannels. The implications of this plasma processing architecture for scaling ozone production capability, and reducing capital and service costs when introducing redundancy into the system, are profound. In contrast to conventional ozone generator technology, microplasma systems operate reliably (albeit with reduced output) in ambient air and humidity levels up to 90%, a characteristic attributable to the water adsorption/desorption properties and electrical breakdown strength of nanoporous alumina. Extensive testing has documented chip and system lifetimes (MTBF) beyond 5,000 hours, and efficiencies >130 g/kWh when oxygen is the feedstock gas. Furthermore, the weight and volume of microplasma systems are a factor of 3-10 lower than those for conventional ozone systems of comparable output. Massively-parallel plasmachemical processing offers functionality, performance, and commercial value beyond that afforded by conventional technology, and is currently in operation in more than 30 countries worldwide.
Simulation of current generation in a 3-D plasma model
International Nuclear Information System (INIS)
Tsung, F.S.; Dawson, J.M.
1996-01-01
Two wires carrying current in the same direction will attract each other, and two wires carrying current in the opposite direction will repel each other. Now, consider a test charge in a plasma. If the test charge carries current parallel to the plasma, then it will be pulled toward the plasma core, and if the test charge carries current anti-parallel to the plasma, then it will be pushed to the edge. The electromagnetic coupling between the plasma and a test charge (i.e., the A parallel circ v parallel term in the test charge's Hamiltonian) breaks the symmetry in the parallel direction, and gives rise to a diffusion coefficient which is dependent on the particle's parallel velocity. This is the basis for the open-quotes preferential lossclose quotes mechanism described in the work by Nunan et al. In our previous 2+1/2 D work, in both cylindrical and toroidal geometries, showed that if the plasma column is centrally fueled, then an initial current increases steadily. The results in straight, cylindrical plasmas showed that self generated parallel current arises without trapped particle or neoclassical diffusion, as assumed by the bootstrap theory. It suggests that the fundamental mechanism seems to be the conservation of particles canonical momenta in the direction of the ignorable coordinate. We have extended the simulation to 3D to verify the model put forth. A scalable 3D EM-PIC code, with a localized field-solver, has been implemented to run on a large class of parallel computers. On the 512-node SP2 at Cornell Theory Center, we have benchmarked the 2+1/2 D calculations using 32 grids in the previously ignored direction, and a 100-fold increase in the number of particles. Our preliminary results show good agreements between the 2+1/2 D and the 3D calculations. We will present our 3D results at the meeting
Uncertainty Analysis of RELAP5-3D
Energy Technology Data Exchange (ETDEWEB)
Alexandra E Gertman; Dr. George L Mesina
2012-07-01
As world-wide energy consumption continues to increase, so does the demand for the use of alternative energy sources, such as Nuclear Energy. Nuclear Power Plants currently supply over 370 gigawatts of electricity, and more than 60 new nuclear reactors have been commissioned by 15 different countries. The primary concern for Nuclear Power Plant operation and lisencing has been safety. The safety of the operation of Nuclear Power Plants is no simple matter- it involves the training of operators, design of the reactor, as well as equipment and design upgrades throughout the lifetime of the reactor, etc. To safely design, operate, and understand nuclear power plants, industry and government alike have relied upon the use of best-estimate simulation codes, which allow for an accurate model of any given plant to be created with well-defined margins of safety. The most widely used of these best-estimate simulation codes in the Nuclear Power industry is RELAP5-3D. Our project focused on improving the modeling capabilities of RELAP5-3D by developing uncertainty estimates for its calculations. This work involved analyzing high, medium, and low ranked phenomena from an INL PIRT on a small break Loss-Of-Coolant Accident as wall as an analysis of a large break Loss-Of- Coolant Accident. Statistical analyses were performed using correlation coefficients. To perform the studies, computer programs were written that modify a template RELAP5 input deck to produce one deck for each combination of key input parameters. Python scripting enabled the running of the generated input files with RELAP5-3D on INL’s massively parallel cluster system. Data from the studies was collected and analyzed with SAS. A summary of the results of our studies are presented.
Mn-silicide nanostructures aligned on massively parallel silicon nano-ribbons
International Nuclear Information System (INIS)
De Padova, Paola; Ottaviani, Carlo; Ronci, Fabio; Colonna, Stefano; Quaresima, Claudio; Cricenti, Antonio; Olivieri, Bruno; Dávila, Maria E; Hennies, Franz; Pietzsch, Annette; Shariati, Nina; Le Lay, Guy
2013-01-01
The growth of Mn nanostructures on a 1D grating of silicon nano-ribbons is investigated at atomic scale by means of scanning tunneling microscopy, low energy electron diffraction and core level photoelectron spectroscopy. The grating of silicon nano-ribbons represents an atomic scale template that can be used in a surface-driven route to control the combination of Si with Mn in the development of novel materials for spintronics devices. The Mn atoms show a preferential adsorption site on silicon atoms, forming one-dimensional nanostructures. They are parallel oriented with respect to the surface Si array, which probably predetermines the diffusion pathways of the Mn atoms during the process of nanostructure formation.
Dalle applicazioni professionali al 3D per tutti
Directory of Open Access Journals (Sweden)
Domenico Santarsiero
2008-03-01
Full Text Available From professional applications to 3D for everyone The applications of 3D technology are constantly maturing and so are the solutions designed to satisfy the needs of various professionals. The development of extremely user oriented solutions, which encouraged the use of 3D technology by a wide range of users, have evolved alongside a market in which final user skills are fundamental. The origins of this trend are particularly evident in the massive diffusion of geospatial web applications.
Load-balancing techniques for a parallel electromagnetic particle-in-cell code
Energy Technology Data Exchange (ETDEWEB)
PLIMPTON,STEVEN J.; SEIDEL,DAVID B.; PASIK,MICHAEL F.; COATS,REBECCA S.
2000-01-01
QUICKSILVER is a 3-d electromagnetic particle-in-cell simulation code developed and used at Sandia to model relativistic charged particle transport. It models the time-response of electromagnetic fields and low-density-plasmas in a self-consistent manner: the fields push the plasma particles and the plasma current modifies the fields. Through an LDRD project a new parallel version of QUICKSILVER was created to enable large-scale plasma simulations to be run on massively-parallel distributed-memory supercomputers with thousands of processors, such as the Intel Tflops and DEC CPlant machines at Sandia. The new parallel code implements nearly all the features of the original serial QUICKSILVER and can be run on any platform which supports the message-passing interface (MPI) standard as well as on single-processor workstations. This report describes basic strategies useful for parallelizing and load-balancing particle-in-cell codes, outlines the parallel algorithms used in this implementation, and provides a summary of the modifications made to QUICKSILVER. It also highlights a series of benchmark simulations which have been run with the new code that illustrate its performance and parallel efficiency. These calculations have up to a billion grid cells and particles and were run on thousands of processors. This report also serves as a user manual for people wishing to run parallel QUICKSILVER.
Load-balancing techniques for a parallel electromagnetic particle-in-cell code
International Nuclear Information System (INIS)
Plimpton, Steven J.; Seidel, David B.; Pasik, Michael F.; Coats, Rebecca S.
2000-01-01
QUICKSILVER is a 3-d electromagnetic particle-in-cell simulation code developed and used at Sandia to model relativistic charged particle transport. It models the time-response of electromagnetic fields and low-density-plasmas in a self-consistent manner: the fields push the plasma particles and the plasma current modifies the fields. Through an LDRD project a new parallel version of QUICKSILVER was created to enable large-scale plasma simulations to be run on massively-parallel distributed-memory supercomputers with thousands of processors, such as the Intel Tflops and DEC CPlant machines at Sandia. The new parallel code implements nearly all the features of the original serial QUICKSILVER and can be run on any platform which supports the message-passing interface (MPI) standard as well as on single-processor workstations. This report describes basic strategies useful for parallelizing and load-balancing particle-in-cell codes, outlines the parallel algorithms used in this implementation, and provides a summary of the modifications made to QUICKSILVER. It also highlights a series of benchmark simulations which have been run with the new code that illustrate its performance and parallel efficiency. These calculations have up to a billion grid cells and particles and were run on thousands of processors. This report also serves as a user manual for people wishing to run parallel QUICKSILVER
GPU: the biggest key processor for AI and parallel processing
Baji, Toru
2017-07-01
Two types of processors exist in the market. One is the conventional CPU and the other is Graphic Processor Unit (GPU). Typical CPU is composed of 1 to 8 cores while GPU has thousands of cores. CPU is good for sequential processing, while GPU is good to accelerate software with heavy parallel executions. GPU was initially dedicated for 3D graphics. However from 2006, when GPU started to apply general-purpose cores, it was noticed that this architecture can be used as a general purpose massive-parallel processor. NVIDIA developed a software framework Compute Unified Device Architecture (CUDA) that make it possible to easily program the GPU for these application. With CUDA, GPU started to be used in workstations and supercomputers widely. Recently two key technologies are highlighted in the industry. The Artificial Intelligence (AI) and Autonomous Driving Cars. AI requires a massive parallel operation to train many-layers of neural networks. With CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For the autonomous driving cars, TOPS class of performance is required to implement perception, localization, path planning processing and again SoC with integrated GPU will play a key role there. In this paper, the evolution of the GPU which is one of the biggest commercial devices requiring state-of-the-art fabrication technology will be introduced. Also overview of the GPU demanding key application like the ones described above will be introduced.
Multirate-based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform.
Tao, Liang; Kwan, Hon Keung
2012-07-01
Novel algorithms for the multirate and fast parallel implementation of the 2-D discrete Hartley transform (DHT)-based real-valued discrete Gabor transform (RDGT) and its inverse transform are presented in this paper. A 2-D multirate-based analysis convolver bank is designed for the 2-D RDGT, and a 2-D multirate-based synthesis convolver bank is designed for the 2-D inverse RDGT. The parallel channels in each of the two convolver banks have a unified structure and can apply the 2-D fast DHT algorithm to speed up their computations. The computational complexity of each parallel channel is low and is independent of the Gabor oversampling rate. All the 2-D RDGT coefficients of an image are computed in parallel during the analysis process and can be reconstructed in parallel during the synthesis process. The computational complexity and time of the proposed parallel algorithms are analyzed and compared with those of the existing fastest algorithms for 2-D discrete Gabor transforms. The results indicate that the proposed algorithms are the fastest, which make them attractive for real-time image processing.
Energy Technology Data Exchange (ETDEWEB)
Verma, Prakash; Morales, Jorge A., E-mail: jorge.morales@ttu.edu [Department of Chemistry and Biochemistry, Texas Tech University, P.O. Box 41061, Lubbock, Texas 79409-1061 (United States); Perera, Ajith [Department of Chemistry and Biochemistry, Texas Tech University, P.O. Box 41061, Lubbock, Texas 79409-1061 (United States); Department of Chemistry, Quantum Theory Project, University of Florida, Gainesville, Florida 32611 (United States)
2013-11-07
Coupled cluster (CC) methods provide highly accurate predictions of molecular properties, but their high computational cost has precluded their routine application to large systems. Fortunately, recent computational developments in the ACES III program by the Bartlett group [the OED/ERD atomic integral package, the super instruction processor, and the super instruction architecture language] permit overcoming that limitation by providing a framework for massively parallel CC implementations. In that scheme, we are further extending those parallel CC efforts to systematically predict the three main electron spin resonance (ESR) tensors (A-, g-, and D-tensors) to be reported in a series of papers. In this paper inaugurating that series, we report our new ACES III parallel capabilities that calculate isotropic hyperfine coupling constants in 38 neutral, cationic, and anionic radicals that include the {sup 11}B, {sup 17}O, {sup 9}Be, {sup 19}F, {sup 1}H, {sup 13}C, {sup 35}Cl, {sup 33}S,{sup 14}N, {sup 31}P, and {sup 67}Zn nuclei. Present parallel calculations are conducted at the Hartree-Fock (HF), second-order many-body perturbation theory [MBPT(2)], CC singles and doubles (CCSD), and CCSD with perturbative triples [CCSD(T)] levels using Roos augmented double- and triple-zeta atomic natural orbitals basis sets. HF results consistently overestimate isotropic hyperfine coupling constants. However, inclusion of electron correlation effects in the simplest way via MBPT(2) provides significant improvements in the predictions, but not without occasional failures. In contrast, CCSD results are consistently in very good agreement with experimental results. Inclusion of perturbative triples to CCSD via CCSD(T) leads to small improvements in the predictions, which might not compensate for the extra computational effort at a non-iterative N{sup 7}-scaling in CCSD(T). The importance of these accurate computations of isotropic hyperfine coupling constants to elucidate
Cardenas, Erick; Wu, Wei-Min; Leigh, Mary Beth; Carley, Jack; Carroll, Sue; Gentry, Terry; Luo, Jian; Watson, David; Gu, Baohua; Ginder-Vogel, Matthew; Kitanidis, Peter K.; Jardine, Philip M.; Zhou, Jizhong; Criddle, Craig S.; Marsh, Terence L.
2010-01-01
Massively parallel sequencing has provided a more affordable and high-throughput method to study microbial communities, although it has mostly been used in an exploratory fashion. We combined pyrosequencing with a strict indicator species statistical analysis to test if bacteria specifically responded to ethanol injection that successfully promoted dissimilatory uranium(VI) reduction in the subsurface of a uranium contamination plume at the Oak Ridge Field Research Center in Tennessee. Remedi...
NIF Ignition Target 3D Point Design
Energy Technology Data Exchange (ETDEWEB)
Jones, O; Marinak, M; Milovich, J; Callahan, D
2008-11-05
We have developed an input file for running 3D NIF hohlraums that is optimized such that it can be run in 1-2 days on parallel computers. We have incorporated increasing levels of automation into the 3D input file: (1) Configuration controlled input files; (2) Common file for 2D and 3D, different types of capsules (symcap, etc.); and (3) Can obtain target dimensions, laser pulse, and diagnostics settings automatically from NIF Campaign Management Tool. Using 3D Hydra calculations to investigate different problems: (1) Intrinsic 3D asymmetry; (2) Tolerance to nonideal 3D effects (e.g. laser power balance, pointing errors); and (3) Synthetic diagnostics.
Massively parallel diffuse optical tomography
Energy Technology Data Exchange (ETDEWEB)
Sandusky, John V.; Pitts, Todd A.
2017-09-05
Diffuse optical tomography systems and methods are described herein. In a general embodiment, the diffuse optical tomography system comprises a plurality of sensor heads, the plurality of sensor heads comprising respective optical emitter systems and respective sensor systems. A sensor head in the plurality of sensors heads is caused to act as an illuminator, such that its optical emitter system transmits a transillumination beam towards a portion of a sample. Other sensor heads in the plurality of sensor heads act as observers, detecting portions of the transillumination beam that radiate from the sample in the fields of view of the respective sensory systems of the other sensor heads. Thus, sensor heads in the plurality of sensors heads generate sensor data in parallel.
The new Exponential Directional Iterative (EDI) 3-D Sn scheme for parallel adaptive differencing
International Nuclear Information System (INIS)
Sjoden, G.E.
2005-01-01
The new Exponential Directional Iterative (EDI) discrete ordinates (Sn) scheme for 3-D Cartesian Coordinates is presented. The EDI scheme is a logical extension of the positive, efficient Exponential Directional Weighted (EDW) Sn scheme currently used as the third level of the adaptive spatial differencing algorithm in the PENTRAN parallel discrete ordinates solver. Here, the derivation and advantages of the EDI scheme are presented; EDI uses EDW-rendered exponential coefficients as initial starting values to begin a fixed point iteration of the exponential coefficients. One issue that required evaluation was an iterative cutoff criterion to prevent the application of an unstable fixed point iteration; although this was needed in some cases, it was readily treated with a default to EDW. Iterative refinement of the exponential coefficients in EDI typically converged in fewer than four fixed point iterations. Moreover, EDI yielded more accurate angular fluxes compared to the other schemes tested, particularly in streaming conditions. Overall, it was found that the EDI scheme was up to an order of magnitude more accurate than the EDW scheme on a given mesh interval in streaming cases, and is potentially a good candidate as a fourth-level differencing scheme in the PENTRAN adaptive differencing sequence. The 3-D Cartesian computational cost of EDI was only about 20% more than the EDW scheme, and about 40% more than Diamond Zero (DZ). More evaluation and testing are required to determine suitable upgrade metrics for EDI to be fully integrated into the current adaptive spatial differencing sequence in PENTRAN. (author)
Reentrant phase transitions of higher-dimensional AdS black holes in dRGT massive gravity
Energy Technology Data Exchange (ETDEWEB)
Zou, De-Cheng; Yue, Ruihong [Yangzhou University, College of Physical Science and Technology, Yangzhou (China); Zhang, Ming [Xi' an Aeronautical University, Faculty of Science, Xi' an (China)
2017-04-15
We study the P-V criticality and phase transition in the extended phase space of anti-de Sitter (AdS) black holes in higher-dimensional de Rham, Gabadadze and Tolley (dRGT) massive gravity, treating the cosmological constant as pressure and the corresponding conjugate quantity is interpreted as thermodynamic volume. Besides the usual small/large black hole phase transitions, the interesting thermodynamic phenomena of reentrant phase transitions (RPTs) are observed for black holes in all d ≥ 6-dimensional spacetime when the coupling coefficients c{sub i}m{sup 2} of massive potential satisfy some certain conditions. (orig.)
FANS-3D Users Guide (ESTEP Project ER 201031)
2016-08-01
TECHNICAL DOCUMENT 3293 August 2016 FANS -3D User’s Guide (ESTEP Project ER-201031) Pei-Fang Wang SSC Pacific Hamn-Ching...1.1 THEORY AND NUMERICAL ALGORITHM OF FANS CODE ............................................. 1 2. FANS -3D SOFTWARE DOCUMENTATION AND EXECUTION...5 3. FANS -3D CODE PARALLELIZATION
Lee, Li-Yu; Lin, Gigin; Chen, Shu-Jen; Lu, Yen-Jung; Huang, Huei-Jean; Yen, Chi-Feng; Han, Chien Min; Lee, Yun-Shien; Wang, Tzu-Hao; Chao, Angel
2017-01-01
Benign metastasizing leiomyoma (BML) is a rare disease entity typically presenting as multiple extrauterine leiomyomas associated with a uterine leiomyoma. It has been hypothesized that the extrauterine leiomyomata represent distant metastasis of the uterine leiomyoma. To date, the only molecular evidence supporting this hypothesis was derived from clonality analyses based on X-chromosome inactivation assays. Here, we sought to address this issue by examining paired specimens of synchronous pulmonary and uterine leiomyomata from three patients using targeted massively parallel sequencing and molecular inversion probe array analysis for detecting somatic mutations and copy number aberrations. We detected identical non-hot-spot somatic mutations and similar patterns of copy number aberrations (CNAs) in paired pulmonary and uterine leiomyomata from two patients, indicating the clonal relationship between pulmonary and uterine leiomyomata. In addition to loss of chromosome 22q found in the literature, we identified additional recurrent CNAs including losses of chromosome 3q and 11q. In conclusion, our findings of the clonal relationship between synchronous pulmonary and uterine leiomyomas support the hypothesis that BML represents a condition wherein a uterine leiomyoma disseminates to distant extrauterine locations. PMID:28533481
Wu, Ren-Chin; Chao, An-Shine; Lee, Li-Yu; Lin, Gigin; Chen, Shu-Jen; Lu, Yen-Jung; Huang, Huei-Jean; Yen, Chi-Feng; Han, Chien Min; Lee, Yun-Shien; Wang, Tzu-Hao; Chao, Angel
2017-07-18
Benign metastasizing leiomyoma (BML) is a rare disease entity typically presenting as multiple extrauterine leiomyomas associated with a uterine leiomyoma. It has been hypothesized that the extrauterine leiomyomata represent distant metastasis of the uterine leiomyoma. To date, the only molecular evidence supporting this hypothesis was derived from clonality analyses based on X-chromosome inactivation assays. Here, we sought to address this issue by examining paired specimens of synchronous pulmonary and uterine leiomyomata from three patients using targeted massively parallel sequencing and molecular inversion probe array analysis for detecting somatic mutations and copy number aberrations. We detected identical non-hot-spot somatic mutations and similar patterns of copy number aberrations (CNAs) in paired pulmonary and uterine leiomyomata from two patients, indicating the clonal relationship between pulmonary and uterine leiomyomata. In addition to loss of chromosome 22q found in the literature, we identified additional recurrent CNAs including losses of chromosome 3q and 11q. In conclusion, our findings of the clonal relationship between synchronous pulmonary and uterine leiomyomas support the hypothesis that BML represents a condition wherein a uterine leiomyoma disseminates to distant extrauterine locations.
Programmable level-1 trigger with 3D-Flow processor array
International Nuclear Information System (INIS)
Crosetto, D.
1994-01-01
The 3D-Flow parallel processing system is a new concept in processor architecture, system architecture, and assembly architecture. Compared to the electronics used in present systems, this approach reduces the cost and complexity of the hardware and allows easy assembly, disassembly, incremental upgrading, and maintenance of different interconnection topologies. The 3D-Flow parallel-processing system benefits high energy physics (HEP) by allowing: (1) common less costly hardware to be used in different experiments. (2) new uses of existing installations. (3) tuning of trigger based on the first analyzed data, and (4) selection of desired events directly from raw data. The goal of this parallel-processing architecture is to acquire multiple data in parallel (up to 100 million frames per second) and to process them at high speed, accomplishing digital filtering on the input data, pattern recognition (particle identification), data moving, and data formatting. The main features of the system are its programmability, scalability, high-speed communication, and low cost. The compactness of the 3D-Flow parallel-processing system in concert with the processor architecture allows processor interconnections to be mapped into the geometry of sensors (detectors in HEP) without large interconnection signal delay, enabling real-time pattern recognition. The overall 3D-Flow project has passed a major design review at Fermilab (Reviewers included experts in computers, triggering, system assembly, and electronics)
DEFF Research Database (Denmark)
Jakobsen, Jette; Maribo, H.; Bysted, Anette
2007-01-01
In food databases, the specific contents of vitamin D-3 and 25-hydroxyvitamin D-3 in food have been implemented in the last 10 years. No consensus has yet been established on the relative activity between the components. Therefore, the objective of the present study was to assess the relative...... activity of 25-hydroxyvitamin D-3 compared to vitamin D-3. The design was a parallel study in pigs (n 24), which from an age of 12 weeks until slaughter 11 weeks later were fed approximately 55 mu g vitamin D/d, as vitamin D-3, in a mixture of vitamin D-3 and 25-hydroxyvitamin D-3, or 25-hydroxyvitamin D-3....... The end-points measured were plasma 25-hydroxyvitamin D-3, and in the liver and loin the content of vitamin D-3 and 25-hydroxyvitamin D-3 Vitamin D-3 and 25-hydroxyvitamin D3 in the feed did not affect 25-hydroxyvitamin D-3 in the plasma, liver or loin differently, while a significant effect was shown...
3D neutron transport modelization
International Nuclear Information System (INIS)
Warin, X.
1996-12-01
Some nodal methods to solve the transport equation in 3D are presented. Two nodal methods presented at an OCDE congress are described: a first one is a low degree one called RTN0; a second one is a high degree one called BDM1. The two methods can be made faster with a totally consistent DSA. Some results of parallelization show that: 98% of the time is spent in sweeps; transport sweeps are easily parallelized. (K.A.)
Directory of Open Access Journals (Sweden)
Sven Duda
2014-01-01
Full Text Available We report on the performance of composite nerve grafts with an inner 3D multichannel porous chitosan core and an outer electrospun polycaprolactone shell. The inner chitosan core provided multiple guidance channels for regrowing axons. To analyze the in vivo properties of the bare chitosan cores, we separately implanted them into an epineural sheath. The effects of both graft types on structural and functional regeneration across a 10 mm rat sciatic nerve gap were compared to autologous nerve transplantation (ANT. The mechanical biomaterial properties and the immunological impact of the grafts were assessed with histological techniques before and after transplantation in vivo. Furthermore during a 13-week examination period functional tests and electrophysiological recordings were performed and supplemented by nerve morphometry. The sheathing of the chitosan core with a polycaprolactone shell induced massive foreign body reaction and impairment of nerve regeneration. Although the isolated novel chitosan core did allow regeneration of axons in a similar size distribution as the ANT, the ANT was superior in terms of functional regeneration. We conclude that an outer polycaprolactone shell should not be used for the purpose of bioartificial nerve grafting, while 3D multichannel porous chitosan cores could be candidate scaffolds for structured nerve grafts.
Duda, Sven; Dreyer, Lutz; Behrens, Peter; Wienecke, Soenke; Chakradeo, Tanmay; Glasmacher, Birgit; Haastert-Talini, Kirsten
2014-01-01
We report on the performance of composite nerve grafts with an inner 3D multichannel porous chitosan core and an outer electrospun polycaprolactone shell. The inner chitosan core provided multiple guidance channels for regrowing axons. To analyze the in vivo properties of the bare chitosan cores, we separately implanted them into an epineural sheath. The effects of both graft types on structural and functional regeneration across a 10 mm rat sciatic nerve gap were compared to autologous nerve transplantation (ANT). The mechanical biomaterial properties and the immunological impact of the grafts were assessed with histological techniques before and after transplantation in vivo. Furthermore during a 13-week examination period functional tests and electrophysiological recordings were performed and supplemented by nerve morphometry. The sheathing of the chitosan core with a polycaprolactone shell induced massive foreign body reaction and impairment of nerve regeneration. Although the isolated novel chitosan core did allow regeneration of axons in a similar size distribution as the ANT, the ANT was superior in terms of functional regeneration. We conclude that an outer polycaprolactone shell should not be used for the purpose of bioartificial nerve grafting, while 3D multichannel porous chitosan cores could be candidate scaffolds for structured nerve grafts.
Medium generated gap in gravity and a 3D gauge theory
Gabadadze, Gregory; Older, Daniel
2018-05-01
It is well known that a physical medium that sets a Lorentz frame generates a Lorentz-breaking gap for a graviton. We examine such generated "mass" terms in the presence of a fluid medium whose ground state spontaneously breaks spatial translation invariance in d =D +1 spacetime dimensions, and for a solid in D =2 spatial dimensions. By requiring energy positivity and subluminal propagation, certain constraints are placed on the equation of state of the medium. In the case of D =2 spatial dimensions, classical gravity can be recast as a Chern-Simons gauge theory, and motivated by this we recast the massive theory of gravity in AdS3 as a massive Chern-Simons gauge theory with an unusual mass term. We find that in the flat space limit the Chern-Simons theory has a novel gauge invariance that mixes the kinetic and mass terms, and enables the massive theory with a noncompact internal group to be free of ghosts and tachyons.
Supersymmetric D3/D7 for holographic flavors on curved space
International Nuclear Information System (INIS)
Karch, Andreas; Robinson, Brandon; Uhlemann, Christoph F.
2015-01-01
We derive a new class of supersymmetric D3/D7 brane configurations, which allow to holographically describe N=4 SYM coupled to massive N=2 flavor degrees of freedom on spaces of constant curvature. We systematically solve the κ-symmetry condition for D7-brane embeddings into AdS_4-sliced AdS_5×S"5, and find supersymmetric embeddings in a simple closed form. Up to a critical mass, these embeddings come in surprisingly diverse families, and we present a first study of their (holographic) phenomenology. We carry out the holographic renormalization, compute the one-point functions and attempt a field-theoretic interpretation of the different families. To complete the catalog of supersymmetric D3/D7 configurations, we construct analogous embeddings for flavored N=4 SYM on S"4 and dS_4.
Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.
Bhandarkar, S M; Chirravuri, S; Arnold, J
1996-01-01
Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
Quasi-local conserved charges of spin-3 topologically massive gravity
Directory of Open Access Journals (Sweden)
M.R. Setare
2016-08-01
Full Text Available In this paper we obtain conserved charges of spin-3 topologically massive gravity by using a quasi-local formalism. We find a general formula to calculate conserved charge of the spin-3 topologically massive gravity which corresponds to a Killing vector field ξ. We show that this general formula reduces to the previous one for the ordinary spin-3 gravity presented in [18] when we take into account only transformation under diffeomorphism, without considering generalized Lorentz gauge transformation (i.e. λξ=0, and by taking 1μ→0. Then we obtain a general formula for the entropy of black hole solutions of the spin-3 topologically massive gravity. Finally we apply our formalism to calculate energy, angular momentum and entropy of a special black hole solution and we find that obtained results are consistent with previous results in the limiting cases. Moreover our results for energy, angular momentum and entropy are consistent with the first law of black hole mechanics.
Yang, Dikun; Oldenburg, Douglas W.; Haber, Eldad
2014-03-01
Airborne electromagnetic (AEM) methods are highly efficient tools for assessing the Earth's conductivity structures in a large area at low cost. However, the configuration of AEM measurements, which typically have widely distributed transmitter-receiver pairs, makes the rigorous modelling and interpretation extremely time-consuming in 3-D. Excessive overcomputing can occur when working on a large mesh covering the entire survey area and inverting all soundings in the data set. We propose two improvements. The first is to use a locally optimized mesh for each AEM sounding for the forward modelling and calculation of sensitivity. This dedicated local mesh is small with fine cells near the sounding location and coarse cells far away in accordance with EM diffusion and the geometric decay of the signals. Once the forward problem is solved on the local meshes, the sensitivity for the inversion on the global mesh is available through quick interpolation. Using local meshes for AEM forward modelling avoids unnecessary computing on fine cells on a global mesh that are far away from the sounding location. Since local meshes are highly independent, the forward modelling can be efficiently parallelized over an array of processors. The second improvement is random and dynamic down-sampling of the soundings. Each inversion iteration only uses a random subset of the soundings, and the subset is reselected for every iteration. The number of soundings in the random subset, determined by an adaptive algorithm, is tied to the degree of model regularization. This minimizes the overcomputing caused by working with redundant soundings. Our methods are compared against conventional methods and tested with a synthetic example. We also invert a field data set that was previously considered to be too large to be practically inverted in 3-D. These examples show that our methodology can dramatically reduce the processing time of 3-D inversion to a practical level without losing resolution
International Nuclear Information System (INIS)
Byers, J.A.; Williams, T.J.; Cohen, B.I.; Dimits, A.M.
1994-01-01
One of the programs of the Magnetic fusion Energy (MFE) Theory and computations Program is studying the anomalous transport of thermal energy across the field lines in the core of a tokamak. We use the method of gyrokinetic particle-in-cell simulation in this study. For this LDRD project we employed massively parallel processing, new algorithms, and new algorithms, and new formal techniques to improve this research. Specifically, we sought to take steps toward: researching experimentally-relevant parameters in our simulations, learning parallel computing to have as a resource for our group, and achieving a 100 x speedup over our starting-point Cray2 simulation code's performance
3D neutron transport modelization
Energy Technology Data Exchange (ETDEWEB)
Warin, X.
1996-12-01
Some nodal methods to solve the transport equation in 3D are presented. Two nodal methods presented at an OCDE congress are described: a first one is a low degree one called RTN0; a second one is a high degree one called BDM1. The two methods can be made faster with a totally consistent DSA. Some results of parallelization show that: 98% of the time is spent in sweeps; transport sweeps are easily parallelized. (K.A.). 10 refs.
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms
Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel
2016-04-01
Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another can be performed in parallel narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are the: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a selves-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and
Fast parallel approach for 2-D DHT-based real-valued discrete Gabor transform.
Tao, Liang; Kwan, Hon Keung
2009-12-01
Two-dimensional fast Gabor transform algorithms are useful for real-time applications due to the high computational complexity of the traditional 2-D complex-valued discrete Gabor transform (CDGT). This paper presents two block time-recursive algorithms for 2-D DHT-based real-valued discrete Gabor transform (RDGT) and its inverse transform and develops a fast parallel approach for the implementation of the two algorithms. The computational complexity of the proposed parallel approach is analyzed and compared with that of the existing 2-D CDGT algorithms. The results indicate that the proposed parallel approach is attractive for real time image processing.
3D numerical simulations of multiphase continental rifting
Naliboff, J.; Glerum, A.; Brune, S.
2017-12-01
Observations of rifted margin architecture suggest continental breakup occurs through multiple phases of extension with distinct styles of deformation. The initial rifting stages are often characterized by slow extension rates and distributed normal faulting in the upper crust decoupled from deformation in the lower crust and mantle lithosphere. Further rifting marks a transition to higher extension rates and coupling between the crust and mantle lithosphere, with deformation typically focused along large-scale detachment faults. Significantly, recent detailed reconstructions and high-resolution 2D numerical simulations suggest that rather than remaining focused on a single long-lived detachment fault, deformation in this phase may progress toward lithospheric breakup through a complex process of fault interaction and development. The numerical simulations also suggest that an initial phase of distributed normal faulting can play a key role in the development of these complex fault networks and the resulting finite deformation patterns. Motivated by these findings, we will present 3D numerical simulations of continental rifting that examine the role of temporal increases in extension velocity on rifted margin structure. The numerical simulations are developed with the massively parallel finite-element code ASPECT. While originally designed to model mantle convection using advanced solvers and adaptive mesh refinement techniques, ASPECT has been extended to model visco-plastic deformation that combines a Drucker Prager yield criterion with non-linear dislocation and diffusion creep. To promote deformation localization, the internal friction angle and cohesion weaken as a function of accumulated plastic strain. Rather than prescribing a single zone of weakness to initiate deformation, an initial random perturbation of the plastic strain field combined with rapid strain weakening produces distributed normal faulting at relatively slow rates of extension in both 2D and
Farris, M Heath; Scott, Andrew R; Texter, Pamela A; Bartlett, Marta; Coleman, Patricia; Masters, David
2018-04-11
Single nucleotide polymorphisms (SNPs) located within the human genome have been shown to have utility as markers of identity in the differentiation of DNA from individual contributors. Massively parallel DNA sequencing (MPS) technologies and human genome SNP databases allow for the design of suites of identity-linked target regions, amenable to sequencing in a multiplexed and massively parallel manner. Therefore, tools are needed for leveraging the genotypic information found within SNP databases for the discovery of genomic targets that can be evaluated on MPS platforms. The SNP island target identification algorithm (TIA) was developed as a user-tunable system to leverage SNP information within databases. Using data within the 1000 Genomes Project SNP database, human genome regions were identified that contain globally ubiquitous identity-linked SNPs and that were responsive to targeted resequencing on MPS platforms. Algorithmic filters were used to exclude target regions that did not conform to user-tunable SNP island target characteristics. To validate the accuracy of TIA for discovering these identity-linked SNP islands within the human genome, SNP island target regions were amplified from 70 contributor genomic DNA samples using the polymerase chain reaction. Multiplexed amplicons were sequenced using the Illumina MiSeq platform, and the resulting sequences were analyzed for SNP variations. 166 putative identity-linked SNPs were targeted in the identified genomic regions. Of the 309 SNPs that provided discerning power across individual SNP profiles, 74 previously undefined SNPs were identified during evaluation of targets from individual genomes. Overall, DNA samples of 70 individuals were uniquely identified using a subset of the suite of identity-linked SNP islands. TIA offers a tunable genome search tool for the discovery of targeted genomic regions that are scalable in the population frequency and numbers of SNPs contained within the SNP island regions
Parallel plasma fluid turbulence calculations
International Nuclear Information System (INIS)
Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.
1994-01-01
The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated
Comprehensive microRNA profiling in B-cells of human centenarians by massively parallel sequencing
Directory of Open Access Journals (Sweden)
Gombar Saurabh
2012-07-01
Full Text Available Abstract Background MicroRNAs (miRNAs are small, non-coding RNAs that regulate gene expression and play a critical role in development, homeostasis, and disease. Despite their demonstrated roles in age-associated pathologies, little is known about the role of miRNAs in human aging and longevity. Results We employed massively parallel sequencing technology to identify miRNAs expressed in B-cells from Ashkenazi Jewish centenarians, i.e., those living to a hundred and a human model of exceptional longevity, and younger controls without a family history of longevity. With data from 26.7 million reads comprising 9.4 × 108 bp from 3 centenarian and 3 control individuals, we discovered a total of 276 known miRNAs and 8 unknown miRNAs ranging several orders of magnitude in expression levels, a typical characteristics of saturated miRNA-sequencing. A total of 22 miRNAs were found to be significantly upregulated, with only 2 miRNAs downregulated, in centenarians as compared to controls. Gene Ontology analysis of the predicted and validated targets of the 24 differentially expressed miRNAs indicated enrichment of functional pathways involved in cell metabolism, cell cycle, cell signaling, and cell differentiation. A cross sectional expression analysis of the differentially expressed miRNAs in B-cells from Ashkenazi Jewish individuals between the 50th and 100th years of age indicated that expression levels of miR-363* declined significantly with age. Centenarians, however, maintained the youthful expression level. This result suggests that miR-363* may be a candidate longevity-associated miRNA. Conclusion Our comprehensive miRNA data provide a resource for further studies to identify genetic pathways associated with aging and longevity in humans.
Zwei-Dreibein Gravity : A Two-Frame-Field Model of 3D Massive Gravity
Bergshoeff, Eric A.; de Haan, Sjoerd; Hohm, Olaf; Merbis, Wout; Townsend, Paul K.
2013-01-01
We present a generally covariant and parity-invariant two-frame field ("zwei-dreibein") action for gravity in three space-time dimensions that propagates two massive spin-2 modes, unitarily, and we use Hamiltonian methods to confirm the absence of unphysical degrees of freedom. We show how
Parallel FE Electron-Photon Transport Analysis on 2-D Unstructured Mesh
International Nuclear Information System (INIS)
Drumm, C.R.; Lorenz, J.
1999-01-01
A novel solution method has been developed to solve the coupled electron-photon transport problem on an unstructured triangular mesh. Instead of tackling the first-order form of the linear Boltzmann equation, this approach is based on the second-order form in conjunction with the conventional multi-group discrete-ordinates approximation. The highly forward-peaked electron scattering is modeled with a multigroup Legendre expansion derived from the Goudsmit-Saunderson theory. The finite element method is used to treat the spatial dependence. The solution method is unique in that the space-direction dependence is solved simultaneously, eliminating the need for the conventional inner iterations, a method that is well suited for massively parallel computers
A massive parallel sequencing workflow for diagnostic genetic testing of mismatch repair genes
Hansen, Maren F; Neckmann, Ulrike; Lavik, Liss A S; Vold, Trine; Gilde, Bodil; Toft, Ragnhild K; Sjursen, Wenche
2014-01-01
The purpose of this study was to develop a massive parallel sequencing (MPS) workflow for diagnostic analysis of mismatch repair (MMR) genes using the GS Junior system (Roche). A pathogenic variant in one of four MMR genes, (MLH1, PMS2, MSH6, and MSH2), is the cause of Lynch Syndrome (LS), which mainly predispose to colorectal cancer. We used an amplicon-based sequencing method allowing specific and preferential amplification of the MMR genes including PMS2, of which several pseudogenes exist. The amplicons were pooled at different ratios to obtain coverage uniformity and maximize the throughput of a single-GS Junior run. In total, 60 previously identified and distinct variants (substitutions and indels), were sequenced by MPS and successfully detected. The heterozygote detection range was from 19% to 63% and dependent on sequence context and coverage. We were able to distinguish between false-positive and true-positive calls in homopolymeric regions by cross-sample comparison and evaluation of flow signal distributions. In addition, we filtered variants according to a predefined status, which facilitated variant annotation. Our study shows that implementation of MPS in routine diagnostics of LS can accelerate sample throughput and reduce costs without compromising sensitivity, compared to Sanger sequencing. PMID:24689082
Directory of Open Access Journals (Sweden)
Hahn Daniel A
2009-05-01
Full Text Available Abstract Background Flesh flies in the genus Sarcophaga are important models for investigating endocrinology, diapause, cold hardiness, reproduction, and immunity. Despite the prominence of Sarcophaga flesh flies as models for insect physiology and biochemistry, and in forensic studies, little genomic or transcriptomic data are available for members of this genus. We used massively parallel pyrosequencing on the Roche 454-FLX platform to produce a substantial EST dataset for the flesh fly Sarcophaga crassipalpis. To maximize sequence diversity, we pooled RNA extracted from whole bodies of all life stages and normalized the cDNA pool after reverse transcription. Results We obtained 207,110 ESTs with an average read length of 241 bp. These reads assembled into 20,995 contigs and 31,056 singletons. Using BLAST searches of the NR and NT databases we were able to identify 11,757 unique gene elements (ES. crassipalpis unigenes among GO Biological Process functional groups with that of the Drosophila melanogaster transcriptome suggests that our ESTs are broadly representative of the flesh fly transcriptome. Insertion and deletion errors in 454 sequencing present a serious hurdle to comparative transcriptome analysis. Aided by a new approach to correcting for these errors, we performed a comparative analysis of genetic divergence across GO categories among S. crassipalpis, D. melanogaster, and Anopheles gambiae. The results suggest that non-synonymous substitutions occur at similar rates across categories, although genes related to response to stimuli may evolve slightly faster. In addition, we identified over 500 potential microsatellite loci and more than 12,000 SNPs among our ESTs. Conclusion Our data provides the first large-scale EST-project for flesh flies, a much-needed resource for exploring this model species. In addition, we identified a large number of potential microsatellite and SNP markers that could be used in population and systematic
Wireless Rover Meets 3D Design and Product Development
Deal, Walter F., III; Hsiung, Steve C.
2016-01-01
Today there are a number of 3D printing technologies that are low cost and within the budgets of middle and high school programs. Educational technology companies offer a variety of 3D printing technologies and parallel curriculum materials to enable technology and engineering teachers to easily add 3D learning activities to their programs.…
3-D OBJECT RECOGNITION FROM POINT CLOUD DATA
Directory of Open Access Journals (Sweden)
W. Smith
2012-09-01
Full Text Available The market for real-time 3-D mapping includes not only traditional geospatial applications but also navigation of unmanned autonomous vehicles (UAVs. Massively parallel processes such as graphics processing unit (GPU computing make real-time 3-D object recognition and mapping achievable. Geospatial technologies such as digital photogrammetry and GIS offer advanced capabilities to produce 2-D and 3-D static maps using UAV data. The goal is to develop real-time UAV navigation through increased automation. It is challenging for a computer to identify a 3-D object such as a car, a tree or a house, yet automatic 3-D object recognition is essential to increasing the productivity of geospatial data such as 3-D city site models. In the past three decades, researchers have used radiometric properties to identify objects in digital imagery with limited success, because these properties vary considerably from image to image. Consequently, our team has developed software that recognizes certain types of 3-D objects within 3-D point clouds. Although our software is developed for modeling, simulation and visualization, it has the potential to be valuable in robotics and UAV applications. The locations and shapes of 3-D objects such as buildings and trees are easily recognizable by a human from a brief glance at a representation of a point cloud such as terrain-shaded relief. The algorithms to extract these objects have been developed and require only the point cloud and minimal human inputs such as a set of limits on building size and a request to turn on a squaring option. The algorithms use both digital surface model (DSM and digital elevation model (DEM, so software has also been developed to derive the latter from the former. The process continues through the following steps: identify and group 3-D object points into regions; separate buildings and houses from trees; trace region boundaries; regularize and simplify boundary polygons; construct complex
3-D Object Recognition from Point Cloud Data
Smith, W.; Walker, A. S.; Zhang, B.
2011-09-01
The market for real-time 3-D mapping includes not only traditional geospatial applications but also navigation of unmanned autonomous vehicles (UAVs). Massively parallel processes such as graphics processing unit (GPU) computing make real-time 3-D object recognition and mapping achievable. Geospatial technologies such as digital photogrammetry and GIS offer advanced capabilities to produce 2-D and 3-D static maps using UAV data. The goal is to develop real-time UAV navigation through increased automation. It is challenging for a computer to identify a 3-D object such as a car, a tree or a house, yet automatic 3-D object recognition is essential to increasing the productivity of geospatial data such as 3-D city site models. In the past three decades, researchers have used radiometric properties to identify objects in digital imagery with limited success, because these properties vary considerably from image to image. Consequently, our team has developed software that recognizes certain types of 3-D objects within 3-D point clouds. Although our software is developed for modeling, simulation and visualization, it has the potential to be valuable in robotics and UAV applications. The locations and shapes of 3-D objects such as buildings and trees are easily recognizable by a human from a brief glance at a representation of a point cloud such as terrain-shaded relief. The algorithms to extract these objects have been developed and require only the point cloud and minimal human inputs such as a set of limits on building size and a request to turn on a squaring option. The algorithms use both digital surface model (DSM) and digital elevation model (DEM), so software has also been developed to derive the latter from the former. The process continues through the following steps: identify and group 3-D object points into regions; separate buildings and houses from trees; trace region boundaries; regularize and simplify boundary polygons; construct complex roofs. Several case
Optimisation of a parallel ocean general circulation model
M. I. Beare; D. P. Stevens
1997-01-01
International audience; This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by...
A 3D Two-node and One-node HCMFD Algorithm for Pin-wise Reactor Analysis
International Nuclear Information System (INIS)
Kim, Jaeha; Kim, Yonghee
2016-01-01
To maximize the parallel computational efficiency, an iterative local-global strategy is adopted in the HCMFD algorithm. The global eigenvalue problem is solved by one-node CMFD, and the local fixed-source problems are solved by two-node CMFD based on the pin-wise nodal solutions. In such local-global scheme, the computational cost is mostly concentrated in solving the local problems while they can be solved in parallel so that a parallel computing can effectively be applied. Previously, the feasibility of the HCMFD algorithm was evaluated only in a 2-D scheme. In this paper, the 3D HCMFD algorithm with some possible variations in treating the axial direction is introduced. The HCMFD algorithm was successfully extended to a 3-D core analysis without any numerical instability even though the axial mesh size in local problems is quite different from the x-y node size. We have shown that 3D pin-wise core analysis can be done very effectively with the HCMFD framework. Additionally, it was demonstrated that parallel efficiency of the new 3D HCMFD scheme can be quite high on a simple OpenMP parallel architecture. It is concluded that the 3D HCMFD will enable an efficient pin-wise 3D core analysis
Real time 3D structural and Doppler OCT imaging on graphics processing units
Sylwestrzak, Marcin; Szlag, Daniel; Szkulmowski, Maciej; Gorczyńska, Iwona; Bukowska, Danuta; Wojtkowski, Maciej; Targowski, Piotr
2013-03-01
In this report the application of graphics processing unit (GPU) programming for real-time 3D Fourier domain Optical Coherence Tomography (FdOCT) imaging with implementation of Doppler algorithms for visualization of the flows in capillary vessels is presented. Generally, the time of the data processing of the FdOCT data on the main processor of the computer (CPU) constitute a main limitation for real-time imaging. Employing additional algorithms, such as Doppler OCT analysis, makes this processing even more time consuming. Lately developed GPUs, which offers a very high computational power, give a solution to this problem. Taking advantages of them for massively parallel data processing, allow for real-time imaging in FdOCT. The presented software for structural and Doppler OCT allow for the whole processing with visualization of 2D data consisting of 2000 A-scans generated from 2048 pixels spectra with frame rate about 120 fps. The 3D imaging in the same mode of the volume data build of 220 × 100 A-scans is performed at a rate of about 8 frames per second. In this paper a software architecture, organization of the threads and optimization applied is shown. For illustration the screen shots recorded during real time imaging of the phantom (homogeneous water solution of Intralipid in glass capillary) and the human eye in-vivo is presented.
DICE/ColDICE: 6D collisionless phase space hydrodynamics using a lagrangian tesselation
Sousbie, Thierry
2018-01-01
DICE is a C++ template library designed to solve collisionless fluid dynamics in 6D phase space using massively parallel supercomputers via an hybrid OpenMP/MPI parallelization. ColDICE, based on DICE, implements a cosmological and physical VLASOV-POISSON solver for cold systems such as dark matter (CDM) dynamics.
Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis
Directory of Open Access Journals (Sweden)
Thiele Bernhard
2011-05-01
Full Text Available Abstract Background Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4 variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS detects single clones thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Methods Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Results Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive-rate of geno2pheno[coreceptor] (10%, and defining a minority cutoff of 5%, the results were concordant in all but one isolate. Conclusions The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology in combination with a quantitative description of the viral population may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.
Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis.
Däumer, Martin; Kaiser, Rolf; Klein, Rolf; Lengauer, Thomas; Thiele, Bernhard; Thielen, Alexander
2011-05-13
Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4) variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS) detects single clones thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive-rate of geno2pheno[coreceptor] (10%), and defining a minority cutoff of 5%, the results were concordant in all but one isolate. The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology in combination with a quantitative description of the viral population may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.
Parallel 3-D numerical simulation of dielectric barrier discharge plasma actuators
Houba, Tomas
Dielectric barrier discharge plasma actuators have shown promise in a range of applications including flow control, sterilization and ozone generation. Developing numerical models of plasma actuators is of great importance, because a high-fidelity parallel numerical model allows new design configurations to be tested rapidly. Additionally, it provides a better understanding of the plasma actuator physics which is useful for further innovation. The physics of plasma actuators is studied numerically. A loosely coupled approach is utilized for the coupling of the plasma to the neutral fluid. The state of the art in numerical plasma modeling is advanced by the development of a parallel, three-dimensional, first-principles model with detailed air chemistry. The model incorporates 7 charged species and 18 reactions, along with a solution of the electron energy equation. To the author's knowledge, a parallel three-dimensional model of a gas discharge with a detailed air chemistry model and the solution of electron energy is unique. Three representative geometries are studied using the gas discharge model. The discharge of gas between two parallel electrodes is used to validate the air chemistry model developed for the gas discharge code. The gas discharge model is then applied to the discharge produced by placing a dc powered wire and grounded plate electrodes in a channel. Finally, a three-dimensional simulation of gas discharge produced by electrodes placed inside a riblet is carried out. The body force calculated with the gas discharge model is loosely coupled with a fluid model to predict the induced flow inside the riblet.
Sharif, Harlina Md; Hazumi, Hazman; Hafizuddin Meli, Rafiq
2018-01-01
3D imaging technologies have undergone massive revolution in recent years. Despite this rapid development, documentation of 3D cultural assets in Malaysia is still very much reliant upon conventional techniques such as measured drawings and manual photogrammetry. There is very little progress towards exploring new methods or advanced technologies to convert 3D cultural assets into 3D visual representation and visualization models that are easily accessible for information sharing. In recent years, however, the advent of computer vision (CV) algorithms make it possible to reconstruct 3D geometry of objects by using image sequences from digital cameras, which are then processed by web services and freeware applications. This paper presents a completed stage of an exploratory study that investigates the potentials of using CV automated image-based open-source software and web services to reconstruct and replicate cultural assets. By selecting an intricate wooden boat, Petalaindera, this study attempts to evaluate the efficiency of CV systems and compare it with the application of 3D laser scanning, which is known for its accuracy, efficiency and high cost. The final aim of this study is to compare the visual accuracy of 3D models generated by CV system, and 3D models produced by 3D scanning and manual photogrammetry for an intricate subject such as the Petalaindera. The final objective is to explore cost-effective methods that could provide fundamental guidelines on the best practice approach for digital heritage in Malaysia.
Formation and pre-MS Evolution of Massive Stars with Growing Accretion
Maeder, A.; Behrend, R.
2002-10-01
We briefly describe the three existing scenarios for forming massive stars and emphasize that the arguments often used to reject the accretion scenario for massive stars are misleading. It is usually not accounted for the fact that the turbulent pressure associated to large turbulent velocities in clouds necessarily imply relatively high accretion rates for massive stars. We show the basic difference between the formation of low and high mass stars based on the values of the free fall time and of the Kelvin-Helmholtz timescale, and define the concept of birthline for massive stars. Due to D-burning, the radius and location of the birthline in the HR diagram, as well as the lifetimes are very sensitive to the accretion rate dM/dt(accr). If a form dM/dt(accr) propto A(M/Msun)phi is adopted, the observations in the HR diagram and the lifetimes support a value of A approx 10-5 Msun/yr and a value of phi > 1. Remarkably, such a law is consistent with the relation found by Churchwell and Henning et al. between the outflow rates and the luminosities of ultracompact HII regions, if we assume that a fraction 0.15 to 0.3 of the global inflow is accreted. The above relation implies high dM/dt(accr) approx 10-3 Msun/yr for the most massive stars. The physical possibility of such high dM/dt(accr) is supported by current numerical models. Finally, we give simple analytical arguments in favour of the growth of dM/dt(accr) with the already accreted mass. We also suggest that due to Bondi-Hoyle accretion, the formation of binary stars is largely favoured among massive stars in the accretion scenario.
FROMS3D: New Software for 3-D Visualization of Fracture Network System in Fractured Rock Masses
Noh, Y. H.; Um, J. G.; Choi, Y.
2014-12-01
A new software (FROMS3D) is presented to visualize fracture network system in 3-D. The software consists of several modules that play roles in management of borehole and field fracture data, fracture network modelling, visualization of fracture geometry in 3-D and calculation and visualization of intersections and equivalent pipes between fractures. Intel Parallel Studio XE 2013, Visual Studio.NET 2010 and the open source VTK library were utilized as development tools to efficiently implement the modules and the graphical user interface of the software. The results have suggested that the developed software is effective in visualizing 3-D fracture network system, and can provide useful information to tackle the engineering geological problems related to strength, deformability and hydraulic behaviors of the fractured rock masses.
A massive, quiescent galaxy at a redshift of 3.717
Glazebrook, Karl; Schreiber, Corentin; Labbé, Ivo; Nanayakkara, Themiya; Kacprzak, Glenn G.; Oesch, Pascal A.; Papovich, Casey; Spitler, Lee R.; Straatman, Caroline M. S.; Tran, Kim-Vy H.; Yuan, Tiantian
2017-04-01
Finding massive galaxies that stopped forming stars in the early Universe presents an observational challenge because their rest-frame ultraviolet emission is negligible and they can only be reliably identified by extremely deep near-infrared surveys. These surveys have revealed the presence of massive, quiescent early-type galaxies appearing as early as redshift z ≈ 2, an epoch three billion years after the Big Bang. Their age and formation processes have now been explained by an improved generation of galaxy-formation models, in which they form rapidly at z ≈ 3-4, consistent with the typical masses and ages derived from their observations. Deeper surveys have reported evidence for populations of massive, quiescent galaxies at even higher redshifts and earlier times, using coarsely sampled photometry. However, these early, massive, quiescent galaxies are not predicted by the latest generation of theoretical models. Here we report the spectroscopic confirmation of one such galaxy at redshift z = 3.717, with a stellar mass of 1.7 × 1011 solar masses. We derive its age to be nearly half the age of the Universe at this redshift and the absorption line spectrum shows no current star formation. These observations demonstrate that the galaxy must have formed the majority of its stars quickly, within the first billion years of cosmic history in a short, extreme starburst. This ancestral starburst appears similar to those being found by submillimetre-wavelength surveys. The early formation of such massive systems implies that our picture of early galaxy assembly requires substantial revision.
Massive Kaluza-Klein theories and their spontaneously broken symmetries
International Nuclear Information System (INIS)
Hohm, O.
2006-07-01
In this thesis we investigate the effective actions for massive Kaluza-Klein states, focusing on the massive modes of spin-3/2 and spin-2 fields. To this end we determine the spontaneously broken gauge symmetries associated to these 'higher-spin' states and construct the unbroken phase of the Kaluza-Klein theory. We show that for the particular background AdS 3 x S 3 x S 3 a consistent coupling of the first massive spin-3/2 multiplet requires an enhancement of local supersymmetry, which in turn will be partially broken in the Kaluza-Klein vacuum. The corresponding action is constructed as a gauged maximal supergravity in D=3. Subsequently, the symmetries underlying an infinite tower of massive spin-2 states are analyzed in case of a Kaluza-Klein compactification of four-dimensional gravity to D=3. It is shown that the resulting gravity-spin-2 theory is given by a Chern-Simons action of an affine algebra and also allows a geometrical interpretation in terms of 'algebra-valued' differential geometry. The global symmetry group is determined, which contains an affine extension of the Ehlers group. We show that the broken phase can in turn be constructed via gauging a certain subgroup of the global symmetry group. Finally, deformations of the Kaluza-Klein theory on AdS 3 x S 3 x S 3 and the corresponding symmetry breakings are analyzed as possible applications for the AdS/CFT correspondence. (Orig.)
3D magnetospheric parallel hybrid multi-grid method applied to planet–plasma interactions
Energy Technology Data Exchange (ETDEWEB)
Leclercq, L., E-mail: ludivine.leclercq@latmos.ipsl.fr [LATMOS/IPSL, UVSQ Université Paris-Saclay, UPMC Univ. Paris 06, CNRS, Guyancourt (France); Modolo, R., E-mail: ronan.modolo@latmos.ipsl.fr [LATMOS/IPSL, UVSQ Université Paris-Saclay, UPMC Univ. Paris 06, CNRS, Guyancourt (France); Leblanc, F. [LATMOS/IPSL, UPMC Univ. Paris 06 Sorbonne Universités, UVSQ, CNRS, Paris (France); Hess, S. [ONERA, Toulouse (France); Mancini, M. [LUTH, Observatoire Paris-Meudon (France)
2016-03-15
We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet–plasma interactions. This model is based on the hybrid formalism: ions are kinetically treated whereas electrons are considered as a inertia-less fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid subsequently entering a refined region are split into particles whose volume corresponds to the volume of the refined cells. The number of refined particles created from a coarse particle depends on the grid refinement rate. In order to conserve velocity distribution functions and to avoid calculations of average velocities, particles are not coalesced. Moreover, to ensure the constancy of particles' shape function sizes, the hybrid method is adapted to allow refined particles to move within a coarse region. Another innovation of this approach is the method developed to compute grid moments at interfaces between two refinement levels. Indeed, the hybrid method is adapted to accurately account for the special grid structure at the interfaces, avoiding any overlapping grid considerations. Some fundamental test runs were performed to validate our approach (e.g. quiet plasma flow, Alfven wave propagation). Lastly, we also show a planetary application of the model, simulating the interaction between Jupiter's moon Ganymede and the Jovian plasma.
Iterative methods for 3D implicit finite-difference migration using the complex Padé approximation
International Nuclear Information System (INIS)
Costa, Carlos A N; Campos, Itamara S; Costa, Jessé C; Neto, Francisco A; Schleicher, Jörg; Novais, Amélia
2013-01-01
Conventional implementations of 3D finite-difference (FD) migration use splitting techniques to accelerate performance and save computational cost. However, such techniques are plagued with numerical anisotropy that jeopardises the correct positioning of dipping reflectors in the directions not used for the operator splitting. We implement 3D downward continuation FD migration without splitting using a complex Padé approximation. In this way, the numerical anisotropy is eliminated at the expense of a computationally more intensive solution of a large-band linear system. We compare the performance of the iterative stabilized biconjugate gradient (BICGSTAB) and that of the multifrontal massively parallel direct solver (MUMPS). It turns out that the use of the complex Padé approximation not only stabilizes the solution, but also acts as an effective preconditioner for the BICGSTAB algorithm, reducing the number of iterations as compared to the implementation using the real Padé expansion. As a consequence, the iterative BICGSTAB method is more efficient than the direct MUMPS method when solving a single term in the Padé expansion. The results of both algorithms, here evaluated by computing the migration impulse response in the SEG/EAGE salt model, are of comparable quality. (paper)
Directory of Open Access Journals (Sweden)
Tomohiro Kitano
Full Text Available A variant in a transcription factor gene, POU4F3, is responsible for autosomal dominant nonsyndromic hereditary hearing loss, DFNA15. To date, 14 variants, including a whole deletion of POU4F3, have been reported to cause HL in various ethnic groups. In the present study, genetic screening for POU4F3 variants was carried out for a large series of Japanese hearing loss (HL patients to clarify the prevalence and clinical characteristics of DFNA15 in the Japanese population. Massively parallel DNA sequencing of 68 target candidate genes was utilized in 2,549 unrelated Japanese HL patients (probands to identify genomic variations responsible for HL. The detailed clinical features in patients with POU4F3 variants were collected from medical charts and analyzed. Novel 12 POU4F3 likely pathogenic variants (six missense variants, three frameshift variants, and three nonsense variants were successfully identified in 15 probands (2.5% among 602 families exhibiting autosomal dominant HL, whereas no variants were detected in the other 1,947 probands with autosomal recessive or inheritance pattern unknown HL. To obtain the audiovestibular configuration of the patients harboring POU4F3 variants, we collected audiograms and vestibular symptoms of the probands and their affected family members. Audiovestibular phenotypes in a total of 24 individuals from the 15 families possessing variants were characterized by progressive HL, with a large variation in the onset age and severity with or without vestibular symptoms observed. Pure-tone audiograms indicated the most prevalent configuration as mid-frequency HL type followed by high-frequency HL type, with asymmetry observed in approximately 20% of affected individuals. Analysis of the relationship between age and pure-tone average suggested that individuals with truncating variants showed earlier onset and slower progression of HL than did those with non-truncating variants. The present study showed that variants
Parallel algorithms for 2-D cylindrical transport equations of Eigenvalue problem
International Nuclear Information System (INIS)
Wei, J.; Yang, S.
2013-01-01
In this paper, aimed at the neutron transport equations of eigenvalue problem under 2-D cylindrical geometry on unstructured grid, the discrete scheme of Sn discrete ordinate and discontinuous finite is built, and the parallel computation for the scheme is realized on MPI systems. Numerical experiments indicate that the designed parallel algorithm can reach perfect speedup, it has good practicality and scalability. (authors)
GRay: A MASSIVELY PARALLEL GPU-BASED CODE FOR RAY TRACING IN RELATIVISTIC SPACETIMES
Energy Technology Data Exchange (ETDEWEB)
Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal [Department of Astronomy, University of Arizona, 933 N. Cherry Ave., Tucson, AZ 85721 (United States)
2013-11-01
We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae of their dependencies on black hole spin and observer inclination, which can be used to interpret upcoming observations of the black holes at the center of the Milky Way, as well as M87, with the Event Horizon Telescope.
MAP3D: a media processor approach for high-end 3D graphics
Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris
1999-12-01
Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline--allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given applications. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved--using real-time decoded imagery for texturing 3D objects, for instance. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.
4d quantum geometry from 3d supersymmetric gauge theory and holomorphic block
International Nuclear Information System (INIS)
Han, Muxin
2016-01-01
A class of 3d N=2 supersymmetric gauge theories are constructed and shown to encode the simplicial geometries in 4-dimensions. The gauge theories are defined by applying the Dimofte-Gaiotto-Gukov construction http://dx.doi.org/10.1007/s00220-013-1863-2 in 3d-3d correspondence to certain graph complement 3-manifolds. Given a gauge theory in this class, the massive supersymmetric vacua of the theory contain the classical geometries on a 4d simplicial complex. The corresponding 4d simplicial geometries are locally constant curvature (either dS or AdS), in the sense that they are made by gluing geometrical 4-simplices of the same constant curvature. When the simplicial complex is sufficiently refined, the simplicial geometries can approximate all possible smooth geometries on 4-manifold. At the quantum level, we propose that a class of holomorphic blocks defined in http://dx.doi.org/10.1007/JHEP12(2014)177 from the 3d N=2 gauge theories are wave functions of quantum 4d simplicial geometries. In the semiclassical limit, the asymptotic behavior of holomorphic block reproduces the classical action of 4d Einstein-Hilbert gravity in the simplicial context.
Parallel computing of a climate model on the dawn 1000 by domain decomposition method
Bi, Xunqiang
1997-12-01
In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massive parallel computer made by National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. The potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputation are also discussed.
Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations
Energy Technology Data Exchange (ETDEWEB)
Detrixhe, Miles, E-mail: mdetrixhe@engineering.ucsb.edu [Department of Mechanical Engineering (United States); University of California Santa Barbara, Santa Barbara, CA, 93106 (United States); Gibou, Frédéric, E-mail: fgibou@engineering.ucsb.edu [Department of Mechanical Engineering (United States); University of California Santa Barbara, Santa Barbara, CA, 93106 (United States); Department of Computer Science (United States); Department of Mathematics (United States)
2016-10-01
The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations
International Nuclear Information System (INIS)
Detrixhe, Miles; Gibou, Frédéric
2016-01-01
The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
Moussawi, Ali; Lubineau, Gilles; Xu, Jiangping; Pan, Bing
2015-01-01
Summary: The post-treatment of (3D) displacement fields for the identification of spatially varying elastic material parameters is a large inverse problem that remains out of reach for massive 3D structures. We explore here the potential
2-D and 3-D computations of curved accelerator magnets
International Nuclear Information System (INIS)
Turner, L.R.
1991-01-01
In order to save computer memory, a long accelerator magnet may be computed by treating the long central region and the end regions separately. The dipole magnets for the injector synchrotron of the Advanced Photon Source (APS), now under construction at Argonne National Laboratory (ANL), employ magnet iron consisting of parallel laminations, stacked with a uniform radius of curvature of 33.379 m. Laplace's equation for the magnetic scalar potential has a different form for a straight magnet (x-y coordinates), a magnet with surfaces curved about a common center (r-θ coordinates), and a magnet with parallel laminations like the APS injector dipole. Yet pseudo 2-D computations for the three geometries give basically identical results, even for a much more strongly curved magnet. Hence 2-D (x-y) computations of the central region and 3-D computations of the end regions can be combined to determine the overall magnetic behavior of the magnets. 1 ref., 6 figs
Hu, Anzhong; Lv, Tiejun; Gao, Hui; Zhang, Zhang; Yang, Shaoshi
2014-10-01
In this paper, an approach of estimating signal parameters via rotational invariance technique (ESPRIT) is proposed for two-dimensional (2-D) localization of incoherently distributed (ID) sources in large-scale/massive multiple-input multiple-output (MIMO) systems. The traditional ESPRIT-based methods are valid only for one-dimensional (1-D) localization of the ID sources. By contrast, in the proposed approach the signal subspace is constructed for estimating the nominal azimuth and elevation direction-of-arrivals and the angular spreads. The proposed estimator enjoys closed-form expressions and hence it bypasses the searching over the entire feasible field. Therefore, it imposes significantly lower computational complexity than the conventional 2-D estimation approaches. Our analysis shows that the estimation performance of the proposed approach improves when the large-scale/massive MIMO systems are employed. The approximate Cram\\'{e}r-Rao bound of the proposed estimator for the 2-D localization is also derived. Numerical results demonstrate that albeit the proposed estimation method is comparable with the traditional 2-D estimators in terms of performance, it benefits from a remarkably lower computational complexity.
Parallel fabrication of macroporous scaffolds.
Dobos, Andrew; Grandhi, Taraka Sai Pavan; Godeshala, Sudhakar; Meldrum, Deirdre R; Rege, Kaushal
2018-07-01
Scaffolds generated from naturally occurring and synthetic polymers have been investigated in several applications because of their biocompatibility and tunable chemo-mechanical properties. Existing methods for generation of 3D polymeric scaffolds typically cannot be parallelized, suffer from low throughputs, and do not allow for quick and easy removal of the fragile structures that are formed. Current molds used in hydrogel and scaffold fabrication using solvent casting and porogen leaching are often single-use and do not facilitate 3D scaffold formation in parallel. Here, we describe a simple device and related approaches for the parallel fabrication of macroporous scaffolds. This approach was employed for the generation of macroporous and non-macroporous materials in parallel, in higher throughput and allowed for easy retrieval of these 3D scaffolds once formed. In addition, macroporous scaffolds with interconnected as well as non-interconnected pores were generated, and the versatility of this approach was employed for the generation of 3D scaffolds from diverse materials including an aminoglycoside-derived cationic hydrogel ("Amikagel"), poly(lactic-co-glycolic acid) or PLGA, and collagen. Macroporous scaffolds generated using the device were investigated for plasmid DNA binding and cell loading, indicating the use of this approach for developing materials for different applications in biotechnology. Our results demonstrate that the device-based approach is a simple technology for generating scaffolds in parallel, which can enhance the toolbox of current fabrication techniques. © 2018 Wiley Periodicals, Inc.
Parallel CFD simulation of flow in a 3D model of vibrating human vocal folds
Czech Academy of Sciences Publication Activity Database
Šidlof, Petr; Horáček, Jaromír; Řidký, V.
2013-01-01
Roč. 80, č. 1 (2013), s. 290-300 ISSN 0045-7930 R&D Projects: GA ČR(CZ) GAP101/11/0207 Institutional research plan: CEZ:AV0Z20760514 Keywords : numerical simulation * vocal folds * glottal airflow * inite volume method * parallel CFD Subject RIV: BI - Acoustics Impact factor: 1.532, year: 2013 http://www.sciencedirect.com/science?_ob=ArticleListURL&_method=list&_ArticleListID=-268060849&_sort=r&_st=13&view=c&_acct=C000034318&_version=1&_urlVersion=0&_userid=640952&md5=7c5b5539857ee9a02af5e690585b3126&searchtype=a
Parallelization of elliptic solver for solving 1D Boussinesq model
Tarwidi, D.; Adytia, D.
2018-03-01
In this paper, a parallel implementation of an elliptic solver in solving 1D Boussinesq model is presented. Numerical solution of Boussinesq model is obtained by implementing a staggered grid scheme to continuity, momentum, and elliptic equation of Boussinesq model. Tridiagonal system emerging from numerical scheme of elliptic equation is solved by cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of parallel program, large number of grids is varied from 28 to 214. Two test cases of numerical experiment, i.e. propagation of solitary and standing wave, are proposed to evaluate the parallel program. The numerical results are verified with analytical solution of solitary and standing wave. The best speedup of solitary and standing wave test cases is about 2.07 with 214 of grids and 1.86 with 213 of grids, respectively, which are executed by using 8 threads. Moreover, the best efficiency of parallel program is 76.2% and 73.5% for solitary and standing wave test cases, respectively.
Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi
2016-08-05
The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented to the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program, a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Optimizing a massive parallel sequencing workflow for quantitative miRNA expression analysis.
Directory of Open Access Journals (Sweden)
Francesca Cordero
Full Text Available BACKGROUND: Massive Parallel Sequencing methods (MPS can extend and improve the knowledge obtained by conventional microarray technology, both for mRNAs and short non-coding RNAs, e.g. miRNAs. The processing methods used to extract and interpret the information are an important aspect of dealing with the vast amounts of data generated from short read sequencing. Although the number of computational tools for MPS data analysis is constantly growing, their strengths and weaknesses as part of a complex analytical pipe-line have not yet been well investigated. PRIMARY FINDINGS: A benchmark MPS miRNA dataset, resembling a situation in which miRNAs are spiked in biological replication experiments was assembled by merging a publicly available MPS spike-in miRNAs data set with MPS data derived from healthy donor peripheral blood mononuclear cells. Using this data set we observed that short reads counts estimation is strongly under estimated in case of duplicates miRNAs, if whole genome is used as reference. Furthermore, the sensitivity of miRNAs detection is strongly dependent by the primary tool used in the analysis. Within the six aligners tested, specifically devoted to miRNA detection, SHRiMP and MicroRazerS show the highest sensitivity. Differential expression estimation is quite efficient. Within the five tools investigated, two of them (DESseq, baySeq show a very good specificity and sensitivity in the detection of differential expression. CONCLUSIONS: The results provided by our analysis allow the definition of a clear and simple analytical optimized workflow for miRNAs digital quantitative analysis.
Optimizing a massive parallel sequencing workflow for quantitative miRNA expression analysis.
Cordero, Francesca; Beccuti, Marco; Arigoni, Maddalena; Donatelli, Susanna; Calogero, Raffaele A
2012-01-01
Massive Parallel Sequencing methods (MPS) can extend and improve the knowledge obtained by conventional microarray technology, both for mRNAs and short non-coding RNAs, e.g. miRNAs. The processing methods used to extract and interpret the information are an important aspect of dealing with the vast amounts of data generated from short read sequencing. Although the number of computational tools for MPS data analysis is constantly growing, their strengths and weaknesses as part of a complex analytical pipe-line have not yet been well investigated. A benchmark MPS miRNA dataset, resembling a situation in which miRNAs are spiked in biological replication experiments was assembled by merging a publicly available MPS spike-in miRNAs data set with MPS data derived from healthy donor peripheral blood mononuclear cells. Using this data set we observed that short reads counts estimation is strongly under estimated in case of duplicates miRNAs, if whole genome is used as reference. Furthermore, the sensitivity of miRNAs detection is strongly dependent by the primary tool used in the analysis. Within the six aligners tested, specifically devoted to miRNA detection, SHRiMP and MicroRazerS show the highest sensitivity. Differential expression estimation is quite efficient. Within the five tools investigated, two of them (DESseq, baySeq) show a very good specificity and sensitivity in the detection of differential expression. The results provided by our analysis allow the definition of a clear and simple analytical optimized workflow for miRNAs digital quantitative analysis.
Iterative and iterative-noniterative integral solutions in 3-loop massive QCD calculations
International Nuclear Information System (INIS)
Ablinger, J.; Radu, C.S.; Schneider, C.; Behring, A.; Imamoglu, E.; Van Hoeij, M.; Von Manteuffel, A.; Raab, C.G.
2017-11-01
Various of the single scale quantities in massless and massive QCD up to 3-loop order can be expressed by iterative integrals over certain classes of alphabets, from the harmonic polylogarithms to root-valued alphabets. Examples are the anomalous dimensions to 3-loop order, the massless Wilson coefficients and also different massive operator matrix elements. Starting at 3-loop order, however, also other letters appear in the case of massive operator matrix elements, the so called iterative non-iterative integrals, which are related to solutions based on complete elliptic integrals or any other special function with an integral representation that is definite but not a Volterra-type integral. After outlining the formalism leading to iterative non-iterative integrals,we present examples for both of these cases with the 3-loop anomalous dimension γ (2) qg and the structure of the principle solution in the iterative non-interative case of the 3-loop QCD corrections to the ρ-parameter.
Iterative and iterative-noniterative integral solutions in 3-loop massive QCD calculations
Energy Technology Data Exchange (ETDEWEB)
Ablinger, J.; Radu, C.S.; Schneider, C. [Johannes Kepler Univ., Linz (Austria). Research Inst. for Symbolic Computation (RISC); Behring, A. [RWTH Aachen Univ. (Germany). Inst. fuer Theoretische Teilchenphysik und Kosmologie; Bluemlein, J.; Freitas, A. de [Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany); Imamoglu, E.; Van Hoeij, M. [Florida State Univ., Tallahassee, FL (United States). Dept. of Mathematics; Von Manteuffel, A. [Michigan State Univ., East Lansing, MI (United States). Dept. of Physics and Astronomy; Raab, C.G. [Johannes Kepler Univ., Linz (Austria). Inst. for Algebra
2017-11-15
Various of the single scale quantities in massless and massive QCD up to 3-loop order can be expressed by iterative integrals over certain classes of alphabets, from the harmonic polylogarithms to root-valued alphabets. Examples are the anomalous dimensions to 3-loop order, the massless Wilson coefficients and also different massive operator matrix elements. Starting at 3-loop order, however, also other letters appear in the case of massive operator matrix elements, the so called iterative non-iterative integrals, which are related to solutions based on complete elliptic integrals or any other special function with an integral representation that is definite but not a Volterra-type integral. After outlining the formalism leading to iterative non-iterative integrals,we present examples for both of these cases with the 3-loop anomalous dimension γ{sup (2)}{sub qg} and the structure of the principle solution in the iterative non-interative case of the 3-loop QCD corrections to the ρ-parameter.
Directory of Open Access Journals (Sweden)
Cord Drögemüller
2010-08-01
Full Text Available Arachnomelia is a monogenic recessive defect of skeletal development in cattle. The causative mutation was previously mapped to a ∼7 Mb interval on chromosome 5. Here we show that array-based sequence capture and massively parallel sequencing technology, combined with the typical family structure in livestock populations, facilitates the identification of the causative mutation. We re-sequenced the entire critical interval in a healthy partially inbred cow carrying one copy of the critical chromosome segment in its ancestral state and one copy of the same segment with the arachnomelia mutation, and we detected a single heterozygous position. The genetic makeup of several partially inbred cattle provides extremely strong support for the causality of this mutation. The mutation represents a single base insertion leading to a premature stop codon in the coding sequence of the SUOX gene and is perfectly associated with the arachnomelia phenotype. Our findings suggest an important role for sulfite oxidase in bone development.
Drögemüller, Cord; Tetens, Jens; Sigurdsson, Snaevar; Gentile, Arcangelo; Testoni, Stefania; Lindblad-Toh, Kerstin; Leeb, Tosso
2010-01-01
Arachnomelia is a monogenic recessive defect of skeletal development in cattle. The causative mutation was previously mapped to a ∼7 Mb interval on chromosome 5. Here we show that array-based sequence capture and massively parallel sequencing technology, combined with the typical family structure in livestock populations, facilitates the identification of the causative mutation. We re-sequenced the entire critical interval in a healthy partially inbred cow carrying one copy of the critical chromosome segment in its ancestral state and one copy of the same segment with the arachnomelia mutation, and we detected a single heterozygous position. The genetic makeup of several partially inbred cattle provides extremely strong support for the causality of this mutation. The mutation represents a single base insertion leading to a premature stop codon in the coding sequence of the SUOX gene and is perfectly associated with the arachnomelia phenotype. Our findings suggest an important role for sulfite oxidase in bone development. PMID:20865119
Hu, Peng; Fabyanic, Emily; Kwon, Deborah Y; Tang, Sheng; Zhou, Zhaolan; Wu, Hao
2017-12-07
Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues such as adult mammalian brains is challenging. Here, we integrate sucrose-gradient-assisted purification of nuclei with droplet microfluidics to develop a highly scalable single-nucleus RNA-seq approach (sNucDrop-seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼18,000 nuclei isolated from cortical tissues of adult mice, we demonstrate that sNucDrop-seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity but also enables in-depth analysis of transient transcriptional states driven by neuronal activity, at single-cell resolution, in vivo. Copyright © 2017 Elsevier Inc. All rights reserved.
Massive Kaluza-Klein theories and their spontaneously broken symmetries
Energy Technology Data Exchange (ETDEWEB)
Hohm, O.
2006-07-15
In this thesis we investigate the effective actions for massive Kaluza-Klein states, focusing on the massive modes of spin-3/2 and spin-2 fields. To this end we determine the spontaneously broken gauge symmetries associated to these 'higher-spin' states and construct the unbroken phase of the Kaluza-Klein theory. We show that for the particular background AdS{sub 3} x S{sup 3} x S{sup 3} a consistent coupling of the first massive spin-3/2 multiplet requires an enhancement of local supersymmetry, which in turn will be partially broken in the Kaluza-Klein vacuum. The corresponding action is constructed as a gauged maximal supergravity in D=3. Subsequently, the symmetries underlying an infinite tower of massive spin-2 states are analyzed in case of a Kaluza-Klein compactification of four-dimensional gravity to D=3. It is shown that the resulting gravity-spin-2 theory is given by a Chern-Simons action of an affine algebra and also allows a geometrical interpretation in terms of 'algebra-valued' differential geometry. The global symmetry group is determined, which contains an affine extension of the Ehlers group. We show that the broken phase can in turn be constructed via gauging a certain subgroup of the global symmetry group. Finally, deformations of the Kaluza-Klein theory on AdS{sub 3} x S{sup 3} x S{sup 3} and the corresponding symmetry breakings are analyzed as possible applications for the AdS/CFT correspondence. (Orig.)
Development of the PARVMEC Code for Rapid Analysis of 3D MHD Equilibrium
Seal, Sudip; Hirshman, Steven; Cianciosa, Mark; Wingen, Andreas; Unterberg, Ezekiel; Wilcox, Robert; ORNL Collaboration
2015-11-01
The VMEC three-dimensional (3D) MHD equilibrium has been used extensively for designing stellarator experiments and analyzing experimental data in such strongly 3D systems. Recent applications of VMEC include 2D systems such as tokamaks (in particular, the D3D experiment), where application of very small (delB/B ~ 10-3) 3D resonant magnetic field perturbations render the underlying assumption of axisymmetry invalid. In order to facilitate the rapid analysis of such equilibria (for example, for reconstruction purposes), we have undertaken the task of parallelizing the VMEC code (PARVMEC) to produce a scalable and temporally rapidly convergent equilibrium code for use on parallel distributed memory platforms. The parallelization task naturally splits into three distinct parts 1) radial surfaces in the fixed-boundary part of the calculation; 2) two 2D angular meshes needed to compute the Green's function integrals over the plasma boundary for the free-boundary part of the code; and 3) block tridiagonal matrix needed to compute the full (3D) pre-conditioner near the final equilibrium state. Preliminary results show that scalability is achieved for tasks 1 and 3, with task 2 still nearing completion. The impact of this work on the rapid reconstruction of D3D plasmas using PARVMEC in the V3FIT code will be discussed. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.
Massively parallel computation of PARASOL code on the Origin 3800 system
International Nuclear Information System (INIS)
Hosokawa, Masanari; Takizuka, Tomonori
2001-10-01
The divertor particle simulation code named PARASOL simulates open-field plasmas between divertor walls self-consistently by using an electrostatic PIC method and a binary collision Monte Carlo model. The PARASOL parallelized with MPI-1.1 for scalar parallel computer worked on Intel Paragon XP/S system. A system SGI Origin 3800 was newly installed (May, 2001). The parallel programming was improved at this switchover. As a result of the high-performance new hardware and this improvement, the PARASOL is speeded up by about 60 times with the same number of processors. (author)
Maeda, Takuto; Takemura, Shunsuke; Furumura, Takashi
2017-07-01
We have developed an open-source software package, Open-source Seismic Wave Propagation Code (OpenSWPC), for parallel numerical simulations of seismic wave propagation in 3D and 2D (P-SV and SH) viscoelastic media based on the finite difference method in local-to-regional scales. This code is equipped with a frequency-independent attenuation model based on the generalized Zener body and an efficient perfectly matched layer for absorbing boundary condition. A hybrid-style programming using OpenMP and the Message Passing Interface (MPI) is adopted for efficient parallel computation. OpenSWPC has wide applicability for seismological studies and great portability to allowing excellent performance from PC clusters to supercomputers. Without modifying the code, users can conduct seismic wave propagation simulations using their own velocity structure models and the necessary source representations by specifying them in an input parameter file. The code has various modes for different types of velocity structure model input and different source representations such as single force, moment tensor and plane-wave incidence, which can easily be selected via the input parameters. Widely used binary data formats, the Network Common Data Form (NetCDF) and the Seismic Analysis Code (SAC) are adopted for the input of the heterogeneous structure model and the outputs of the simulation results, so users can easily handle the input/output datasets. All codes are written in Fortran 2003 and are available with detailed documents in a public repository.[Figure not available: see fulltext.
A parallel algorithm for transient solid dynamics simulations with contact detection
International Nuclear Information System (INIS)
Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.
1996-01-01
Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load-imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper the authors present a new parallel algorithm for contact detection that overcomes many of these limitations
HeinzelCluster: accelerated reconstruction for FORE and OSEM3D.
Vollmar, S; Michel, C; Treffert, J T; Newport, D F; Casey, M; Knöss, C; Wienhard, K; Liu, X; Defrise, M; Heiss, W D
2002-08-07
Using iterative three-dimensional (3D) reconstruction techniques for reconstruction of positron emission tomography (PET) is not feasible on most single-processor machines due to the excessive computing time needed, especially so for the large sinogram sizes of our high-resolution research tomograph (HRRT). In our first approach to speed up reconstruction time we transform the 3D scan into the format of a two-dimensional (2D) scan with sinograms that can be reconstructed independently using Fourier rebinning (FORE) and a fast 2D reconstruction method. On our dedicated reconstruction cluster (seven four-processor systems, Intel PIII@700 MHz, switched fast ethernet and Myrinet, Windows NT Server), we process these 2D sinograms in parallel. We have achieved a speedup > 23 using 26 processors and also compared results for different communication methods (RPC, Syngo, Myrinet GM). The other approach is to parallelize OSEM3D (implementation of C Michel), which has produced the best results for HRRT data so far and is more suitable for an adequate treatment of the sinogram gaps that result from the detector geometry of the HRRT. We have implemented two levels of parallelization for four dedicated cluster (a shared memory fine-grain level on each node utilizing all four processors and a coarse-grain level allowing for 15 nodes) reducing the time for one core iteration from over 7 h to about 35 min.
Energy Technology Data Exchange (ETDEWEB)
Ban, H. Y.; Kavuri, V. C., E-mail: venk@physics.upenn.edu; Cochran, J. M.; Pathak, S.; Chung, S. H.; Yodh, A. G. [Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Schweiger, M.; Arridge, S. R. [Department of Computer Science, University College London, London WC1E 7JE (United Kingdom); Xie, L. [Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Busch, D. R. [Division of Neurology, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 19104 (United States); Katrašnik, J. [Faculty of Electrical Engineering, University of Ljubljana, Ljubljana 1000 (Slovenia); Lee, K. [Daegu Gyeongbuk Institute of Science and Technology, Daegu 711-813 (Korea, Republic of); Choe, R. [Department of Biomedical Engineering, University of Rochester, Rochester, New York 14642 (United States); Czerniecki, B. J. [Department of Surgery, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States)
2016-07-15
Purpose: The authors introduce a state-of-the-art all-optical clinical diffuse optical tomography (DOT) imaging instrument which collects spatially dense, multispectral, frequency-domain breast data in the parallel-plate geometry. Methods: The instrument utilizes a CCD-based heterodyne detection scheme that permits massively parallel detection of diffuse photon density wave amplitude and phase for a large number of source–detector pairs (10{sup 6}). The stand-alone clinical DOT instrument thus offers high spatial resolution with reduced crosstalk between absorption and scattering. Other novel features include a fringe profilometry system for breast boundary segmentation, real-time data normalization, and a patient bed design which permits both axial and sagittal breast measurements. Results: The authors validated the instrument using tissue simulating phantoms with two different chromophore-containing targets and one scattering target. The authors also demonstrated the instrument in a case study breast cancer patient; the reconstructed 3D image of endogenous chromophores and scattering gave tumor localization in agreement with MRI. Conclusions: Imaging with a novel parallel-plate DOT breast imager that employs highly parallel, high-resolution CCD detection in the frequency-domain was demonstrated.
3D Printing of Organs-On-Chips.
Yi, Hee-Gyeong; Lee, Hyungseok; Cho, Dong-Woo
2017-01-25
Organ-on-a-chip engineering aims to create artificial living organs that mimic the complex and physiological responses of real organs, in order to test drugs by precisely manipulating the cells and their microenvironments. To achieve this, the artificial organs should to be microfabricated with an extracellular matrix (ECM) and various types of cells, and should recapitulate morphogenesis, cell differentiation, and functions according to the native organ. A promising strategy is 3D printing, which precisely controls the spatial distribution and layer-by-layer assembly of cells, ECMs, and other biomaterials. Owing to this unique advantage, integration of 3D printing into organ-on-a-chip engineering can facilitate the creation of micro-organs with heterogeneity, a desired 3D cellular arrangement, tissue-specific functions, or even cyclic movement within a microfluidic device. Moreover, fully 3D-printed organs-on-chips more easily incorporate other mechanical and electrical components with the chips, and can be commercialized via automated massive production. Herein, we discuss the recent advances and the potential of 3D cell-printing technology in engineering organs-on-chips, and provides the future perspectives of this technology to establish the highly reliable and useful drug-screening platforms.
Parallelization of quantum molecular dynamics simulation code
International Nuclear Information System (INIS)
Kato, Kaori; Kunugi, Tomoaki; Shibahara, Masahiko; Kotake, Susumu
1998-02-01
A quantum molecular dynamics simulation code has been developed for the analysis of the thermalization of photon energies in the molecule or materials in Kansai Research Establishment. The simulation code is parallelized for both Scalar massively parallel computer (Intel Paragon XP/S75) and Vector parallel computer (Fujitsu VPP300/12). Scalable speed-up has been obtained with a distribution to processor units by division of particle group in both parallel computers. As a result of distribution to processor units not only by particle group but also by the particles calculation that is constructed with fine calculations, highly parallelization performance is achieved in Intel Paragon XP/S75. (author)
Thyrotoxicosis after a massive levothyroxine ingestion in a 3-year-old patient.
Hays, Hannah L; Jolliff, Heath A; Casavant, Marcel J
2013-11-01
Most children with exploratory levothyroxine ingestions remain asymptomatic or suffer only minor effects, and most patients can be managed in the home or with supportive care in the hospital. We present a case of a 3-year-old girl who was found after a witnessed massive ingestion of levothyroxine. The patient was initially seen in an emergency department and discharged in stable condition, only to return 4 days after ingestion with thyrotoxicosis, hypertension, tachycardia, 24 hours of persistent vomiting, and clinical and laboratory evidence of dehydration. On the day of hospital admission, her thyroid-stimulating hormone was 0.018 µIU/mL (reference range, 0.6-4.5 µIU/mL); free T4 (tetraiodothyronine) was greater than 6.0 ng/dL (reference range, 0.7-2.1 ng/dL); and T3 (triiodothyronine) total was 494 ng/dL (reference range, 100-200 ng/dL). During a 3-day hospital admission, she was managed with supportive care, including intravenous fluid rehydration and antiemetics, and was ultimately discharged in good condition. The patient was followed up until 2 months after ingestion and remained asymptomatic. Although most exploratory levothyroxine ingestions suffer little to no clinical effects, serious symptoms can occur. Because serious symptoms can occur in a delayed fashion, it is important for clinicians to give proper anticipatory guidance regarding home symptom monitoring, follow-up, and reasons to return to the emergency department when patients present for medical evaluation.
Parallel Tensor Compression for Large-Scale Scientific Data.
Energy Technology Data Exchange (ETDEWEB)
Kolda, Tamara G. [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Ballard, Grey [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Austin, Woody Nathan [Univ. of Texas, Austin, TX (United States)
2015-10-01
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
A multi-frequency electrical impedance tomography system for real-time 2D and 3D imaging
Yang, Yunjie; Jia, Jiabin
2017-08-01
This paper presents the design and evaluation of a configurable, fast multi-frequency Electrical Impedance Tomography (mfEIT) system for real-time 2D and 3D imaging, particularly for biomedical imaging. The system integrates 32 electrode interfaces and the current frequency ranges from 10 kHz to 1 MHz. The system incorporates the following novel features. First, a fully adjustable multi-frequency current source with current monitoring function is designed. Second, a flexible switching scheme is developed for arbitrary sensing configuration and a semi-parallel data acquisition architecture is implemented for high-frame-rate data acquisition. Furthermore, multi-frequency digital quadrature demodulation is accomplished in a high-capacity Field Programmable Gate Array. At last, a 3D imaging software, visual tomography, is developed for real-time 2D and 3D image reconstruction, data analysis, and visualization. The mfEIT system is systematically tested and evaluated from the aspects of signal to noise ratio (SNR), frame rate, and 2D and 3D multi-frequency phantom imaging. The highest SNR is 82.82 dB on a 16-electrode sensor. The frame rate is up to 546 fps at serial mode and 1014 fps at semi-parallel mode. The evaluation results indicate that the presented mfEIT system is a powerful tool for real-time 2D and 3D imaging.
Design considerations for parallel graphics libraries
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
DIMACS Workshop on Interconnection Networks and Mapping, and Scheduling Parallel Computations
Rosenberg, Arnold L; Sotteau, Dominique; NSF Science and Technology Center in Discrete Mathematics and Theoretical Computer Science; Interconnection networks and mapping and scheduling parallel computations
1995-01-01
The interconnection network is one of the most basic components of a massively parallel computer system. Such systems consist of hundreds or thousands of processors interconnected to work cooperatively on computations. One of the central problems in parallel computing is the task of mapping a collection of processes onto the processors and routing network of a parallel machine. Once this mapping is done, it is critical to schedule computations within and communication among processor from universities and laboratories, as well as practitioners involved in the design, implementation, and application of massively parallel systems. Focusing on interconnection networks of parallel architectures of today and of the near future , the book includes topics such as network topologies,network properties, message routing, network embeddings, network emulation, mappings, and efficient scheduling. inputs for a process are available where and when the process is scheduled to be computed. This book contains the refereed pro...
3D and 4D magnetic susceptibility tomography based on complex MR images
Chen, Zikuan; Calhoun, Vince D
2014-11-11
Magnetic susceptibility is the physical property for T2*-weighted magnetic resonance imaging (T2*MRI). The invention relates to methods for reconstructing an internal distribution (3D map) of magnetic susceptibility values, .chi. (x,y,z), of an object, from 3D T2*MRI phase images, by using Computed Inverse Magnetic Resonance Imaging (CIMRI) tomography. The CIMRI technique solves the inverse problem of the 3D convolution by executing a 3D Total Variation (TV) regularized iterative convolution scheme, using a split Bregman iteration algorithm. The reconstruction of .chi. (x,y,z) can be designed for low-pass, band-pass, and high-pass features by using a convolution kernel that is modified from the standard dipole kernel. Multiple reconstructions can be implemented in parallel, and averaging the reconstructions can suppress noise. 4D dynamic magnetic susceptibility tomography can be implemented by reconstructing a 3D susceptibility volume from a 3D phase volume by performing 3D CIMRI magnetic susceptibility tomography at each snapshot time.
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-01-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value. PMID:27905520
Massive Born--Infeld and Other Dual Pairs
Ferrara, S
2015-01-01
We consider massive dual pairs of p-forms and (D-p-1)-forms described by non-linear Lagrangians, where non-linear curvature terms in one theory translate into non-linear mass-like terms in the dual theory. In particular, for D=2p and p even the two non-linear structures coincide when the non-linear massless theory is self-dual. This state of affairs finds a natural realization in the four-dimensional massive N=1 supersymmetric Born-Infeld action, which describes either a massive vector multiplet or a massive linear (tensor) multiplet with a Born-Infeld mass-like term. These systems should play a role for the massive gravitino multiplet obtained from a partial super-Higgs in N=2 Supergravity.
Methodes spectrales paralleles et applications aux simulations de couches de melange compressibles
Male , Jean-Michel; Fezoui , Loula ,
1993-01-01
La resolution des equations de Navier-Stokes en methodes spectrales pour des ecoulements compressibles peut etre assez gourmande en temps de calcul. On etudie donc ici la parallelisation d'un tel algorithme et son implantation sur une machine massivement parallele, la connection-machine CM-2. La methode spectrale s'adapte bien aux exigences du parallelisme massif, mais l'un des outils de base de cette methode, la transformee de Fourier rapide (lorsqu'elle doit etre appliquee sur les deux dime...
Extensions of the 3-dimensional plasma transport code E3D
International Nuclear Information System (INIS)
Runov, A.; Schneider, R.; Kasilov, S.; Reiter, D.
2004-01-01
One important aspect of modern fusion research is plasma edge physics. Fluid transport codes extending beyond the standard 2-D code packages like B2-Eirene or UEDGE are under development. A 3-dimensional plasma fluid code, E3D, based upon the Multiple Coordinate System Approach and a Monte Carlo integration procedure has been developed for general magnetic configurations including ergodic regions. These local magnetic coordinates lead to a full metric tensor which accurately accounts for all transport terms in the equations. Here, we discuss new computational aspects of the realization of the algorithm. The main limitation to the Monte Carlo code efficiency comes from the restriction on the parallel jump of advancing test particles which must be small compared to the gradient length of the diffusion coefficient. In our problems, the parallel diffusion coefficient depends on both plasma and magnetic field parameters. Usually, the second dependence is much more critical. In order to allow long parallel jumps, this dependence can be eliminated in two steps: first, the longitudinal coordinate x 3 of local magnetic coordinates is modified in such a way that in the new coordinate system the metric determinant and contra-variant components of the magnetic field scale along the magnetic field with powers of the magnetic field module (like in Boozer flux coordinates). Second, specific weights of the test particles are introduced. As a result of increased parallel jump length, the efficiency of the code is about two orders of magnitude better. (copyright 2004 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim) (orig.)
A Case Study of a Hybrid Parallel 3D Surface Rendering Graphics Architecture
DEFF Research Database (Denmark)
Holten-Lund, Hans Erik; Madsen, Jan; Pedersen, Steen
1997-01-01
This paper presents a case study in the design strategy used inbuilding a graphics computer, for drawing very complex 3Dgeometric surfaces. The goal is to build a PC based computer systemcapable of handling surfaces built from about 2 million triangles, andto be able to render a perspective view...... of these on a computer displayat interactive frame rates, i.e. processing around 50 milliontriangles per second. The paper presents a hardware/softwarearchitecture called HPGA (Hybrid Parallel Graphics Architecture) whichis likely to be able to carry out this task. The case study focuses ontechniques to increase...
Vaidya spacetime in massive gravity's rainbow
Directory of Open Access Journals (Sweden)
Yaghoub Heydarzade
2017-11-01
Full Text Available In this paper, we will analyze the energy dependent deformation of massive gravity using the formalism of massive gravity's rainbow. So, we will use the Vainshtein mechanism and the dRGT mechanism for the energy dependent massive gravity, and thus analyze a ghost free theory of massive gravity's rainbow. We study the energy dependence of a time-dependent geometry, by analyzing the radiating Vaidya solution in this theory of massive gravity's rainbow. The energy dependent deformation of this Vaidya metric will be performed using suitable rainbow functions.
Zhao, Li; Wit, Janneke; Svetec, Nicolas; Begun, David J
2015-05-01
Gene expression variation within species is relatively common, however, the role of natural selection in the maintenance of this variation is poorly understood. Here we investigate low and high latitude populations of Drosophila melanogaster and its sister species, D. simulans, to determine whether the two species show similar patterns of population differentiation, consistent with a role for spatially varying selection in maintaining gene expression variation. We compared at two temperatures the whole male transcriptome of D. melanogaster and D. simulans sampled from Panama City (Panama) and Maine (USA). We observed a significant excess of genes exhibiting differential expression in both species, consistent with parallel adaptation to heterogeneous environments. Moreover, the majority of genes showing parallel expression differentiation showed the same direction of differential expression in the two species and the magnitudes of expression differences between high and low latitude populations were correlated across species, further bolstering the conclusion that parallelism for expression phenotypes results from spatially varying selection. However, the species also exhibited important differences in expression phenotypes. For example, the genomic extent of genotype × environment interaction was much more common in D. melanogaster. Highly differentiated SNPs between low and high latitudes were enriched in the 3' UTRs and CDS of the geographically differently expressed genes in both species, consistent with an important role for cis-acting variants in driving local adaptation for expression-related phenotypes.
[Real time 3D echocardiography
Bauer, F.; Shiota, T.; Thomas, J. D.
2001-01-01
Three-dimensional representation of the heart is an old concern. Usually, 3D reconstruction of the cardiac mass is made by successive acquisition of 2D sections, the spatial localisation and orientation of which require complex guiding systems. More recently, the concept of volumetric acquisition has been introduced. A matricial emitter-receiver probe complex with parallel data processing provides instantaneous of a pyramidal 64 degrees x 64 degrees volume. The image is restituted in real time and is composed of 3 planes (planes B and C) which can be displaced in all spatial directions at any time during acquisition. The flexibility of this system of acquisition allows volume and mass measurement with greater accuracy and reproducibility, limiting inter-observer variability. Free navigation of the planes of investigation allows reconstruction for qualitative and quantitative analysis of valvular heart disease and other pathologies. Although real time 3D echocardiography is ready for clinical usage, some improvements are still necessary to improve its conviviality. Then real time 3D echocardiography could be the essential tool for understanding, diagnosis and management of patients.
Application of parallel connected power-MOSFET elements to high current d.c. power supply
International Nuclear Information System (INIS)
Matsukawa, Tatsuya; Shioyama, Masanori; Shimada, Katsuhiro; Takaku, Taku; Neumeyer, Charles; Tsuji-Iio, Shunji; Shimada, Ryuichi
2001-01-01
The low aspect ratio spherical torus (ST), which has single turn toroidal field coil, requires the extremely high d.c. current like as 20 MA to energize the coil. Considering the ratings of such extremely high current and low voltage, power-MOSFET element is employed as the switching device for the a.c./d.c. converter of power supply. One of the advantages of power-MOSFET element is low on-state resistance, which is to meet the high current and low voltage operation. Recently, the capacity of power-MOSFET element has been increased and its on-state resistance has been decreased, so that the possibility of construction of high current and low voltage a.c./d.c. converter with parallel connected power-MOSFET elements has been growing. With the aim of developing the high current d.c. power supply using power-MOSFET, the basic characteristics of parallel operation with power-MOSFET elements are experimentally investigated. And, the synchronous rectifier type and the bi-directional self commutated type a.c./d.c. converters using parallel connected power-MOSFET elements are proposed
Araujo, Luiz H.; Timmers, Cynthia; Bell, Erica Hlavin; Shilo, Konstantin; Lammers, Philip E.; Zhao, Weiqiang; Natarajan, Thanemozhi G.; Miller, Clinton J.; Zhang, Jianying; Yilmaz, Ayse S.; Liu, Tom; Coombes, Kevin; Amann, Joseph; Carbone, David P.
2015-01-01
Purpose Technologic advances have enabled the comprehensive analysis of genetic perturbations in non–small-cell lung cancer (NSCLC); however, African Americans have often been underrepresented in these studies. This ethnic group has higher lung cancer incidence and mortality rates, and some studies have suggested a lower incidence of epidermal growth factor receptor mutations. Herein, we report the most in-depth molecular profile of NSCLC in African Americans to date. Methods A custom panel was designed to cover the coding regions of 81 NSCLC-related genes and 40 ancestry-informative markers. Clinical samples were sequenced on a massively parallel sequencing instrument, and anaplastic lymphoma kinase translocation was evaluated by fluorescent in situ hybridization. Results The study cohort included 99 patients (61% males, 94% smokers) comprising 31 squamous and 68 nonsquamous cell carcinomas. We detected 227 nonsilent variants in the coding sequence, including 24 samples with nonoverlapping, classic driver alterations. The frequency of driver mutations was not significantly different from that of whites, and no association was found between genetic ancestry and the presence of somatic mutations. Copy number alteration analysis disclosed distinguishable amplifications in the 3q chromosome arm in squamous cell carcinomas and pointed toward a handful of targetable alterations. We also found frequent SMARCA4 mutations and protein loss, mostly in driver-negative tumors. Conclusion Our data suggest that African American ancestry may not be significantly different from European/white background for the presence of somatic driver mutations in NSCLC. Furthermore, we demonstrated that using a comprehensive genotyping approach could identify numerous targetable alterations, with potential impact on therapeutic decisions. PMID:25918285
dRGT theory of massive gravity from spontaneous symmetry breaking
Torabian, Mahdi
2018-05-01
In this note we propose a topological action for a Poincare times diffeomorphism invariant gauge theory. We show that there is Higgs phase where the gauge symmetry is spontaneous broken to a diagonal Lorentz subgroup and gives the Einstein-Hilbert action plus the dRGT potential terms. In this vacuum, there are five (three from Goldstone modes) propagating degrees of freedom which form polarizations of a massive spin 2 particle, an extra healthy heavy scalar (Higgs) mode and no Boulware-Deser ghost mode. We further show that the action can be derived in a limit from a topological de Sitter invariant gauge theory in 4 dimensions.
Hybrid parallel computing architecture for multiview phase shifting
Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun
2014-11-01
The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphic processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained paralleling computing. Experimental results verify that the developed system can perform 50 fps (frame per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using a NVIDIA GT560Ti graphics card rather than a sequential C in a 3.4 GHZ Inter Core i7 3770.
Mapping 3D genome architecture through in situ DNase Hi-C.
Ramani, Vijay; Cusanovich, Darren A; Hause, Ronald J; Ma, Wenxiu; Qiu, Ruolan; Deng, Xinxian; Blau, C Anthony; Disteche, Christine M; Noble, William S; Shendure, Jay; Duan, Zhijun
2016-11-01
With the advent of massively parallel sequencing, considerable work has gone into adapting chromosome conformation capture (3C) techniques to study chromosomal architecture at a genome-wide scale. We recently demonstrated that the inactive murine X chromosome adopts a bipartite structure using a novel 3C protocol, termed in situ DNase Hi-C. Like traditional Hi-C protocols, in situ DNase Hi-C requires that chromatin be chemically cross-linked, digested, end-repaired, and proximity-ligated with a biotinylated bridge adaptor. The resulting ligation products are optionally sheared, affinity-purified via streptavidin bead immobilization, and subjected to traditional next-generation library preparation for Illumina paired-end sequencing. Importantly, in situ DNase Hi-C obviates the dependence on a restriction enzyme to digest chromatin, instead relying on the endonuclease DNase I. Libraries generated by in situ DNase Hi-C have a higher effective resolution than traditional Hi-C libraries, which makes them valuable in cases in which high sequencing depth is allowed for, or when hybrid capture technologies are expected to be used. The protocol described here, which involves ∼4 d of bench work, is optimized for the study of mammalian cells, but it can be broadly applicable to any cell or tissue of interest, given experimental parameter optimization.
Lemon : An MPI parallel I/O library for data encapsulation using LIME
Deuzeman, Albert; Reker, Siebren; Urbach, Carsten
We introduce Lemon, an MPI parallel I/O library that provides efficient parallel I/O of both binary and metadata on massively parallel architectures. Motivated by the demands of the lattice Quantum Chromodynamics community, the data is stored in the SciDAC Lattice QCD Interchange Message
Cellular automata a parallel model
Mazoyer, J
1999-01-01
Cellular automata can be viewed both as computational models and modelling systems of real processes. This volume emphasises the first aspect. In articles written by leading researchers, sophisticated massive parallel algorithms (firing squad, life, Fischer's primes recognition) are treated. Their computational power and the specific complexity classes they determine are surveyed, while some recent results in relation to chaos from a new dynamic systems point of view are also presented. Audience: This book will be of interest to specialists of theoretical computer science and the parallelism challenge.
A class of black holes in dRGT massive gravity and their thermodynamical properties
Energy Technology Data Exchange (ETDEWEB)
Ghosh, Suchant G. [Jamia Millia Islamia, Centre of Theoretical Physics, New Delhi (India); University of Kwazulu-Natal, Astrophysics and Cosmology Research Unit, School of Mathematical Sciences, Private Bag 54001, Durban (South Africa); Tannukij, Lunchakorn [Mahidol University, Department of Physics, Faculty of Science, Bangkok (Thailand); Wongjun, Pitayuth [Naresuan University, The Institute for Fundamental Study, Phitsanulok (Thailand); Ministry of Education, Thailand Center of Excellence in Physics, Bangkok (Thailand)
2016-03-15
We present an exact spherical black hole solution in de Rham, Gabadadze, and Tolley (dRGT) massive gravity for a generic choice of the parameters in the theory, and also discuss the thermodynamical and phase structure of the black hole in both the grand canonical and the canonical ensembles (for the charged case). It turns out that the dRGT black hole solution includes other known solutions to the Einstein field equations, such as the monopole-de Sitter-Schwarzschild solution with the coefficients of the third and fourth terms in the potential and the graviton mass in massive gravity naturally generates the cosmological constant and the global monopole term. Furthermore, we compute the mass, temperature and entropy of the dRGT black hole, and also perform thermodynamical stability analysis. It turns out that the presence of the graviton mass completely changes the black hole thermodynamics, and it can provide the Hawking-Page phase transition which also occurs for the charged black holes. Interestingly, the entropy of a black hole is barely affected and still obeys the standard area law. In particular, our results, in the limit m{sub g} → 0, reduced exactly to the results of general relativity. (orig.)
Dual descriptions of spin-two massive particles in D=2+1 via master actions
Dalmazi, D.; Mendonça, Elias L.
2009-02-01
In the first part of this work we show the decoupling (up to contact terms) of redundant degrees of freedom which appear in the covariant description of spin-two massive particles in D=2+1. We make use of a master action which interpolates, without solving any constraints, between a first-, second-, and third-order (in derivatives) self-dual model. An explicit dual map between those models is derived. In our approach the absence of ghosts in the third-order self-dual model, which corresponds to a quadratic truncation of topologically massive gravity, is due to the triviality (no particle content) of the Einstein-Hilbert action in D=2+1. In the second part of the work, also in D=2+1, we prove the quantum equivalence of the gauge invariant sector of a couple of self-dual models of opposite helicities (+2 and -2) and masses m+ and m- to a generalized self-dual model which contains a quadratic Einstein-Hilbert action, a Chern-Simons term of first order, and a Fierz-Pauli mass term. The use of a first-order Chern-Simons term instead of a third-order one avoids conflicts with the sign of the Einstein-Hilbert action.
Dual descriptions of spin-two massive particles in D=2+1 via master actions
International Nuclear Information System (INIS)
Dalmazi, D.; Mendonca, Elias L.
2009-01-01
In the first part of this work we show the decoupling (up to contact terms) of redundant degrees of freedom which appear in the covariant description of spin-two massive particles in D=2+1. We make use of a master action which interpolates, without solving any constraints, between a first-, second-, and third-order (in derivatives) self-dual model. An explicit dual map between those models is derived. In our approach the absence of ghosts in the third-order self-dual model, which corresponds to a quadratic truncation of topologically massive gravity, is due to the triviality (no particle content) of the Einstein-Hilbert action in D=2+1. In the second part of the work, also in D=2+1, we prove the quantum equivalence of the gauge invariant sector of a couple of self-dual models of opposite helicities (+2 and -2) and masses m + and m - to a generalized self-dual model which contains a quadratic Einstein-Hilbert action, a Chern-Simons term of first order, and a Fierz-Pauli mass term. The use of a first-order Chern-Simons term instead of a third-order one avoids conflicts with the sign of the Einstein-Hilbert action.
The Parallel System for Integrating Impact Models and Sectors (pSIMS)
Elliott, Joshua; Kelly, David; Chryssanthacopoulos, James; Glotter, Michael; Jhunjhnuwala, Kanika; Best, Neil; Wilde, Michael; Foster, Ian
2014-01-01
We present a framework for massively parallel climate impact simulations: the parallel System for Integrating Impact Models and Sectors (pSIMS). This framework comprises a) tools for ingesting and converting large amounts of data to a versatile datatype based on a common geospatial grid; b) tools for translating this datatype into custom formats for site-based models; c) a scalable parallel framework for performing large ensemble simulations, using any one of a number of different impacts models, on clusters, supercomputers, distributed grids, or clouds; d) tools and data standards for reformatting outputs to common datatypes for analysis and visualization; and e) methodologies for aggregating these datatypes to arbitrary spatial scales such as administrative and environmental demarcations. By automating many time-consuming and error-prone aspects of large-scale climate impacts studies, pSIMS accelerates computational research, encourages model intercomparison, and enhances reproducibility of simulation results. We present the pSIMS design and use example assessments to demonstrate its multi-model, multi-scale, and multi-sector versatility.
Parallel imaging with phase scrambling.
Zaitsev, Maxim; Schultz, Gerrit; Hennig, Juergen; Gruetter, Rolf; Gallichan, Daniel
2015-04-01
Most existing methods for accelerated parallel imaging in MRI require additional data, which are used to derive information about the sensitivity profile of each radiofrequency (RF) channel. In this work, a method is presented to avoid the acquisition of separate coil calibration data for accelerated Cartesian trajectories. Quadratic phase is imparted to the image to spread the signals in k-space (aka phase scrambling). By rewriting the Fourier transform as a convolution operation, a window can be introduced to the convolved chirp function, allowing a low-resolution image to be reconstructed from phase-scrambled data without prominent aliasing. This image (for each RF channel) can be used to derive coil sensitivities to drive existing parallel imaging techniques. As a proof of concept, the quadratic phase was applied by introducing an offset to the x(2) - y(2) shim and the data were reconstructed using adapted versions of the image space-based sensitivity encoding and GeneRalized Autocalibrating Partially Parallel Acquisitions algorithms. The method is demonstrated in a phantom (1 × 2, 1 × 3, and 2 × 2 acceleration) and in vivo (2 × 2 acceleration) using a 3D gradient echo acquisition. Phase scrambling can be used to perform parallel imaging acceleration without acquisition of separate coil calibration data, demonstrated here for a 3D-Cartesian trajectory. Further research is required to prove the applicability to other 2D and 3D sampling schemes. © 2014 Wiley Periodicals, Inc.
Scudder, Nathan; McNevin, Dennis; Kelty, Sally F; Walsh, Simon J; Robertson, James
2018-03-01
Use of DNA in forensic science will be significantly influenced by new technology in coming years. Massively parallel sequencing and forensic genomics will hasten the broadening of forensic DNA analysis beyond short tandem repeats for identity towards a wider array of genetic markers, in applications as diverse as predictive phenotyping, ancestry assignment, and full mitochondrial genome analysis. With these new applications come a range of legal and policy implications, as forensic science touches on areas as diverse as 'big data', privacy and protected health information. Although these applications have the potential to make a more immediate and decisive forensic intelligence contribution to criminal investigations, they raise policy issues that will require detailed consideration if this potential is to be realised. The purpose of this paper is to identify the scope of the issues that will confront forensic and user communities. Copyright © 2017 The Chartered Society of Forensic Sciences. All rights reserved.
Sung, Chul
2013-08-01
Accurate estimation of neuronal count and distribution is central to the understanding of the organization and layout of cortical maps in the brain, and changes in the cell population induced by brain disorders. High-throughput 3D microscopy techniques such as Knife-Edge Scanning Microscopy (KESM) are enabling whole-brain survey of neuronal distributions. Data from such techniques pose serious challenges to quantitative analysis due to the massive, growing, and sparsely labeled nature of the data. In this paper, we present a scalable, incremental learning algorithm for cell body detection that can address these issues. Our algorithm is computationally efficient (linear mapping, non-iterative) and does not require retraining (unlike gradient-based approaches) or retention of old raw data (unlike instance-based learning). We tested our algorithm on our rat brain Nissl data set, showing superior performance compared to an artificial neural network-based benchmark, and also demonstrated robust performance in a scenario where the data set is rapidly growing in size. Our algorithm is also highly parallelizable due to its incremental nature, and we demonstrated this empirically using a MapReduce-based implementation of the algorithm. We expect our scalable, incremental learning approach to be widely applicable to medical imaging domains where there is a constant flux of new data. © 2013 IEEE.
Statistical 3D damage accumulation model for ion implant simulators
Hernandez-Mangas, J M; Enriquez, L E; Bailon, L; Barbolla, J; Jaraiz, M
2003-01-01
A statistical 3D damage accumulation model, based on the modified Kinchin-Pease formula, for ion implant simulation has been included in our physically based ion implantation code. It has only one fitting parameter for electronic stopping and uses 3D electron density distributions for different types of targets including compound semiconductors. Also, a statistical noise reduction mechanism based on the dose division is used. The model has been adapted to be run under parallel execution in order to speed up the calculation in 3D structures. Sequential ion implantation has been modelled including previous damage profiles. It can also simulate the implantation of molecular and cluster projectiles. Comparisons of simulated doping profiles with experimental SIMS profiles are presented. Also comparisons between simulated amorphization and experimental RBS profiles are shown. An analysis of sequential versus parallel processing is provided.
Statistical 3D damage accumulation model for ion implant simulators
International Nuclear Information System (INIS)
Hernandez-Mangas, J.M.; Lazaro, J.; Enriquez, L.; Bailon, L.; Barbolla, J.; Jaraiz, M.
2003-01-01
A statistical 3D damage accumulation model, based on the modified Kinchin-Pease formula, for ion implant simulation has been included in our physically based ion implantation code. It has only one fitting parameter for electronic stopping and uses 3D electron density distributions for different types of targets including compound semiconductors. Also, a statistical noise reduction mechanism based on the dose division is used. The model has been adapted to be run under parallel execution in order to speed up the calculation in 3D structures. Sequential ion implantation has been modelled including previous damage profiles. It can also simulate the implantation of molecular and cluster projectiles. Comparisons of simulated doping profiles with experimental SIMS profiles are presented. Also comparisons between simulated amorphization and experimental RBS profiles are shown. An analysis of sequential versus parallel processing is provided
Patient specific 3D printed phantom for IMRT quality assurance
International Nuclear Information System (INIS)
Ehler, Eric D; Higgins, Patrick D; Dusenbery, Kathryn E; Barney, Brett M
2014-01-01
The purpose of this study was to test the feasibility of a patient specific phantom for patient specific dosimetric verification. Using the head and neck region of an anthropomorphic phantom as a substitute for an actual patient, a soft-tissue equivalent model was constructed with the use of a 3D printer. Calculated and measured dose in the anthropomorphic phantom and the 3D printed phantom was compared for a parallel-opposed head and neck field geometry to establish tissue equivalence. A nine-field IMRT plan was constructed and dose verification measurements were performed for the 3D printed phantom as well as traditional standard phantoms. The maximum difference in calculated dose was 1.8% for the parallel-opposed configuration. Passing rates of various dosimetric parameters were compared for the IMRT plan measurements; the 3D printed phantom results showed greater disagreement at superficial depths than other methods. A custom phantom was created using a 3D printer. It was determined that the use of patient specific phantoms to perform dosimetric verification and estimate the dose in the patient is feasible. In addition, end-to-end testing on a per-patient basis was possible with the 3D printed phantom. Further refinement of the phantom construction process is needed for routine use. (paper)
Chen, R.; Chen, S.; He, L.; Yao, H.; Li, H.; Xi, X.; Zhao, X.
2017-12-01
EM method plays a key role in volcanic massive sulfide (VMS) deposit which is with high grade and high economic value. However, the performance of high density 3D AMT in detecting deep concealed VMS targets is not clear. The size of a typical VMS target is less than 100 m x 100 m x 50 m, it's a challenge task to find it with large depth. We carried a test in a VMS Pb-Zn deposit using high density 3D AMT with site spacing as 20 m and profile spacing as 40 - 80 m. About 2000 AMT sites were acquired in an area as 2000 m x 1500 m. Then we used a sever with 8 CPUs (Intel Xeon E7-8880 v3, 2.3 GHz, 144 cores), 2048 GB RAM, and 40 TB disk array to invert above 3D AMT sites using integral equation forward modeling and re-weighted conjugated-gradient inversion. The depth of VMS ore body is about 600 m and the size of the ore body is about 100 x 100 x 20m with dip angle about 45 degree. We finds that it's very hard to recover the location and shape of the ore body by 3D AMT inversion even using the data of all AMT sites and frequencies. However, it's possible to recover the location and shape of the deep concealed ore body if we adjust the inversion parameters carefully. A new set of inversion parameter needs to be find for high density 3D AMT data set and the inversion parameters working good for Dublin Secret Model II (DSM 2) is not suitable for our real data. This problem may be caused by different data density and different number of frequency. We find a set of good inversion parameter by comparing the shape and location of ore body with inversion result and trying different inversion parameters. And the application of new inversion parameter in nearby area with high density AMT sites shows that the inversion result is improved greatly.
Directory of Open Access Journals (Sweden)
Kim De Leeneer
Full Text Available Despite improvements in terms of sequence quality and price per basepair, Sanger sequencing remains restricted to screening of individual disease genes. The development of massively parallel sequencing (MPS technologies heralded an era in which molecular diagnostics for multigenic disorders becomes reality. Here, we outline different PCR amplification based strategies for the screening of a multitude of genes in a patient cohort. We performed a thorough evaluation in terms of set-up, coverage and sequencing variants on the data of 10 GS-FLX experiments (over 200 patients. Crucially, we determined the actual coverage that is required for reliable diagnostic results using MPS, and provide a tool to calculate the number of patients that can be screened in a single run. Finally, we provide an overview of factors contributing to false negative or false positive mutation calls and suggest ways to maximize sensitivity and specificity, both important in a routine setting. By describing practical strategies for screening of multigenic disorders in a multitude of samples and providing answers to questions about minimum required coverage, the number of patients that can be screened in a single run and the factors that may affect sensitivity and specificity we hope to facilitate the implementation of MPS technology in molecular diagnostics.
0-d modeling of fast radiative shutdown of Tokamak discharges following massive gas injection
International Nuclear Information System (INIS)
Hollmann, E.M.; Parks, P.B.; Scott, H.A.
2008-01-01
0-D modeling of fast radiative shutdowns of tokamak discharges following massive gas injection is presented. Realistic neutral deposition rates are used together with a 1-D diffusive model to estimate impurity deposition into the plasma. Non-coronal radiation rates including opacity are used, as are induced wall currents, wall impurity radiation, and neutral and neoclassical corrections to plasma resistivity. The 0-D modeling is found to reproduce the shutdown timescale and free electron density rise seen in DIII-D argon injection experiments well. Opacity, wall currents, and wall impurities can all have a significant (>10%) impact on simulated timescales. (copyright 2008 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim) (orig.)
The dynamics of massive starless cores with ALMA
Energy Technology Data Exchange (ETDEWEB)
Tan, Jonathan C. [Departments of Astronomy and Physics, University of Florida, Gainesville, FL 32611 (United States); Kong, Shuo; Butler, Michael J. [Department of Astronomy, University of Florida, Gainesville, FL 32611 (United States); Caselli, Paola [School of Physics and Astronomy, The University of Leeds, Leeds LS2 9JT (United Kingdom); Fontani, Francesco [INAF-Osservatorio Astrofisico di Arcetri, Largo Enrico Fermi 5, I-50125 Firenze (Italy)
2013-12-20
How do stars that are more massive than the Sun form, and thus how is the stellar initial mass function (IMF) established? Such intermediate- and high-mass stars may be born from relatively massive pre-stellar gas cores, which are more massive than the thermal Jeans mass. The turbulent core accretion model invokes such cores as being in approximate virial equilibrium and in approximate pressure equilibrium with their surrounding clump medium. Their internal pressure is provided by a combination of turbulence and magnetic fields. Alternatively, the competitive accretion model requires strongly sub-virial initial conditions that then lead to extensive fragmentation to the thermal Jeans scale, with intermediate- and high-mass stars later forming by competitive Bondi-Hoyle accretion. To test these models, we have identified four prime examples of massive (∼100 M {sub ☉}) clumps from mid-infrared extinction mapping of infrared dark clouds. Fontani et al. found high deuteration fractions of N{sub 2}H{sup +} in these objects, which are consistent with them being starless. Here we present ALMA observations of these four clumps that probe the N{sub 2}D{sup +} (3-2) line at 2.''3 resolution. We find six N{sub 2}D{sup +} cores and determine their dynamical state. Their observed velocity dispersions and sizes are broadly consistent with the predictions of the turbulent core model of self-gravitating, magnetized (with Alfvén Mach number m{sub A} ∼ 1) and virialized cores that are bounded by the high pressures of their surrounding clumps. However, in the most massive cores, with masses up to ∼60 M {sub ☉}, our results suggest that moderately enhanced magnetic fields (so that m{sub A} ≅ 0.3) may be needed for the structures to be in virial and pressure equilibrium. Magnetically regulated core formation may thus be important in controlling the formation of massive cores, inhibiting their fragmentation, and thus helping to establish the stellar IMF.
Directory of Open Access Journals (Sweden)
Kim Vancampenhout
Full Text Available The advent of massive parallel sequencing (MPS has revolutionized the field of human molecular genetics, including the diagnostic study of mitochondrial (mt DNA dysfunction. The analysis of the complete mitochondrial genome using MPS platforms is now common and will soon outrun conventional sequencing. However, the development of a robust and reliable protocol is rather challenging. A previous pilot study for the re-sequencing of human mtDNA revealed an uneven coverage, affecting predominantly part of the plus strand. In an attempt to address this problem, we undertook a comparative study of standard and modified protocols for the Ion Torrent PGM system. We could not improve strand representation by altering the recommended shearing methodology of the standard workflow or omitting the DNA polymerase amplification step from the library construction process. However, we were able to associate coverage bias of the plus strand with a specific sequence motif. Additionally, we compared coverage and variant calling across technologies. The same samples were also sequenced on a MiSeq device which showed that coverage and heteroplasmic variant calling were much improved.
Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico
2018-02-01
To show how to generate a consensus sequence from the information of massive parallel sequences data obtained from routine HIV anti-retroviral resistance studies, and that may be suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular evolutionary genetics analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap-value of 88% (IQR83.5-95.5). Association increased to 36/62 sequences, median bootstrap 94% (IQR85.5-98)], using a 15% threshold. Maximum association was at the 20% threshold, with 61/62 sequences associated, and a median bootstrap value of 99% (IQR98-100). A safe method is presented to generate consensus sequences from HIV-NGS data at 20% threshold, which will prove useful for molecular epidemiological studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
3D Printing of Organs-On-Chips
Yi, Hee-Gyeong; Lee, Hyungseok; Cho, Dong-Woo
2017-01-01
Organ-on-a-chip engineering aims to create artificial living organs that mimic the complex and physiological responses of real organs, in order to test drugs by precisely manipulating the cells and their microenvironments. To achieve this, the artificial organs should to be microfabricated with an extracellular matrix (ECM) and various types of cells, and should recapitulate morphogenesis, cell differentiation, and functions according to the native organ. A promising strategy is 3D printing, which precisely controls the spatial distribution and layer-by-layer assembly of cells, ECMs, and other biomaterials. Owing to this unique advantage, integration of 3D printing into organ-on-a-chip engineering can facilitate the creation of micro-organs with heterogeneity, a desired 3D cellular arrangement, tissue-specific functions, or even cyclic movement within a microfluidic device. Moreover, fully 3D-printed organs-on-chips more easily incorporate other mechanical and electrical components with the chips, and can be commercialized via automated massive production. Herein, we discuss the recent advances and the potential of 3D cell-printing technology in engineering organs-on-chips, and provides the future perspectives of this technology to establish the highly reliable and useful drug-screening platforms. PMID:28952489
3D Printing of Organs-On-Chips
Directory of Open Access Journals (Sweden)
Hee-Gyeong Yi
2017-01-01
Full Text Available Organ-on-a-chip engineering aims to create artificial living organs that mimic the complex and physiological responses of real organs, in order to test drugs by precisely manipulating the cells and their microenvironments. To achieve this, the artificial organs should to be microfabricated with an extracellular matrix (ECM and various types of cells, and should recapitulate morphogenesis, cell differentiation, and functions according to the native organ. A promising strategy is 3D printing, which precisely controls the spatial distribution and layer-by-layer assembly of cells, ECMs, and other biomaterials. Owing to this unique advantage, integration of 3D printing into organ-on-a-chip engineering can facilitate the creation of micro-organs with heterogeneity, a desired 3D cellular arrangement, tissue-specific functions, or even cyclic movement within a microfluidic device. Moreover, fully 3D-printed organs-on-chips more easily incorporate other mechanical and electrical components with the chips, and can be commercialized via automated massive production. Herein, we discuss the recent advances and the potential of 3D cell-printing technology in engineering organs-on-chips, and provides the future perspectives of this technology to establish the highly reliable and useful drug-screening platforms.
Synthesis and Evaluation of A High Precision 3D-Printed Ti6Al4V Compliant Parallel Manipulator
Pham, Minh Tuan; Teo, Tat Joo; Huat Yeo, Song; Wang, Pan; Nai, Mui Ling Sharon
2017-12-01
A novel 3D printed compliant parallel manipulator (CPM) with θX - θX - Z motions is presented in this paper. This CPM is synthesized using the beam-based method, a new structural optimization approach, to achieve optimized stiffness properties with targeted dynamic behavior. The CPM performs high non-actuating stiffness based on the predicted stiffness ratios of about 3600 for translations and 570 for rotations, while the dynamic response is fast with the targeted first resonant mode of 100Hz. A prototype of the synthesized CPM is fabricated using the electron beam melting (EBM) technology with Ti6Al4V material. Driven by three voice-coil (VC) motors, the CPM demonstrated a positioning resolution of 50nm along the Z axis and an angular resolution of ~0.3 “about the X and Y axes, the positioning accuracy is also good with the measured values of ±25.2nm and ±0.17” for the translation and rotations respectively. Experimental investigation also shows that this large workspace CPM has a first resonant mode of 98Hz and the stiffness behavior matches the prediction with the highest deviation of 11.2%. Most importantly, the full workspace of 10° × 10° × 7mm of the proposed CPM can be achieved, that demonstrates 3D printed compliant mechanisms can perform large elastic deformation. The obtained results show that CPMs printed by EBM technology have predictable mechanical characteristics and are applicable in precise positioning systems.
Radiation asymmetries during disruptions on DIII-D caused by massive gas injection
International Nuclear Information System (INIS)
Commaux, N.; Baylor, L. R.; Jernigan, T. C.; Foust, C. R.; Combs, S.; Meitner, S. J.; Hollmann, E. M.; Izzo, V. A.; Moyer, R. A.; Humphreys, D. A.; Wesley, J. C.; Eidietis, N. W.; Parks, P. B.; Lasnier, C. J.
2014-01-01
One of the major challenges that the ITER tokamak will have to face during its operations are disruptions. During the last few years, it has been proven that the global consequences of a disruption can be mitigated by the injection of large quantities of impurities. But one aspect that has been difficult to study was the possibility of local effects inside the torus during such injection that could damage a portion of the device despite the global heat losses and generated currents remaining below design parameter. 3D MHD simulations show that there is a potential for large toroidal asymmetries of the radiated power during impurity injection due to the interaction between the particle injection plume and a large n = 1 mode. Another aspect of 3D effects is the potential occurrence of Vertical Displacement Events (VDE), which could induce large poloidal heat load asymmetries. This potential deleterious effect of 3D phenomena has been studied on the DIII-D tokamak, thanks to the implementation of a multi-location massive gas injection (MGI) system as well as new diagnostic capabilities. This study showed the existence of a correlation between the location of the n = 1 mode and the local heat load on the plasma facing components but shows also that this effect is much smaller than anticipated (peaking factor of ∼1.1 vs 3-4 according to the simulations). There seems to be no observable heat load on the first wall of DIII-D at the location of the impurity injection port as well as no significant radiation asymmetries whether one or 2 valves are fired. This study enabled the first attempt of mitigation of a VDE using impurity injection at different poloidal locations. The results showed a more favorable heat deposition when the VDE is mitigated early (right at the onset) by impurity injection. No significant improvement of the heat load mitigation efficiency has been observed for late particle injection whether the injection is done “in the way” of the VDE
Radiation asymmetries during disruptions on DIII-D caused by massive gas injectiona)
Commaux, N.; Baylor, L. R.; Jernigan, T. C.; Hollmann, E. M.; Humphreys, D. A.; Wesley, J. C.; Izzo, V. A.; Eidietis, N. W.; Lasnier, C. J.; Moyer, R. A.; Parks, P. B.; Foust, C. R.; Combs, S.; Meitner, S. J.
2014-10-01
One of the major challenges that the ITER tokamak will have to face during its operations are disruptions. During the last few years, it has been proven that the global consequences of a disruption can be mitigated by the injection of large quantities of impurities. But one aspect that has been difficult to study was the possibility of local effects inside the torus during such injection that could damage a portion of the device despite the global heat losses and generated currents remaining below design parameter. 3D MHD simulations show that there is a potential for large toroidal asymmetries of the radiated power during impurity injection due to the interaction between the particle injection plume and a large n = 1 mode. Another aspect of 3D effects is the potential occurrence of Vertical Displacement Events (VDE), which could induce large poloidal heat load asymmetries. This potential deleterious effect of 3D phenomena has been studied on the DIII-D tokamak, thanks to the implementation of a multi-location massive gas injection (MGI) system as well as new diagnostic capabilities. This study showed the existence of a correlation between the location of the n = 1 mode and the local heat load on the plasma facing components but shows also that this effect is much smaller than anticipated (peaking factor of ˜1.1 vs 3-4 according to the simulations). There seems to be no observable heat load on the first wall of DIII-D at the location of the impurity injection port as well as no significant radiation asymmetries whether one or 2 valves are fired. This study enabled the first attempt of mitigation of a VDE using impurity injection at different poloidal locations. The results showed a more favorable heat deposition when the VDE is mitigated early (right at the onset) by impurity injection. No significant improvement of the heat load mitigation efficiency has been observed for late particle injection whether the injection is done "in the way" of the VDE (upward VDE
Tirapu Azpiroz, Jaione; Burr, Geoffrey W.; Rosenbluth, Alan E.; Hibbs, Michael
2008-03-01
In the Hyper-NA immersion lithography regime, the electromagnetic response of the reticle is known to deviate in a complicated manner from the idealized Thin-Mask-like behavior. Already, this is driving certain RET choices, such as the use of polarized illumination and the customization of reticle film stacks. Unfortunately, full 3-D electromagnetic mask simulations are computationally intensive. And while OPC-compatible mask electromagnetic field (EMF) models can offer a reasonable tradeoff between speed and accuracy for full-chip OPC applications, full understanding of these complex physical effects demands higher accuracy. Our paper describes recent advances in leveraging High Performance Computing as a critical step towards lithographic modeling of the full manufacturing process. In this paper, highly accurate full 3-D electromagnetic simulation of very large mask layouts are conducted in parallel with reasonable turnaround time, using a Blue- Gene/L supercomputer and a Finite-Difference Time-Domain (FDTD) code developed internally within IBM. A 3-D simulation of a large 2-D layout spanning 5μm×5μm at the wafer plane (and thus (20μm×20μm×0.5μm at the mask) results in a simulation with roughly 12.5GB of memory (grid size of 10nm at the mask, single-precision computation, about 30 bytes/grid point). FDTD is flexible and easily parallelizable to enable full simulations of such large layout in approximately an hour using one BlueGene/L "midplane" containing 512 dual-processor nodes with 256MB of memory per processor. Our scaling studies on BlueGene/L demonstrate that simulations up to 100μm × 100μm at the mask can be computed in a few hours. Finally, we will show that the use of a subcell technique permits accurate simulation of features smaller than the grid discretization, thus improving on the tradeoff between computational complexity and simulation accuracy. We demonstrate the correlation of the real and quadrature components that comprise the
Anomaly effects of arrays for 3d geoelectrical resistivity imaging ...
African Journals Online (AJOL)
user
The effectiveness of using a net of orthogonal or parallel sets of two-dimensional (2D) profiles for three- dimensional (3D) geoelectrical resistivity imaging has been evaluated. A series of 2D apparent resistivity data were generated over two synthetic models which represent geological or environmental conditions for a ...
Effects of a new 3-alpha reaction on the s-process in massive stars
International Nuclear Information System (INIS)
Kikuch, Yukihiro; Ono, Masaomi; Matsuo, Yasuhide; Hashimoto, Masa-aki; Fujimoto, Shin-ichiro
2012-01-01
Effect of a new 3-alpha reaction rate on the s-process during the evolution of a massive star of 25 solar mass is investigated for the first time, because the s-process in massive stars have been believed to be established with only minor change. We find that the s-process with use of the new rate during the core helium burning is very inefficient compared to the case with the previous 3-alpha rate. However, the difference of the overproduction is found to be largely compensated by the subsequent carbon burning. Since the s-process in massive stars has been attributed so far to the neutron irradiation during core helium burning, our finding reveals for the first time the importance of the carbon burning for the s-process during the evolution of massive stars.
International Nuclear Information System (INIS)
Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li
2010-01-01
A large-scale parallel electromagnetic field simulation program JEMS-FDTD(J Electromagnetic Solver-Finite Difference Time Domain) is designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate propagation, radiation, couple of electromagnetic field by solving Maxwell equations on structured mesh explicitly with FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large scale parallel computation. A parallel performance test indicates that a high parallel efficiency is obtained. (authors)
Dal Palù, Alessandro; Pontelli, Enrico; He, Jing; Lu, Yonggang
2007-01-01
The paper describes a novel framework, constructed using Constraint Logic Programming (CLP) and parallelism, to determine the association between parts of the primary sequence of a protein and alpha-helices extracted from 3D low-resolution descriptions of large protein complexes. The association is determined by extracting constraints from the 3D information, regarding length, relative position and connectivity of helices, and solving these constraints with the guidance of a secondary structure prediction algorithm. Parallelism is employed to enhance performance on large proteins. The framework provides a fast, inexpensive alternative to determine the exact tertiary structure of unknown proteins.
Directory of Open Access Journals (Sweden)
Pietro Quaglio
2017-05-01
Full Text Available Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs. STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons. In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST. We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE analysis.
A massive hydrogen-rich Martian greenhouse recorded in D/H
Pahlevan, K.; Schaefer, L. K.; Desch, S. J.; Elkins-Tanton, L. T.
2017-12-01
The deuterium-to-hydrogen (D/H) ratio in Martian atmospheric water ( 6x standard mean ocean water, SMOW) [1,2] is higher than that of known sources [3,4] alluding to a planetary enrichment process. A recent measurement by the Curiosity rover of Hesperian clays yields a D/H value 3x higher than SMOW [5], demonstrating that most enrichment occurred early in planetary history, buttressing the conclusions of Martian meteorite studies [6,7]. Extant models of the isotopic evolution of the Martian hydrosphere have not incorporated primordial H2, despite its likely abundance on early Mars. Here, we report the first 1D climate calculations with an atmospheric composition determined via degassing from a reducing magma ocean to study Martian climate during an early water ocean stage. A reducing Martian magma ocean is expected based on experimental petrology [8], the degassing of which gives rise to an H2-rich steam atmosphere [9] with strong attendant greenhouse warming [10,11] even after the removal of steam via condensation. At the pressures and temperatures prevailing in such a degassed greenhouse, we find that isotopic exchange in the fluid envelope is rapid, strongly concentrating deuterium in water molecules over molecular hydrogen [12]. The subsequent loss of the isotopically light H2-rich atmosphere results in a 2x D/H enrichment in the oceans via isotopic equilibration alone. These calculations suggest that most of the D/H enrichment observed in the first billion years of Martian history is produced by the evolution of a massive ( 100 bar) H2-rich greenhouse in the aftermath of magma ocean crystallization. The proposed link between early planetary process and modern isotopic observable opens a new window into the earliest history of Mars. [1] Owen, T. et al. Science 240, 1767-1770 (1988). [2] Webster, C. R. et al. Science 341, 260-263 (2013). [3] Lunine, J. I. et al. Icarus 165, 1-8, (2003). [4] Marty, B. et al. EPSL 441, 91-102, (2016). [5] Mahaffy, P. et al
KMOS"3"D Reveals Low-level Star Formation Activity in Massive Quiescent Galaxies at 0.7 < z < 2.7
International Nuclear Information System (INIS)
Belli, Sirio; Genzel, Reinhard; Förster Schreiber, Natascha M.; Wisnioski, Emily; Wilman, David J.; Mendel, J. Trevor; Beifiori, Alessandra; Bender, Ralf; Burkert, Andreas; Chan, Jeffrey; Davies, Rebecca L.; Davies, Ric; Fabricius, Maximilian; Fossati, Matteo; Galametz, Audrey; Lang, Philipp; Lutz, Dieter; Wuyts, Stijn; Brammer, Gabriel B.; Momcheva, Ivelina G.
2017-01-01
We explore the H α emission in the massive quiescent galaxies observed by the KMOS"3"D survey at 0.7 < z < 2.7. The H α line is robustly detected in 20 out of 120 UVJ -selected quiescent galaxies, and we classify the emission mechanism using the H α line width and the [N ii]/H α line ratio. We find that AGNs are likely to be responsible for the line emission in more than half of the cases. We also find robust evidence for star formation activity in nine quiescent galaxies, which we explore in detail. The H α kinematics reveal rotating disks in five of the nine galaxies. The dust-corrected H α star formation rates are low (0.2–7 M _⊙ yr"−"1), and place these systems significantly below the main sequence. The 24 μ m-based, infrared luminosities, instead, overestimate the star formation rates. These galaxies present a lower gas-phase metallicity compared to star-forming objects with similar stellar mass, and many of them have close companions. We therefore conclude that the low-level star formation activity in these nine quiescent galaxies is likely to be fueled by inflowing gas or minor mergers, and could be a sign of rejuvenation events.
Computation and parallel implementation for early vision
Gualtieri, J. Anthony
1990-01-01
The problem of early vision is to transform one or more retinal illuminance images-pixel arrays-to image representations built out of such primitive visual features such as edges, regions, disparities, and clusters. These transformed representations form the input to later vision stages that perform higher level vision tasks including matching and recognition. Researchers developed algorithms for: (1) edge finding in the scale space formulation; (2) correlation methods for computing matches between pairs of images; and (3) clustering of data by neural networks. These algorithms are formulated for parallel implementation of SIMD machines, such as the Massively Parallel Processor, a 128 x 128 array processor with 1024 bits of local memory per processor. For some cases, researchers can show speedups of three orders of magnitude over serial implementations.
Comparison of 2D and 3D Neutron Transport Analyses on Yonggwang Unit 3 Reactor
International Nuclear Information System (INIS)
Maeng, Aoung Jae; Kim, Byoung Chul; Lim, Mi Joung; Kim, Kyung Sik; Jeon, Young Kyou; Yoo, Choon Sung
2012-01-01
10 CFR Part 50 Appendix H requires periodical surveillance program in the reactor vessel (RV) belt line region of light water nuclear power plant to check vessel integrity resulting from the exposure to neutron irradiation and thermal environment. Exact exposure analysis of the neutron fluence based on right modeling and simulations is the most important in the evaluation. Traditional 2 dimensional (D) and 1D synthesis methodologies have been widely applied to evaluate the fast neutron (E > 1.0 MeV) fluence exposure to RV. However, 2D and 1D methodologies have not provided accurate fast neutron fluence evaluation at elevations far above or below the active core region. RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries) program for 3D geometries calculation was therefore developed both by Westinghouse Electronic Company, USA and Korea Reactor Integrity Surveillance Technology (KRIST) for the analysis of In-Vessel Surveillance Test and Ex-Vessel Neutron Dosimetry (EVND). Especially EVND which is installed at active core height between biological shielding material and concrete also evaluates axial neutron fluence by placing three dosimetries each at Top, Middle and Bottom part of the angle representing maximum neutron fluence. The EVND programs have been applied to the Korea Nuclear Plants. The objective of this study is therefore to compare the 3D and the 2D Neutron Transport Calculations and Analyses on the Yonggwang unit 3 Reactor as an example
2D-RBUC for efficient parallel compression of residuals
Đurđević, Đorđe M.; Tartalja, Igor I.
2018-02-01
In this paper, we present a method for lossless compression of residuals with an efficient SIMD parallel decompression. The residuals originate from lossy or near lossless compression of height fields, which are commonly used to represent models of terrains. The algorithm is founded on the existing RBUC method for compression of non-uniform data sources. We have adapted the method to capture 2D spatial locality of height fields, and developed the data decompression algorithm for modern GPU architectures already present even in home computers. In combination with the point-level SIMD-parallel lossless/lossy high field compression method HFPaC, characterized by fast progressive decompression and seamlessly reconstructed surface, the newly proposed method trades off small efficiency degradation for a non negligible compression ratio (measured up to 91%) benefit.
From parallel to distributed computing for reactive scattering calculations
International Nuclear Information System (INIS)
Lagana, A.; Gervasi, O.; Baraglia, R.
1994-01-01
Some reactive scattering codes have been ported on different innovative computer architectures ranging from massively parallel machines to clustered workstations. The porting has required a drastic restructuring of the codes to single out computationally decoupled cpu intensive subsections. The suitability of different theoretical approaches for parallel and distributed computing restructuring is discussed and the efficiency of related algorithms evaluated
Effects of parallel electron dynamics on plasma blob transport
Energy Technology Data Exchange (ETDEWEB)
Angus, Justin R.; Krasheninnikov, Sergei I. [University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093 (United States); Umansky, Maxim V. [Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550 (United States)
2012-08-15
The 3D effects on sheath connected plasma blobs that result from parallel electron dynamics are studied by allowing for the variation of blob density and potential along the magnetic field line and using collisional Ohm's law to model the parallel current density. The parallel current density from linear sheath theory, typically used in the 2D model, is implemented as parallel boundary conditions. This model includes electrostatic 3D effects, such as resistive drift waves and blob spinning, while retaining all of the fundamental 2D physics of sheath connected plasma blobs. If the growth time of unstable drift waves is comparable to the 2D advection time scale of the blob, then the blob's density gradient will be depleted resulting in a much more diffusive blob with little radial motion. Furthermore, blob profiles that are initially varying along the field line drive the potential to a Boltzmann relation that spins the blob and thereby acts as an addition sink of the 2D potential. Basic dimensionless parameters are presented to estimate the relative importance of these two 3D effects. The deviation of blob dynamics from that predicted by 2D theory in the appropriate limits of these parameters is demonstrated by a direct comparison of 2D and 3D seeded blob simulations.
International Nuclear Information System (INIS)
Lichters, R.; Pfund, R.E.W.; Meyer-ter-Vehn, J.
1997-08-01
The code LPIC++ presented here, is based on a one-dimensional, electromagnetic, relativistic PIC code that has originally been developed by one of the authors during a PhD thesis at the Max-Planck-Institut fuer Quantenoptik for kinetic simulations of high harmonic generation from overdense plasma surfaces. The code uses essentially the algorithm of Birdsall and Langdon and Villasenor and Bunemann. It is written in C++ in order to be easily extendable and has been parallelized to be able to grow in power linearly with the size of accessable hardware, e.g. massively parallel machines like Cray T3E. The parallel LPIC++ version uses PVM for communication between processors. PVM is public domain software, can be downloaded from the world wide web. A particular strength of LPIC++ lies in its clear program and data structure, which uses chained lists for the organization of grid cells and enables dynamic adjustment of spatial domain sizes in a very convenient way, and therefore easy balancing of processor loads. Also particles belonging to one cell are linked in a chained list and are immediately accessable from this cell. In addition to this convenient type of data organization in a PIC code, the code shows excellent performance in both its single processor and parallel version. (orig.)
Ages of Massive Galaxies at 0.5 > z > 2.0 from 3D-HST Rest-frame Optical Spectroscopy
Fumagalli, Mattia; Franx, Marijn; van Dokkum, Pieter; Whitaker, Katherine E.; Skelton, Rosalind E.; Brammer, Gabriel; Nelson, Erica; Maseda, Michael; Momcheva, Ivelina; Kriek, Mariska; Labbé, Ivo; Lundgren, Britt; Rix, Hans-Walter
2016-05-01
We present low-resolution near-infrared stacked spectra from the 3D-HST survey up to z = 2.0 and fit them with commonly used stellar population synthesis models: BC03, FSPS10 (Flexible Stellar Population Synthesis), and FSPS-C3K. The accuracy of the grism redshifts allows the unambiguous detection of many emission and absorption features and thus a first systematic exploration of the rest-frame optical spectra of galaxies up to z = 2. We select massive galaxies ({log}({M}*/{M}⊙ )\\gt 10.8), we divide them into quiescent and star-forming via a rest-frame color-color technique, and we median-stack the samples in three redshift bins between z = 0.5 and z = 2.0. We find that stellar population models fit the observations well at wavelengths below the 6500 Å rest frame, but show systematic residuals at redder wavelengths. The FSPS-C3K model generally provides the best fits (evaluated with χ 2 red statistics) for quiescent galaxies, while BC03 performs the best for star-forming galaxies. The stellar ages of quiescent galaxies implied by the models, assuming solar metallicity, vary from 4 Gyr at z ˜ 0.75 to 1.5 Gyr at z ˜ 1.75, with an uncertainty of a factor of two caused by the unknown metallicity. On average, the stellar ages are half the age of the universe at these redshifts. We show that the inferred evolution of ages of quiescent galaxies is in agreement with fundamental plane measurements, assuming an 8 Gyr age for local galaxies. For star-forming galaxies, the inferred ages depend strongly on the stellar population model and the shape of the assumed star-formation history.
Scalable Strategies for Computing with Massive Data
Directory of Open Access Journals (Sweden)
Michael Kane
2013-11-01
Full Text Available This paper presents two complementary statistical computing frameworks that address challenges in parallel processing and the analysis of massive data. First, the foreach package allows users of the R programming environment to define parallel loops that may be run sequentially on a single machine, in parallel on a symmetric multiprocessing (SMP machine, or in cluster environments without platform-specific code. Second, the bigmemory package implements memory- and file-mapped data structures that provide (a access to arbitrarily large data while retaining a look and feel that is familiar to R users and (b data structures that are shared across processor cores in order to support efficient parallel computing techniques. Although these packages may be used independently, this paper shows how they can be used in combination to address challenges that have effectively been beyond the reach of researchers who lack specialized software development skills or expensive hardware.
Parallel simulation of tsunami inundation on a large-scale supercomputer
Oishi, Y.; Imamura, F.; Sugawara, D.
2013-12-01
An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
THE THREE-DIMENSIONAL EVOLUTION TO CORE COLLAPSE OF A MASSIVE STAR
Energy Technology Data Exchange (ETDEWEB)
Couch, Sean M. [TAPIR, Walter Burke Institute for Theoretical Physics, California Institute of Technology, Pasadena, CA 91125 (United States); Chatzopoulos, Emmanouil [Flash Center for Computational Science, Department of Astronomy and Astrophysics, University of Chicago, Chicago, IL 60637 (United States); Arnett, W. David [Steward Observatory, University of Arizona, Tucson, AZ 85721 (United States); Timmes, F. X., E-mail: smc@tapir.caltech.edu [Joint Institute for Nuclear Astrophysics, Michigan State University, East Lansing, MI 48824 (United States)
2015-07-20
We present the first three-dimensional (3D) simulation of the final minutes of iron core growth in a massive star, up to and including the point of core gravitational instability and collapse. We capture the development of strong convection driven by violent Si burning in the shell surrounding the iron core. This convective burning builds the iron core to its critical mass and collapse ensues, driven by electron capture and photodisintegration. The non-spherical structure and motion generated by 3D convection is substantial at the point of collapse, with convective speeds of several hundreds of km s{sup −1}. We examine the impact of such physically realistic 3D initial conditions on the core-collapse supernova mechanism using 3D simulations including multispecies neutrino leakage and find that the enhanced post-shock turbulence resulting from 3D progenitor structure aids successful explosions. We conclude that non-spherical progenitor structure should not be ignored, and should have a significant and favorable impact on the likelihood for neutrino-driven explosions. In order to make simulating the 3D collapse of an iron core feasible, we were forced to make approximations to the nuclear network making this effort only a first step toward accurate, self-consistent 3D stellar evolution models of the end states of massive stars.
THE THREE-DIMENSIONAL EVOLUTION TO CORE COLLAPSE OF A MASSIVE STAR
International Nuclear Information System (INIS)
Couch, Sean M.; Chatzopoulos, Emmanouil; Arnett, W. David; Timmes, F. X.
2015-01-01
We present the first three-dimensional (3D) simulation of the final minutes of iron core growth in a massive star, up to and including the point of core gravitational instability and collapse. We capture the development of strong convection driven by violent Si burning in the shell surrounding the iron core. This convective burning builds the iron core to its critical mass and collapse ensues, driven by electron capture and photodisintegration. The non-spherical structure and motion generated by 3D convection is substantial at the point of collapse, with convective speeds of several hundreds of km s −1 . We examine the impact of such physically realistic 3D initial conditions on the core-collapse supernova mechanism using 3D simulations including multispecies neutrino leakage and find that the enhanced post-shock turbulence resulting from 3D progenitor structure aids successful explosions. We conclude that non-spherical progenitor structure should not be ignored, and should have a significant and favorable impact on the likelihood for neutrino-driven explosions. In order to make simulating the 3D collapse of an iron core feasible, we were forced to make approximations to the nuclear network making this effort only a first step toward accurate, self-consistent 3D stellar evolution models of the end states of massive stars
A Software Reference Architecture for Service-Oriented 3D Geovisualization Systems
Hildebrandt, Dieter
2014-01-01
Modern 3D geovisualization systems (3DGeoVSs) are complex and evolving systems that are required to be adaptable and leverage distributed resources, including massive geodata. This article focuses on 3DGeoVSs built based on the principles of service-oriented architectures, standards and image-based representations (SSI) to address practically relevant challenges and potentials. Such systems facilitate resource sharing and agile and efficient system construction and change in an interoperable ...
Introduction to 3D Graphics through Excel
Benacka, Jan
2013-01-01
The article presents a method of explaining the principles of 3D graphics through making a revolvable and sizable orthographic parallel projection of cuboid in Excel. No programming is used. The method was tried in fourteen 90 minute lessons with 181 participants, which were Informatics teachers, undergraduates of Applied Informatics and gymnasium…
Spatial Correlation Characterization of a Full Dimension Massive MIMO System
Nadeem, Qurrat-Ul-Ain
2017-02-07
Elevation beamforming and Full Dimension MIMO (FD-MIMO) are currently active areas of research and standardization in 3GPP LTE-Advanced. FD-MIMO utilizes an active antenna array system (AAS), that provides the ability of adaptive electronic beam control over the elevation dimension, resulting in a better system performance as compared to the conventional 2D MIMO systems. FD-MIMO is more advantageous when amalgamated with massive MIMO systems, in that it exploits the additional degrees of freedom offered by a large number of antennas in the elevation. To facilitate the evaluation of these systems, a large effort in 3D channel modeling is needed. This paper aims at providing a summary of the recent 3GPP activity around 3D channel modeling. The 3GPP proposed approach to model antenna radiation pattern is compared with the ITU approach. A closed-form expression is then worked out for the spatial correlation function (SCF) for channels constituted by individual antenna elements in the array by exploiting results on spherical harmonics and Legendre polynomials. The proposed expression can be used to obtain correlation coefficients for any arbitrary 3D propagation environment. Simulation results corroborate and study the derived spatial correlation expression. The results are directly applicable to the analysis of future 5G 3D massive MIMO systems.
10D massive type IIA supergravities as the uplift of parabolic M2-brane torus bundles
Energy Technology Data Exchange (ETDEWEB)
Garcia del Moral, Maria Pilar [Universidad de Antofagasta (Chile). Dept. de Fisica; Restuccia, Alvaro [Universidad de Antofagasta (Chile). Dept. de Fisica; Universidad Simon Bolivar, Caracas (Venezuela, Bolivarian Republic of). Dept. de Fisica
2016-04-15
We remark that the two 10D massive deformations of the N = 2 maximal type IIA supergravity (Romans and HLW supergravity) are associated to the low energy limit of the uplift to 10D of M2-brane torus bundles with parabolic monodromy linearly and non-linearly realized respectively. Romans supergravity corresponds to M2-brane compactified on a twice-punctured torus bundle. (copyright 2015 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim)
A parallel solution for high resolution histological image analysis.
Bueno, G; González, R; Déniz, O; García-Rojo, M; González-García, J; Fernández-Carrobles, M M; Vállez, N; Salido, J
2012-10-01
This paper describes a general methodology for developing parallel image processing algorithms based on message passing for high resolution images (on the order of several Gigabytes). These algorithms have been applied to histological images and must be executed on massively parallel processing architectures. Advances in new technologies for complete slide digitalization in pathology have been combined with developments in biomedical informatics. However, the efficient use of these digital slide systems is still a challenge. The image processing that these slides are subject to is still limited both in terms of data processed and processing methods. The work presented here focuses on the need to design and develop parallel image processing tools capable of obtaining and analyzing the entire gamut of information included in digital slides. Tools have been developed to assist pathologists in image analysis and diagnosis, and they cover low and high-level image processing methods applied to histological images. Code portability, reusability and scalability have been tested by using the following parallel computing architectures: distributed memory with massive parallel processors and two networks, INFINIBAND and Myrinet, composed of 17 and 1024 nodes respectively. The parallel framework proposed is flexible, high performance solution and it shows that the efficient processing of digital microscopic images is possible and may offer important benefits to pathology laboratories. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Comparative Analysis of Data Structures for Storing Massive Tins in a Dbms
Kumar, K.; Ledoux, H.; Stoter, J.
2016-06-01
Point cloud data are an important source for 3D geoinformation. Modern day 3D data acquisition and processing techniques such as airborne laser scanning and multi-beam echosounding generate billions of 3D points for simply an area of few square kilometers. With the size of the point clouds exceeding the billion mark for even a small area, there is a need for their efficient storage and management. These point clouds are sometimes associated with attributes and constraints as well. Storing billions of 3D points is currently possible which is confirmed by the initial implementations in Oracle Spatial SDO PC and the PostgreSQL Point Cloud extension. But to be able to analyse and extract useful information from point clouds, we need more than just points i.e. we require the surface defined by these points in space. There are different ways to represent surfaces in GIS including grids, TINs, boundary representations, etc. In this study, we investigate the database solutions for the storage and management of massive TINs. The classical (face and edge based) and compact (star based) data structures are discussed at length with reference to their structure, advantages and limitations in handling massive triangulations and are compared with the current solution of PostGIS Simple Feature. The main test dataset is the TIN generated from third national elevation model of the Netherlands (AHN3) with a point density of over 10 points/m2. PostgreSQL/PostGIS DBMS is used for storing the generated TIN. The data structures are tested with the generated TIN models to account for their geometry, topology, storage, indexing, and loading time in a database. Our study is useful in identifying what are the limitations of the existing data structures for storing massive TINs and what is required to optimise these structures for managing massive triangulations in a database.
The parity-preserving massive QED3: Vanishing β-function and no parity anomaly
Directory of Open Access Journals (Sweden)
O.M. Del Cima
2015-11-01
Full Text Available The parity-preserving massive QED3 exhibits vanishing gauge coupling β-function and is parity and infrared anomaly free at all orders in perturbation theory. Parity is not an anomalous symmetry, even for the parity-preserving massive QED3, in spite of some claims about the possibility of a perturbative parity breakdown, called parity anomaly. The proof is done by using the algebraic renormalization method, which is independent of any regularization scheme, based on general theorems of perturbative quantum field theory.
Dirac equation for massive neutrinos in a Schwarzschild-de Sitter spacetime from a 5D vacuum
International Nuclear Information System (INIS)
Sánchez, Pablo Alejandro; Anabitarte, Mariano; Bellini, Mauricio
2011-01-01
Starting from a Dirac equation for massless neutrino in a 5D Ricci-flat background metric, we obtain the effective 4D equation for massive neutrino in a Schwarzschild-de Sitter (SdS) background metric from an extended SdS 5D Ricci-flat metric. We use the fact that the spin connection is defined to an accuracy of a vector, so that the covariant derivative of the spinor field is strongly dependent of the background geometry. We show that the mass of the neutrino can be induced from the extra space-like dimension.
3D Membrane Imaging and Porosity Visualization
Sundaramoorthi, Ganesh
2016-03-03
Ultrafiltration asymmetric porous membranes were imaged by two microscopy methods, which allow 3D reconstruction: Focused Ion Beam and Serial Block Face Scanning Electron Microscopy. A new algorithm was proposed to evaluate porosity and average pore size in different layers orthogonal and parallel to the membrane surface. The 3D-reconstruction enabled additionally the visualization of pore interconnectivity in different parts of the membrane. The method was demonstrated for a block copolymer porous membrane and can be extended to other membranes with application in ultrafiltration, supports for forward osmosis, etc, offering a complete view of the transport paths in the membrane.
Hoekstra, A.G.; Sloot, P.M.A.; Haan, M.J.; Hertzberger, L.O.; van Leeuwen, J.
1991-01-01
New developments in Computer Science, both hardware and software, offer researchers, such as physicists, unprecedented possibilities to solve their computational intensive problems.However, full exploitation of e.g. new massively parallel computers, parallel languages or runtime environments
Optimisation of a parallel ocean general circulation model
Beare, M. I.; Stevens, D. P.
1997-10-01
This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
Energy Technology Data Exchange (ETDEWEB)
Watanabe, Hideo; Kawai, Wataru; Nemoto, Toshiyuki [Fujitsu Ltd., Tokyo (Japan); and others
1997-12-01
Several computer codes in the nuclear field have been vectorized, parallelized and transported on the FUJITSU VPP500 system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. These results are reported in 3 parts, i.e., the vectorization part, the parallelization part and the porting part. In this report, we describe the parallelization. In this parallelization part, the parallelization of 2-Dimensional relativistic electromagnetic particle code EM2D, Cylindrical Direct Numerical Simulation code CYLDNS and molecular dynamics code for simulating radiation damages in diamond crystals DGR are described. In the vectorization part, the vectorization of two and three dimensional discrete ordinates simulation code DORT-TORT, gas dynamics analysis code FLOWGR and relativistic Boltzmann-Uehling-Uhlenbeck simulation code RBUU are described. And then, in the porting part, the porting of reactor safety analysis code RELAP5/MOD3.2 and RELAP5/MOD3.2.1.2, nuclear data processing system NJOY and 2-D multigroup discrete ordinate transport code TWOTRAN-II are described. And also, a survey for the porting of command-driven interactive data analysis plotting program IPLOT are described. (author)
Mechanical Properties of 3d Scaffolds for Bone Regeneration
Directory of Open Access Journals (Sweden)
Deividas Mizeras
2017-01-01
Full Text Available One of the biggest challenges in modern tissue engineering is a creation 3D scaffolds for bone tissue regeneration. Until now, in order to restore bone defects are used various bone substitutes (autologous and allogeneic, however, their usage is limited because is required additional surgery, possible complications, also limited their use is associated with ethical point of view. In this work we aim to determine the mechanical properties of 3D printed PLA objects having various orientation woodpile microarchitectures. In this work we chose three different 3D microarchitectures: woodpile BCC (each layer consists of parallel logs which are rotated 90 deg every next layer, woodpile FCC (every layer is additionally shifted half of the period in respect to the previous parallel log layer and a rotating woodpile 60 deg (each layer is rotated 60 deg in respect to the previous one. Compressive and bending tests were carried out with TIRAtest2300 universal testing machine. We found that 60 deg rotating woodpile geometry had the highest mechanical values which were approximately about 3 times higher than the BCC or FCC microstructures.
Regional-scale calculation of the LS factor using parallel processing
Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong
2015-05-01
With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategy are designed according to the algorithm characters including the decomposition method for maintaining the integrity of the results, optimized workflow for reducing the time taken for exporting the unnecessary intermediate data and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
AMIDST: Analysis of MassIve Data STreams
DEFF Research Database (Denmark)
Masegosa, Andres; Martinez, Ana Maria; Borchani, Hanen
2015-01-01
The Analysis of MassIve Data STreams (AMIDST) Java toolbox provides a collection of scalable and parallel algorithms for inference and learning of hybrid Bayesian networks from data streams. The toolbox, available at http://amidst.github.io/toolbox/ under the Apache Software License version 2.......0, also efficiently leverages existing functionalities and algorithms by interfacing to software tools such as HUGIN and MOA....
Massively Parallel Polar Decomposition on Distributed-Memory Systems
Ltaief, Hatem
2018-01-01
We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon on the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation for the scalar sign function, which also corresponds to the polar factor for symmetric matrices, to further accelerate the QDWH convergence. Based on the Zolotarev rational functions—introduced by Zolotarev (ZOLO) in 1877— this new PD algorithm ZOLO-PD converges within two iterations even for ill-conditioned matrices, instead of the original six iterations needed for QDWH. ZOLO-PD uses the property of Zolotarev functions that optimality is maintained when two functions are composed in an appropriate manner. The resulting ZOLO-PD has a convergence rate up to seventeen, in contrast to the cubic convergence rate for QDWH. This comes at the price of higher arithmetic costs and memory footprint. These extra floating-point operations can, however, be processed in an embarrassingly parallel fashion. We demonstrate performance using up to 102, 400 cores on two supercomputers. We demonstrate that, in the presence of a large number of processing units, ZOLO-PD is able to outperform QDWH by up to 2.3X speedup, especially in situations where QDWH runs out of work, for instance, in the strong scaling mode of operation.
Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.
2016-05-01
This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
Efficient reconfigurable architectures for 3D medical image compression
Afandi, Ahmad
2010-01-01
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US) have generated a massive amount of volumetric data. These have provided an impetus to the development of other applications, in particular telemedicine and teleradiology. In thes...
Unfolded equations for massive higher spin supermultiplets in AdS{sub 3}
Energy Technology Data Exchange (ETDEWEB)
Buchbinder, I.L. [Department of Theoretical Physics, Tomsk State Pedagogical University,60 Kievskaya Str., Tomsk, 634061 (Russian Federation); National Research Tomsk State University,36 Lenina Ave., Tomsk, 634050 (Russian Federation); Snegirev, T.V. [Department of Theoretical Physics, Tomsk State Pedagogical University,60 Kievskaya Str., Tomsk, 634061 (Russian Federation); Department of Higher Mathematics and Mathematical Physics,National Research Tomsk Polytechnic University, 30 Lenina Ave., Tomsk, 634050 (Russian Federation); Zinoviev, Yu.M. [Department of Theoretical Physics,Institute for High Energy Physics of National Research Center “Kurchatov Institute”, 1 Pobedy Str., Protvino, Moscow Region, 142280 (Russian Federation)
2016-08-10
In this paper we give an explicit construction of unfolded equations for massive higher spin supermultiplets of the minimal (1,0) supersymmetry in AdS{sub 3} space. For that purpose we use an unfolded formulation for massive bosonic and fermionic higher spins and find supertransformations leaving appropriate set of unfolded equations invariant. We provide two general supermultiplets (s,s+1/2) and (s,s−1/2) with arbitrary integer s, as well as a number of lower spin examples.
Parallel adaptive simulations on unstructured meshes
International Nuclear Information System (INIS)
Shephard, M S; Jansen, K E; Sahni, O; Diachin, L A
2007-01-01
This paper discusses methods being developed by the ITAPS center to support the execution of parallel adaptive simulations on unstructured meshes. The paper first outlines the ITAPS approach to the development of interoperable mesh, geometry and field services to support the needs of SciDAC application in these areas. The paper then demonstrates the ability of unstructured adaptive meshing methods built on such interoperable services to effectively solve important physics problems. Attention is then focused on ITAPs' developing ability to solve adaptive unstructured mesh problems on massively parallel computers
Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.
2016-05-01
In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
a Novel Approach for 3d Neighbourhood Analysis
Emamgholian, S.; Taleai, M.; Shojaei, D.
2017-09-01
Population growth and lack of land in urban areas have caused massive developments such as high rises and underground infrastructures. Land authorities in the international context recognizes 3D cadastres as a solution to efficiently manage these developments in complex cities. Although a 2D cadastre does not efficiently register these developments, it is currently being used in many jurisdictions for registering land and property information. Limitations in analysis and presentation are considered as examples of such limitations. 3D neighbourhood analysis by automatically finding 3D spaces has become an issue of major interest in recent years. Whereas the neighbourhood analysis has been in the focus of research, the idea of 3D neighbourhood analysis has rarely been addressed in 3 dimensional information systems (3D GIS) analysis. In this paper, a novel approach for 3D neighbourhood analysis has been proposed by recording spatial and descriptive information of the apartment units and easements. This approach uses the coordinates of the subject apartment unit to find the neighbour spaces. By considering a buffer around the edges of the unit, neighbour spaces are accurately detected. This method was implemented in ESRI ArcScene and three case studies were defined to test the efficiency of this approach. The results show that spaces are accurately detected in various complex scenarios. This approach can also be applied for other applications such as property management and disaster management in order to find the affected apartments around a defined space.
Accelerated 3D-OSEM image reconstruction using a Beowulf PC cluster for pinhole SPECT
International Nuclear Information System (INIS)
Zeniya, Tsutomu; Watabe, Hiroshi; Sohlberg, Antti; Iida, Hidehiro
2007-01-01
A conventional pinhole single-photon emission computed tomography (SPECT) with a single circular orbit has limitations associated with non-uniform spatial resolution or axial blurring. Recently, we demonstrated that three-dimensional (3D) images with uniform spatial resolution and no blurring can be obtained by complete data acquired using two-circular orbit, combined with the 3D ordered subsets expectation maximization (OSEM) reconstruction method. However, a long computation time is required to obtain the reconstruction image, because of the fact that 3D-OSEM is an iterative method and two-orbit acquisition doubles the size of the projection data. To reduce the long reconstruction time, we parallelized the two-orbit pinhole 3D-OSEM reconstruction process by using a Beowulf personal computer (PC) cluster. The Beowulf PC cluster consists of seven PCs connected to Gbit Ethernet switches. Message passing interface protocol was utilized for parallelizing the reconstruction process. The projection data in a subset are distributed to each PC. The partial image forward-and back-projected in each PC is transferred to all PCs. The current image estimate on each PC is updated after summing the partial images. The performance of parallelization on the PC cluster was evaluated using two independent projection data sets acquired by a pinhole SPECT system with two different circular orbits. Parallelization using the PC cluster improved the reconstruction time with increa