WorldWideScience

Sample records for efficient parallel iterative

  1. PRIM: An Efficient Preconditioning Iterative Reweighted Least Squares Method for Parallel Brain MRI Reconstruction.

    Science.gov (United States)

    Xu, Zheng; Wang, Sheng; Li, Yeqing; Zhu, Feiyun; Huang, Junzhou

    2018-02-08

    The most recent history of parallel Magnetic Resonance Imaging (pMRI) has in large part been devoted to finding ways to reduce acquisition time. Although the joint total variation (JTV) regularized model has been demonstrated to be a powerful tool for increasing sampling speed in pMRI, its major bottleneck is the inefficiency of the optimization method. All present state-of-the-art optimizations for the JTV model reach only a sublinear convergence rate; in this paper, we improve on this by proposing a linearly convergent optimization method for the JTV model. The proposed method is based on the Iterative Reweighted Least Squares algorithm. Due to the complexity of the tangled JTV objective, we design a novel preconditioner to further accelerate the proposed method. Extensive experiments demonstrate the superior performance of the proposed algorithm for pMRI regarding both accuracy and efficiency compared with state-of-the-art methods.
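
To illustrate the Iterative Reweighted Least Squares idea the abstract builds on, here is a minimal sketch on a toy problem: estimating a robust location by minimising a sum of absolute deviations. This is not the authors' JTV objective or their preconditioner; the function name `irls_l1_location` and the parameters `n_iter` and `eps` are hypothetical choices for illustration only.

```python
def irls_l1_location(data, n_iter=100, eps=1e-12):
    """Minimise sum_i |x - a_i| by iteratively reweighted least squares."""
    x = sum(data) / len(data)  # least-squares starting point: the mean
    for _ in range(n_iter):
        # Reweight so that the weighted quadratic w_i * (x - a_i)^2
        # mimics the absolute value |x - a_i| at the current iterate.
        weights = [1.0 / max(abs(x - a), eps) for a in data]
        # Closed-form minimiser of the weighted least-squares subproblem.
        x = sum(w * a for w, a in zip(weights, data)) / sum(weights)
    return x


# Toy data with one outlier (an assumption for this example).
estimate = irls_l1_location([1.0, 2.0, 3.0, 4.0, 100.0])
```

For this data the iterates should move from the mean toward the median of the sample, which is the minimiser of the absolute-deviation objective.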

  2. An efficient parallel algorithm: Poststack and prestack Kirchhoff 3D depth migration using flexi-depth iterations

    Science.gov (United States)

    Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh

    2015-07-01

    This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for the current class of multicore architectures. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism; however, for 3D data migration, the resource requirement of the algorithm increases with the data size. This challenges its practical implementation even on current-generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute-intensive part of the Kirchhoff depth migration algorithm is the calculation of traveltime tables, due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post- and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce the concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a supercomputer of the PARAM series. Optimization, performance and scalability experiment results, along with the migration outcome, show the effectiveness of the parallel algorithm.

  3. Parallel S_n iteration schemes

    International Nuclear Information System (INIS)

    Wienke, B.R.; Hiromoto, R.E.

    1986-01-01

    The iterative, multigroup, discrete ordinates (S_n) technique for solving the linear transport equation enjoys widespread usage and appeal. Serial iteration schemes and numerical algorithms developed over the years provide a timely framework for parallel extension. On the Denelcor HEP, the authors investigate three parallel iteration schemes for solving the one-dimensional S_n transport equation. The multigroup representation and serial iteration methods are also reviewed. This analysis represents a first attempt to extend serial S_n algorithms to parallel environments and provides good baseline estimates on ease of parallel implementation, relative algorithm efficiency, comparative speedup, and some future directions. The authors examine ordered and chaotic versions of these strategies, with and without concurrent rebalance and diffusion acceleration. Two strategies efficiently support high degrees of parallelization and appear to be robust parallel iteration techniques. The third strategy is a weaker parallel algorithm. Chaotic iteration, difficult to simulate on serial machines, holds promise and converges faster than ordered versions of the schemes. Actual parallel speedup and efficiency are high and payoff appears substantial.

  4. Efficient parallel iterative solvers for the solution of large dense linear systems arising from the boundary element method in electromagnetism

    Energy Technology Data Exchange (ETDEWEB)

    Alleon, G. [EADS-CCR, 31 - Blagnac (France)]; Carpentieri, B.; Duff, I.S.; Giraud, L.; Langou, J.; Martin, E. [Cerfacs, 31 - Toulouse (France)]

    2003-07-01

    The boundary element method has become a popular tool for the solution of Maxwell's equations in electromagnetism. It discretizes only the surface of the radiating object and gives rise to linear systems that are smaller in size compared to those arising from finite element or finite difference discretizations. However, these systems are prohibitively demanding in terms of memory for direct methods and challenging to solve by iterative methods. In this paper we address the iterative solution via preconditioned Krylov methods of electromagnetic scattering problems expressed in an integral formulation, with the main focus on the design of the pre-conditioner. We consider an approximate inverse method based on Frobenius-norm minimization with a pattern prescribed in advance. The pre-conditioner is constructed from a sparse approximation of the dense coefficient matrix, and the patterns both for the pre-conditioner and for the coefficient matrix are computed a priori using geometric information from the mesh. We describe the implementation of the approximate inverse in an out-of-core parallel code that uses multipole techniques for the matrix-vector products, and show results on the numerical scalability of our method on systems of size up to one million unknowns. We propose an embedded iterative scheme based on the GMRES method and combined with multipole techniques, aimed at improving the robustness of the approximate inverse for large problems. We show by numerical experiments that the proposed scheme enables the solution of very large and difficult problems efficiently at reduced computational and memory cost. Finally we perform a preliminary study on a spectral two-level pre-conditioner to enhance the robustness of our method. This numerical technique exploits spectral information of the preconditioned systems to build a low-rank update of the pre-conditioner. (authors)

  5. Efficient parallel iterative solvers for the solution of large dense linear systems arising from the boundary element method in electromagnetism

    International Nuclear Information System (INIS)

    Alleon, G.; Carpentieri, B.; Duff, I.S.; Giraud, L.; Langou, J.; Martin, E.

    2003-01-01


  6. Time parallelization of advanced operation scenario simulations of ITER plasma

    International Nuclear Information System (INIS)

    Samaddar, D; Casper, T A; Kim, S H; Houlberg, W A; Berry, L A; Elwasif, W R; Batchelor, D

    2013-01-01

    This work demonstrates that simulations of advanced burning plasma operation scenarios can be successfully parallelized in time using the parareal algorithm. CORSICA, an advanced operation scenario code for tokamak plasmas, is used as a test case. This is a unique application, since the parareal algorithm has so far been applied to much simpler systems, except for the case of turbulence. In the present application, a computational gain of an order of magnitude has been achieved, which is extremely promising. A successful implementation of the parareal algorithm in codes like CORSICA opens the possibility of time-efficient simulations of ITER plasmas.
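
The parareal algorithm named in this abstract can be sketched on a scalar test equation, dy/dt = -y. The sketch below is unrelated to CORSICA; the propagators `fine` and `coarse` (explicit Euler with many steps and with one step per time slice) and all parameter names are assumptions chosen for brevity.

```python
def fine(y, t0, t1, steps=100):
    """Accurate but expensive propagator: many explicit Euler steps."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        y += h * (-y)
    return y


def coarse(y, t0, t1):
    """Cheap propagator: a single explicit Euler step over the slice."""
    return y + (t1 - t0) * (-y)


def parareal(y0, t_end, n_slices, n_iter):
    ts = [t_end * n / n_slices for n in range(n_slices + 1)]
    # Initial guess: one serial sweep with the coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], ts[n], ts[n + 1]))
    for _ in range(n_iter):
        # The fine solves on different slices are independent of each other;
        # this is the part that would be distributed over processors.
        f_old = [fine(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        g_old = [coarse(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        # Serial correction sweep: U_{n+1} = G(U_n_new) + F(U_n_old) - G(U_n_old).
        new = [y0]
        for n in range(n_slices):
            g_new = coarse(new[n], ts[n], ts[n + 1])
            new.append(g_new + f_old[n] - g_old[n])
        u = new
    return u


def serial_fine(y0, t_end, n_slices):
    """Reference: the fine propagator applied slice after slice."""
    ts = [t_end * n / n_slices for n in range(n_slices + 1)]
    u = [y0]
    for n in range(n_slices):
        u.append(fine(u[n], ts[n], ts[n + 1]))
    return u


parareal_solution = parareal(1.0, 1.0, n_slices=5, n_iter=5)
reference = serial_fine(1.0, 1.0, n_slices=5)
```

With as many iterations as time slices, the parareal result reproduces the serial fine solution up to rounding; any practical gain comes from stopping after far fewer iterations.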

  7. Parallel computation of multigroup reactivity coefficient using iterative method

    Science.gov (United States)

    Susmikanti, Mike; Dewayatna, Winter

    2013-09-01

    One of the research activities supporting the commercial radioisotope production program is safety research on the irradiation of FPM (Fission Product Molybdenum) targets. An FPM target is a stainless steel tube layered with high-enriched uranium. The FPM tube is irradiated to obtain fission products, which are widely used in the form of kits in nuclear medicine. Irradiating an FPM tube in the reactor core can disturb reactor performance; one such disturbance comes from changes in flux or reactivity. A method is therefore needed for calculating safety margins as the configuration changes during the life of the reactor, and making the code faster becomes an absolute necessity. An advantage of the perturbation method is that the neutron safety margin for the research reactor can be reused without modification in the calculation of the reactivity of the reactor. The criticality and flux in the multigroup diffusion model were calculated at various irradiation positions for several uranium contents. This model is computationally complex. Several parallel algorithms based on iterative methods have been developed for the solution of large sparse matrices. The red-black Gauss-Seidel iteration and the parallel power iteration method can be used to solve the multigroup diffusion equation system and to calculate the criticality and reactivity coefficient. In this research, a code for reactivity calculation, which is one part of safety analysis, was developed with parallel processing. The calculation can be done more quickly and efficiently by exploiting parallel processing on a multicore computer. The code was applied to the safety-limit calculation of irradiated FPM targets with increasing uranium content.
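
The red-black Gauss-Seidel ordering mentioned in this abstract can be illustrated on a much simpler model than multigroup diffusion: the 1D Laplace equation with fixed boundary values. The function name and parameters below are hypothetical, and the sketch runs serially; it only shows why points of one colour can be updated independently.

```python
def red_black_gauss_seidel(n_points, left, right, n_sweeps):
    """Solve u'' = 0 on a uniform grid with u[0] = left and u[-1] = right."""
    u = [0.0] * n_points
    u[0], u[-1] = left, right
    for _ in range(n_sweeps):
        # Pass 1 updates the odd ("red") interior points, pass 2 the even
        # ("black") ones. Within a pass every update reads only points of
        # the other colour, so the updates could be executed in parallel.
        for start in (1, 2):
            for i in range(start, n_points - 1, 2):
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u


# Exact solution for these boundary values is the straight line u[i] = i / 10.
grid = red_black_gauss_seidel(n_points=11, left=0.0, right=1.0, n_sweeps=500)
```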

  8. Parallel iterative decoding of transform domain Wyner-Ziv video using cross bitplane correlation

    DEFF Research Database (Denmark)

    Luong, Huynh Van; Huang, Xin; Forchhammer, Søren

    2011-01-01

    In recent years, Transform Domain Wyner-Ziv (TDWZ) video coding has been proposed as an efficient Distributed Video Coding (DVC) solution, which fully or partly exploits the source statistics at the decoder to reduce the computational burden at the encoder. In this paper, a parallel iterative LDPC decoding scheme is proposed to improve the coding efficiency of TDWZ video codecs. The proposed parallel iterative LDPC decoding scheme is able to utilize cross bitplane correlation during decoding, by iteratively refining the soft-input, updating a modeled noise distribution and thereafter enhancing...

  9. Sparse BLIP: BLind Iterative Parallel imaging reconstruction using compressed sensing.

    Science.gov (United States)

    She, Huajun; Chen, Rong-Rong; Liang, Dong; DiBella, Edward V R; Ying, Leslie

    2014-02-01

    To develop a sensitivity-based parallel imaging reconstruction method to reconstruct iteratively both the coil sensitivities and MR image simultaneously based on their prior information. Parallel magnetic resonance imaging reconstruction problem can be formulated as a multichannel sampling problem where solutions are sought analytically. However, the channel functions given by the coil sensitivities in parallel imaging are not known exactly and the estimation error usually leads to artifacts. In this study, we propose a new reconstruction algorithm, termed Sparse BLind Iterative Parallel, for blind iterative parallel imaging reconstruction using compressed sensing. The proposed algorithm reconstructs both the sensitivity functions and the image simultaneously from undersampled data. It enforces the sparseness constraint in the image as done in compressed sensing, but is different from compressed sensing in that the sensing matrix is unknown and additional constraint is enforced on the sensitivities as well. Both phantom and in vivo imaging experiments were carried out with retrospective undersampling to evaluate the performance of the proposed method. Experiments show improvement in Sparse BLind Iterative Parallel reconstruction when compared with Sparse SENSE, JSENSE, IRGN-TV, and L1-SPIRiT reconstructions with the same number of measurements. The proposed Sparse BLind Iterative Parallel algorithm reduces the reconstruction errors when compared to the state-of-the-art parallel imaging methods. Copyright © 2013 Wiley Periodicals, Inc.

  10. Iterative algorithms for large sparse linear systems on parallel computers

    Science.gov (United States)

    Adams, L. M.

    1982-01-01

    Algorithms are developed for assembling in parallel the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed, and results of this model for the algorithms are given.
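
As a rough illustration of a preconditioned conjugate gradient iteration of the kind this abstract refers to, the following serial sketch uses a Jacobi (diagonal) preconditioner. The choice of preconditioner, the dense list-of-lists matrix storage, and all names are assumptions made for this example; the abstract does not specify them.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
    """Conjugate gradients for symmetric positive definite A, with M = diag(A)."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                    # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    z = [d * ri for d, ri in zip(inv_diag, r)]  # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small symmetric positive definite test system with known solution [1, 2, 3].
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]
solution = jacobi_pcg(A, b)
```

In a parallel setting the matrix-vector product and the dot products are the operations that get distributed; the diagonal preconditioner needs no communication at all, which is one reason it is a common baseline.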

  11. Iteration schemes for parallelizing models of superconductivity

    Energy Technology Data Exchange (ETDEWEB)

    Gray, P.A. [Michigan State Univ., East Lansing, MI (United States)

    1996-12-31

    The time dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-T_c superconductors. When solving these equations numerically, the added degrees of complexity due to the coupling and nonlinearity of the model often warrant the use of high-performance computers for their solution. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given which contrast properties of convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.

  12. Parallel GPU implementation of iterative PCA algorithms.

    Science.gov (United States)

    Andrecut, M

    2009-11-01

    Principal component analysis (PCA) is a key statistical technique for multivariate data analysis. For large data sets, the common approach to PCA computation is based on the standard NIPALS-PCA algorithm, which unfortunately suffers from loss of orthogonality, and therefore its applicability is usually limited to the estimation of the first few components. Here we present an algorithm based on Gram-Schmidt orthogonalization (called GS-PCA), which eliminates this shortcoming of NIPALS-PCA. Also, we discuss the GPU (Graphics Processing Unit) parallel implementation of both NIPALS-PCA and GS-PCA algorithms. The numerical results show that the GPU parallel optimized versions, based on CUBLAS (NVIDIA), are substantially faster (up to 12 times) than the CPU optimized versions based on CBLAS (GNU Scientific Library).
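
The loss of orthogonality described in this abstract can be countered by re-orthogonalising each loading vector against the previously computed ones. The sketch below adds such a Gram-Schmidt step to a plain NIPALS loop; it is a serial toy version under that assumption, not the paper's GS-PCA or its CUBLAS implementation, and the function name `nipals_gs` is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nipals_gs(rows, n_components, n_iter=500, tol=1e-12):
    """rows: mean-centred data, one list per sample. Returns loading vectors."""
    X = [row[:] for row in rows]
    n_vars = len(X[0])
    loadings = []
    for _ in range(n_components):
        # Start the score vector from the column with the largest sum of squares.
        col = max(range(n_vars), key=lambda j: sum(r[j] ** 2 for r in X))
        t = [r[col] for r in X]
        for _ in range(n_iter):
            tt = dot(t, t)
            # Loading estimate p = X^T t / (t^T t).
            p = [sum(r[j] * ti for r, ti in zip(X, t)) / tt for j in range(n_vars)]
            # Gram-Schmidt step: remove components along earlier loadings.
            for q in loadings:
                c = dot(p, q)
                p = [pi - c * qi for pi, qi in zip(p, q)]
            norm = dot(p, p) ** 0.5
            p = [pi / norm for pi in p]
            # Score update t = X p.
            t_new = [dot(r, p) for r in X]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t)) ** 0.5
            t = t_new
            if change < tol:
                break
        loadings.append(p)
        # Deflate: subtract the rank-one part explained by this component.
        X = [[x - ti * pj for x, pj in zip(r, p)] for r, ti in zip(X, t)]
    return loadings


# Tiny mean-centred data set (an assumption for this example).
data = [[2.0, 0.0, 1.0], [-2.0, 0.0, -1.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
p1, p2 = nipals_gs(data, n_components=2)
```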

  13. AZTEC: A parallel iterative package for the solving linear systems

    Energy Technology Data Exchange (ETDEWEB)

    Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S. [Sandia National Labs., Albuquerque, NM (United States)

    1996-12-31

    We describe a parallel linear system package, AZTEC. The package incorporates a number of parallel iterative methods (e.g. GMRES, biCGSTAB, CGS, TFQMR) and preconditioners (e.g. Jacobi, Gauss-Seidel, polynomial, domain decomposition with LU or ILU within subdomains). Additionally, AZTEC allows for the reuse of previous preconditioning factorizations within Newton schemes for nonlinear methods. Currently, a number of different users are using this package to solve a variety of PDE applications.

  14. Parallelization of the model-based iterative reconstruction algorithm DIRA

    International Nuclear Information System (INIS)

    Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.

    2016-01-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelized using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. (authors)

  15. PARALLEL ITERATIVE RECONSTRUCTION OF PHANTOM CATPHAN ON EXPERIMENTAL DATA

    Directory of Open Access Journals (Sweden)

    M. A. Mirzavand

    2016-01-01

    The principles of fast parallel iterative algorithms based on graphics accelerators and the OpenGL library are considered in the paper. The proposed approach simultaneously minimizes the residuals of the desired solution and the total variation of the reconstructed three-dimensional image. The number of required input data, i.e. conical X-ray projections, can be reduced several times, which makes it possible to reduce the radiation exposure of the patient by a corresponding factor while maintaining the necessary contrast and spatial resolution of the three-dimensional image of the patient. The heuristic iterative algorithm can be used as an alternative to the well-known three-dimensional Feldkamp algorithm.

  16. P-SPARSLIB: A parallel sparse iterative solution package

    Energy Technology Data Exchange (ETDEWEB)

    Saad, Y. [Univ. of Minnesota, Minneapolis, MN (United States)

    1994-12-31

    Iterative methods are gaining popularity in engineering and sciences at a time where the computational environment is changing rapidly. P-SPARSLIB is a project to build a software library for sparse matrix computations on parallel computers. The emphasis is on iterative methods and the use of distributed sparse matrices, an extension of the domain decomposition approach to general sparse matrices. One of the goals of this project is to develop a software package geared towards specific applications. For example, the author will test the performance and usefulness of P-SPARSLIB modules on linear systems arising from CFD applications. Equally important is the goal of portability. In the long run, the author wishes to ensure that this package is portable on a variety of platforms, including SIMD environments and shared memory environments.

  17. Iterative schemes for parallel Sn algorithms in a shared-memory computing environment

    International Nuclear Information System (INIS)

    Haghighat, A.; Hunter, M.A.; Mattis, R.E.

    1995-01-01

    Several two-dimensional spatial domain partitioning S_n transport theory algorithms are developed on the basis of different iterative schemes. These algorithms are incorporated into TWOTRAN-II and tested on the shared-memory CRAY Y-MP C90 computer. For a series of fixed-source r-z geometry homogeneous problems, it is demonstrated that the concurrent red-black algorithms may result in large parallel efficiencies (>60%) on C90. It is also demonstrated that for a realistic shielding problem, the use of the negative flux fixup causes high load imbalance, which results in a significant loss of parallel efficiency.

  18. Parallel iterative procedures for approximate solutions of wave propagation by finite element and finite difference methods

    Energy Technology Data Exchange (ETDEWEB)

    Kim, S. [Purdue Univ., West Lafayette, IN (United States)

    1994-12-31

    Parallel iterative procedures based on domain decomposition techniques are defined and analyzed for the numerical solution of wave propagation by finite element and finite difference methods. For finite element methods, in a Lagrangian framework, an efficient way for choosing the algorithm parameter as well as the algorithm convergence are indicated. Some heuristic arguments for finding the algorithm parameter for finite difference schemes are addressed. Numerical results are presented to indicate the effectiveness of the methods.

  19. The new Exponential Directional Iterative (EDI) 3-D Sn scheme for parallel adaptive differencing

    International Nuclear Information System (INIS)

    Sjoden, G.E.

    2005-01-01

    The new Exponential Directional Iterative (EDI) discrete ordinates (Sn) scheme for 3-D Cartesian Coordinates is presented. The EDI scheme is a logical extension of the positive, efficient Exponential Directional Weighted (EDW) Sn scheme currently used as the third level of the adaptive spatial differencing algorithm in the PENTRAN parallel discrete ordinates solver. Here, the derivation and advantages of the EDI scheme are presented; EDI uses EDW-rendered exponential coefficients as initial starting values to begin a fixed point iteration of the exponential coefficients. One issue that required evaluation was an iterative cutoff criterion to prevent the application of an unstable fixed point iteration; although this was needed in some cases, it was readily treated with a default to EDW. Iterative refinement of the exponential coefficients in EDI typically converged in fewer than four fixed point iterations. Moreover, EDI yielded more accurate angular fluxes compared to the other schemes tested, particularly in streaming conditions. Overall, it was found that the EDI scheme was up to an order of magnitude more accurate than the EDW scheme on a given mesh interval in streaming cases, and is potentially a good candidate as a fourth-level differencing scheme in the PENTRAN adaptive differencing sequence. The 3-D Cartesian computational cost of EDI was only about 20% more than the EDW scheme, and about 40% more than Diamond Zero (DZ). More evaluation and testing are required to determine suitable upgrade metrics for EDI to be fully integrated into the current adaptive spatial differencing sequence in PENTRAN. (author)

  20. Parallel iterative solution of the Hermite Collocation equations on GPUs II

    International Nuclear Information System (INIS)

    Vilanakis, N; Mathioudakis, E

    2014-01-01

    Hermite Collocation is a high order finite element method for Boundary Value Problems modelling applications in several fields of science and engineering. Application of this integration-free numerical solver to the solution of linear BVPs results in a large and sparse general system of algebraic equations, suggesting the use of an efficient iterative solver, especially for realistic simulations. In part I of this work an efficient parallel algorithm of the Schur complement method coupled with the Bi-Conjugate Gradient Stabilized (BiCGSTAB) iterative solver was designed for multicore computing architectures with a Graphics Processing Unit (GPU). In the present work the proposed algorithm has been extended for high performance computing environments consisting of multiprocessor machines with multiple GPUs. Since this is a distributed GPU and shared CPU memory parallel architecture, a hybrid memory treatment is needed for the development of the parallel algorithm. The realization of the algorithm took place on a multiprocessor machine HP SL390 with Tesla M2070 GPUs using the OpenMP and OpenACC standards. Execution time measurements reveal the efficiency of the parallel implementation.

  1. Parallel iterative solvers and preconditioners using approximate hierarchical methods

    Energy Technology Data Exchange (ETDEWEB)

    Grama, A.; Kumar, V.; Sameh, A. [Univ. of Minnesota, Minneapolis, MN (United States)

    1996-12-31

    In this paper, we report results of the performance, convergence, and accuracy of a parallel GMRES solver for Boundary Element Methods. The solver uses a hierarchical approximate matrix-vector product based on a hybrid Barnes-Hut / Fast Multipole Method. We study the impact of various accuracy parameters on the convergence and show that with minimal loss in accuracy, our solver yields significant speedups. We demonstrate the excellent parallel efficiency and scalability of our solver. The combined speedups from approximation and parallelism represent an improvement of several orders in solution time. We also develop fast and parallelizable preconditioners for this problem. We report on the performance of an inner-outer scheme and a preconditioner based on truncated Green's function. Experimental results on a 256 processor Cray T3D are presented.

  2. An iterative algorithm for solving the multidimensional neutron diffusion nodal method equations on parallel computers

    International Nuclear Information System (INIS)

    Kirk, B.L.; Azmy, Y.Y.

    1992-01-01

    In this paper the one-group, steady-state neutron diffusion equation in two-dimensional Cartesian geometry is solved using the nodal integral method. The discrete variable equations comprise loosely coupled sets of equations representing the nodal balance of neutrons, as well as neutron current continuity along rows or columns of computational cells. An iterative algorithm that is more suitable for solving large problems concurrently is derived based on the decomposition of the spatial domain and is accelerated using successive overrelaxation. This algorithm is very well suited for parallel computers, especially since the spatial domain decomposition occurs naturally, so that the number of iterations required for convergence does not depend on the number of processors participating in the calculation. Implementation of the authors' algorithm on the Intel iPSC/2 hypercube and Sequent Balance 8000 parallel computer is presented, and measured speedup and efficiency for test problems are reported. The results suggest that the efficiency of the hypercube quickly deteriorates when many processors are used, while the Sequent Balance retains very high efficiency for a comparable number of participating processors. This leads to the conjecture that message-passing parallel computers are not as well suited for this algorithm as shared-memory machines.

  3. Structured Parallel Programming Patterns for Efficient Computation

    CERN Document Server

    McCool, Michael; Robison, Arch

    2012-01-01

    Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th

  4. Accuracy analysis of hybrid parallel robot for the assembling of ITER

    International Nuclear Information System (INIS)

    Wang Yongbo; Pessi, Pekka; Wu Huapeng; Handroos, Heikki

    2009-01-01

    This paper presents a novel mobile parallel robot, which is able to carry welding and machining processes from inside the international thermonuclear experimental reactor (ITER) vacuum vessel (VV). The kinematics design of the robot has been optimized for ITER access. To improve the accuracy of the parallel robot, the errors caused by the stiffness and manufacture process have to be compensated or limited to a minimum value. In this paper kinematics errors and stiffness modeling are given. The simulation results are presented.

  5. Accuracy analysis of hybrid parallel robot for the assembling of ITER

    Energy Technology Data Exchange (ETDEWEB)

    Wang Yongbo [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland); The State Key Laboratory of Mechanical Transmission, Chongqing University (China); Pessi, Pekka [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland); Wu Huapeng [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland)], E-mail: huapeng@lut.fi; Handroos, Heikki [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland)

    2009-06-15


  6. Fast parallel algorithm for three-dimensional distance-driven model in iterative computed tomography reconstruction

    International Nuclear Information System (INIS)

    Chen Jian-Lin; Li Lei; Wang Lin-Yuan; Cai Ai-Long; Xi Xiao-Qi; Zhang Han-Ming; Li Jian-Xin; Yan Bin

    2015-01-01

    The projection matrix model is used to describe the physical relationship between the reconstructed object and its projections. Such a model has a strong influence on projection and backprojection, two vital operations in iterative computed tomographic reconstruction. The distance-driven model (DDM) is a state-of-the-art technology that simulates forward and back projections. This model has a low computational complexity and a relatively high spatial resolution; however, few methods exist for running it in parallel with a matched model scheme. This study introduces a fast and parallelizable algorithm to improve the traditional DDM for computing the parallel projection and backprojection operations. Our proposed model has been implemented on a GPU (graphics processing unit) platform and has achieved satisfactory computational efficiency with no approximation. The runtime for the projection and backprojection operations with our model is approximately 4.5 s and 10.5 s per loop, respectively, with an image size of 256×256×256 and 360 projections with a size of 512×512. We compare several general algorithms that have been proposed for maximizing GPU efficiency by using the unmatched projection/backprojection models in a parallel computation. The imaging resolution is not sacrificed and remains accurate during computed tomographic reconstruction. (paper)

  7. Primal Domain Decomposition Method with Direct and Iterative Solver for Circuit-Field-Torque Coupled Parallel Finite Element Method to Electric Machine Modelling

    Directory of Open Access Journals (Sweden)

    Daniel Marcsa

    2015-01-01

    The analysis and design of electromechanical devices involve the solution of large sparse linear systems and therefore require high-performance algorithms. In this paper, the primal Domain Decomposition Method (DDM) with parallel forward-backward and with parallel Preconditioned Conjugate Gradient (PCG) solvers is introduced in a two-dimensional parallel time-stepping finite element formulation to analyze a rotating machine, considering the electromagnetic field, external circuit and rotor movement. The proposed parallel direct solver and the iterative solver with two preconditioners are analyzed with respect to computational efficiency and the number of iterations required with the different preconditioners. Simulation results for a rotating machine are also presented.
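
    To make the PCG solver named in this record concrete, here is a minimal sketch of a preconditioned conjugate gradient iteration. The record does not say which two preconditioners were used, so the Jacobi (diagonal) preconditioner, the function name `pcg`, the dense list-of-lists storage and the tolerance are all assumptions chosen for illustration; a production finite element code would use sparse storage and a distributed matrix-vector product.

```python
def pcg(A, b, tol=1e-10, max_iter=200):
    """Conjugate gradient with a Jacobi (diagonal) preconditioner.

    A is a small dense symmetric positive definite matrix (list of lists),
    b is a nonzero right-hand side. Returns (solution, iterations used).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # assumed preconditioner M^-1
    z = [inv_diag[i] * r[i] for i in range(n)]     # preconditioned residual
    p = list(z)                                    # first search direction
    rz = sum(r[i] * z[i] for i in range(n))
    for k in range(max_iter):
        # The matrix-vector product is the step a parallel code distributes.
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if max(abs(v) for v in r) < tol:
            return x, k + 1
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter
```

    Counting the iterations returned for different choices of `inv_diag` is the kind of comparison the record describes, although the actual preconditioners studied there may differ.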

  8. Efficient parallel implicit methods for rotary-wing aerodynamics calculations

    Science.gov (United States)

    Wissink, Andrew M.

    Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited by a lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It exhibits a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good

  9. Design of parallel intersector weld/cut robot for machining processes in ITER vacuum vessel

    International Nuclear Information System (INIS)

    Wu Huapeng; Handroos, Heikki; Kovanen, Janne; Rouvinen, Asko; Hannukainen, Petri; Saira, Tanja; Jones, Lawrence

    2003-01-01

    This paper presents a new parallel robot, Penta-WH, which has five degrees of freedom driven by hydraulic cylinders. The manipulator has a large, singularity-free workspace and high stiffness, and it acts as a transport device for welding, machining and inspection end-effectors inside the ITER vacuum vessel. The presented kinematic structure of a parallel robot is particularly suitable for the ITER environment. An analysis of the machining process for ITER, including the machining methods and forces, is given, and kinematic analyses, such as workspace and force capacity, are discussed.

  10. III - Template Metaprogramming for massively parallel scientific computing - Templates for Iteration; Thread-level Parallelism

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    Large scale scientific computing raises questions on different levels ranging from the formulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities in these different topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tends to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, the object-oriented approach is nice to read, but may come with an inherent performance penalty. These lectures aim to present the basics of the Expression Template (ET) idiom, which allows us to keep the object-oriented approach without sacrificing performance. We will in particular show how to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show how to apply these methods i...

  11. Efficient Parallel Algorithms for Unsteady Incompressible Flows

    KAUST Repository

    Guermond, Jean-Luc; Minev, Peter D.

    2013-01-01

    The objective of this paper is to give an overview of recent developments on splitting schemes for solving the time-dependent incompressible Navier–Stokes equations and to discuss possible extensions to the variable density/viscosity case. Particular attention is given to algorithms that can be implemented efficiently on large parallel clusters.

  12. Issues in developing parallel iterative algorithms for solving partial differential equations on a (transputer-based) distributed parallel computing system

    International Nuclear Information System (INIS)

    Rajagopalan, S.; Jethra, A.; Khare, A.N.; Ghodgaonkar, M.D.; Srivenkateshan, R.; Menon, S.V.G.

    1990-01-01

    Issues relating to implementing iterative procedures for the numerical solution of elliptic partial differential equations on a distributed parallel computing system are discussed. Preliminary investigations show that a speed-up of about 3.85 is achievable on a four-transputer pipeline network. (author). 2 figs., 3 appendixes, 7 refs.
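
    As a quick check of what that speed-up figure implies, parallel efficiency is conventionally defined as speed-up divided by processor count. The definition and the symbols E, S and p are a standard convention assumed here, not something stated in the record:

```latex
E = \frac{S}{p} = \frac{3.85}{4} \approx 0.96
```

    Under that convention, the four-transputer pipeline would be running at roughly 96% efficiency.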

  13. Multi-petascale highly efficient parallel supercomputer

    Science.gov (United States)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen-Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2018-05-15

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. The node design includes a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and a multiversioning cache that improves soft error rate at the same time, and it supports DMA functionality allowing for parallel processing message-passing.

  14. A mobile robot with parallel kinematics constructed under requirements for assembling and machining of the ITER vacuum vessel

    International Nuclear Information System (INIS)

    Pessi, P.; Huapeng Wu; Handroos, H.; Jones, L.

    2006-01-01

    ITER sectors require more stringent tolerances (± 5 mm) than normally expected for the size of structure involved. The walls of ITER sectors are made of 60 mm thick stainless steel and are joined together by high efficiency structural and leak tight welds. In addition to the initial vacuum vessel assembly, sectors may have to be replaced for repair. Since commercially available machines are too heavy for the required machining operations and the lifting of a possible e-beam gun column system, and conventional robots lack the stiffness and accuracy in such machining conditions, a new flexible, lightweight and mobile robotic machine is being considered. For the assembly of the ITER vacuum vessel sector, precise positioning of welding end-effectors, at some distance in a confined space from the available supports, will be required, which is not possible using conventional machines or robots. This paper presents a special robot, able to carry out welding and machining processes from inside the ITER vacuum vessel, consisting of a ten-degree-of-freedom parallel robot mounted on a carriage driven by electric motor/gearbox on a track. The robot consists of a Stewart platform based parallel mechanism. Water hydraulic cylinders are used as actuators to reach six degrees of freedom for parallel construction. Two linear and two rotational motions are used for enlarging the workspace of the manipulator. The robot carries both a welding gun, such as a TIG, hybrid laser or e-beam welding gun, to weld the inner and outer walls of the ITER vacuum vessel sectors, and machining tools to cut and mill the walls with the necessary accuracy; it can also carry other tools and material to a required position inside the vacuum vessel. For assembly, an on-line six-degree-of-freedom seam-finding algorithm has been developed, which enables the robot to find the welding seam automatically in a very complex environment. In the machining multi flexible machining processes carried out automatically by

  15. A mobile robot with parallel kinematics to meet the requirements for assembling and machining the ITER vacuum vessel

    International Nuclear Information System (INIS)

    Pessi, Pekka; Wu, Huapeng; Handroos, Heikki; Jones, Lawrence

    2007-01-01

    The present paper introduces a mobile parallel robot developed for the International Thermonuclear Experimental Reactor (ITER). The task of the robot is to carry out welding and machining processes inside the ITER vacuum vessel. The kinematic design of the robot has been optimized for ITER access. The kinematic analysis is given in the paper. A virtual prototype of the parallel robot is built. The dynamic behavior of the whole robot is studied by multi-body system simulation (MBS).

  16. A mobile robot with parallel kinematics to meet the requirements for assembling and machining the ITER vacuum vessel

    Energy Technology Data Exchange (ETDEWEB)

    Pessi, Pekka [Lappeenranta University of Technology, Lappeenranta (Finland)], E-mail: pessi@lut.fi; Wu, Huapeng; Handroos, Heikki [Lappeenranta University of Technology, Lappeenranta (Finland); Jones, Lawrence [EFDA Close Support Unit, Boltzmannstrasse 2, Garching D-85748 (Germany)

    2007-10-15

    The present paper introduces a mobile parallel robot developed for the International Thermonuclear Experimental Reactor (ITER). The task of the robot is to carry out welding and machining processes inside the ITER vacuum vessel. The kinematic design of the robot has been optimized for ITER access. The kinematic analysis is given in the paper. A virtual prototype of the parallel robot is built. The dynamic behavior of the whole robot is studied by multi-body system simulation (MBS).

  17. Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

    Science.gov (United States)

    Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

    2017-12-01

    This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.

  18. Efficient Parallel Strategy Improvement for Parity Games

    OpenAIRE

    Fearnley, John

    2017-01-01

    We study strategy improvement algorithms for solving parity games. While these algorithms are known to solve parity games using a very small number of iterations, experimental studies have found that a high step complexity causes them to perform poorly in practice. In this paper we seek to address this situation. Every iteration of the algorithm must compute a best response, and while the standard way of doing this uses the Bellman-Ford algorithm, we give experimental results that show that o...

  19. Multi-petascale highly efficient parallel supercomputer

    Science.gov (United States)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enabling adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.

  20. Variation in efficiency of parallel algorithms. [for study of stiffness matrices in planar trusses

    Science.gov (United States)

    Hayashi, A.; Melosh, R. J.; Utku, S.; Salama, M.

    1985-01-01

    The present study has the objective to investigate some iterative parallel-processor linear equation solving algorithms with respect to efficiency for analyses of typical linear engineering systems. Attention is given to a set of n linear equations, Ku = p, where K = an n x n positive definite, sparsely populated, symmetric matrix, u = an n x 1 vector of unknown responses, and p = an n x 1 vector of prescribed constants. This study is concerned with a hybrid method in which iteration is used to solve the problem, while a direct method is used on the local processor level. Variations in the efficiency of parallel algorithms are explored. Measures of the efficiency are based on computer experiments regarding the algorithms. For all the algorithms, the wall clock time is found to decrease as the number of processors increases.
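
    The hybrid idea in this record (iteration across processors, a direct solve inside each processor) can be sketched as a block Jacobi iteration on Ku = p. This is only an illustration under stated assumptions: the record does not specify which iterative scheme was used, so block Jacobi, the function names `solve_dense` and `block_jacobi`, the block partition and the stopping rule are all hypothetical choices.

```python
def solve_dense(A, b):
    """Direct local solve: Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]    # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def block_jacobi(K, p, blocks, tol=1e-10, max_sweeps=500):
    """Iterate over blocks of unknowns; each block is solved directly.

    blocks is a list of index lists, one per (hypothetical) processor.
    Every block reads only the previous iterate, so the blocks of one
    sweep are independent of each other.
    """
    n = len(p)
    u = [0.0] * n
    for sweep in range(1, max_sweeps + 1):
        u_new = list(u)
        for idx in blocks:
            inside = set(idx)
            # Move the coupling to unknowns owned by other blocks to the right-hand side.
            rhs = [p[i] - sum(K[i][j] * u[j] for j in range(n) if j not in inside)
                   for i in idx]
            K_local = [[K[i][j] for j in idx] for i in idx]
            for i, value in zip(idx, solve_dense(K_local, rhs)):
                u_new[i] = value
        change = max(abs(a - b) for a, b in zip(u_new, u))
        u = u_new
        if change < tol:
            return u, sweep
    return u, max_sweeps
```

    Block Jacobi is not guaranteed to converge for every symmetric positive definite K; it does for diagonally dominant matrices such as the tridiagonal test case one might build with 4 on the diagonal and -1 beside it.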

  1. An Efficient Algorithm for Perturbed Orbit Integration Combining Analytical Continuation and Modified Chebyshev Picard Iteration

    Science.gov (United States)

    Elgohary, T.; Kim, D.; Turner, J.; Junkins, J.

    2014-09-01

    Several methods exist for integrating the motion in high order gravity fields. Some recent methods use an approximate starting orbit, and an efficient method is needed for generating warm starts that account for specific low order gravity approximations. By introducing two scalar Lagrange-like invariants and employing the Leibniz product rule, the perturbed motion is integrated by a novel recursive formulation. The Lagrange-like invariants allow exact arbitrary order time derivatives. Restricting attention to the perturbations due to the zonal harmonics J2 through J6, we illustrate the idea. The recursively generated vector-valued time derivatives for the trajectory are used to develop a continuation series-based solution for propagating position and velocity. Numerical comparisons indicate performance improvements of ~70X over existing explicit Runge-Kutta methods while maintaining mm accuracy for the orbit predictions. The Modified Chebyshev Picard Iteration (MCPI) is an iterative path approximation method to solve nonlinear ordinary differential equations. The MCPI utilizes Picard iteration with orthogonal Chebyshev polynomial basis functions to recursively update the states. The key advantages of the MCPI are as follows: 1) Large segments of a trajectory can be approximated by evaluating the forcing function at multiple nodes along the current approximation during each iteration. 2) It can readily handle general gravity perturbations as well as non-conservative forces. 3) Parallel applications are possible. The Picard sequence converges to the solution over large time intervals when the forces are continuous and differentiable. Depending on the accuracy of the starting solutions, however, the MCPI may require a significant number of iterations and function evaluations compared to other integrators. In this work, we provide an efficient methodology to establish good starting solutions from the continuation series method; this warm start improves the performance of the
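
    The Picard update with a Chebyshev basis described in this record can be sketched for a scalar equation x' = f(t, x). This is a simplified illustration in the spirit of MCPI, not the authors' formulation: the function name `picard_chebyshev`, the use of Chebyshev-Gauss-Lobatto nodes, the default degree and iteration count, and the constant ("cold") starting guess are all assumptions.

```python
import math


def picard_chebyshev(f, t0, tf, x0, degree=16, iterations=30):
    """Picard iteration on Chebyshev nodes for a scalar ODE x' = f(t, x)."""
    N = degree
    tau = [math.cos(math.pi * j / N) for j in range(N + 1)]  # tau[0] = 1, tau[N] = -1
    half = 0.5 * (tf - t0)
    t = [t0 + half * (s + 1.0) for s in tau]                 # map [-1, 1] to [t0, tf]

    def end(j):                                              # halve first and last terms
        return 0.5 if j in (0, N) else 1.0

    def T(m, j):                                             # T_m evaluated at node j
        return math.cos(m * math.pi * j / N)

    def integrate(g):
        """Integral from -1 to each node of the Chebyshev interpolant of g."""
        b = [end(k) * (2.0 / N) * sum(end(j) * g[j] * T(k, j) for j in range(N + 1))
             for k in range(N + 1)]
        out = []
        for j in range(N + 1):
            total = b[0] * (tau[j] + 1.0) + b[1] * 0.5 * (tau[j] ** 2 - 1.0)
            for k in range(2, N + 1):
                upper = T(k + 1, j) / (k + 1) - T(k - 1, j) / (k - 1)
                lower = (-1.0) ** (k + 1) / (k + 1) - (-1.0) ** (k - 1) / (k - 1)
                total += 0.5 * b[k] * (upper - lower)
            out.append(total)
        return out

    x = [x0] * (N + 1)                  # cold start; a warm start would replace this guess
    for _ in range(iterations):
        # The forcing function is evaluated at every node independently,
        # which is the step that lends itself to parallel execution.
        g = [f(t[j], x[j]) for j in range(N + 1)]
        x = [x0 + half * v for v in integrate(g)]
    return t, x
```

    Because `tau[0]` is 1, the first entry of the returned state list is the value at `tf`. Supplying a better initial `x` than the constant guess is where a warm start of the kind the record discusses would enter.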

  2. Block iterative restoration of astronomical images with the massively parallel processor

    International Nuclear Information System (INIS)

    Heap, S.R.; Lindler, D.J.

    1987-01-01

    A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images
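
    The figures quoted in this record can be combined into a rough cost estimate. The estimate assumes the 4900 sub-systems are solved one after another in a single pass and ignores all other overhead; neither assumption is stated in the record:

```latex
N = 500 \times 500 = 250{,}000, \qquad
t_{\text{pass}} \approx 4900 \times 0.06\ \text{s} = 294\ \text{s} \approx 4.9\ \text{min}
```

    Each 121 x 121 system corresponds to 121 = 11 x 11 unknowns, so a sub-system of that size is consistent with an 11 x 11 pixel block.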

  3. PCG: A software package for the iterative solution of linear systems on scalar, vector and parallel computers

    Energy Technology Data Exchange (ETDEWEB)

    Joubert, W. [Los Alamos National Lab., NM (United States); Carey, G.F. [Univ. of Texas, Austin, TX (United States)

    1994-12-31

    A great need exists for high performance numerical software libraries transportable across parallel machines. This talk concerns the PCG package, which solves systems of linear equations by iterative methods on parallel computers. The features of the package are discussed, along with the techniques used to obtain both high performance and transportability across architectures. Representative numerical results are presented for several machines including the Connection Machine CM-5, Intel Paragon and Cray T3D parallel computers.

  4. An efficient iterative method for the generalized Stokes problem

    Energy Technology Data Exchange (ETDEWEB)

    Sameh, A. [Univ. of Minnesota, Twin Cities, MN (United States); Sarin, V. [Univ. of Illinois, Urbana, IL (United States)

    1996-12-31

    This paper presents an efficient iterative scheme for the generalized Stokes problem, which arises frequently in the simulation of time-dependent Navier-Stokes equations for incompressible fluid flow. The general form of the linear system is where A = {alpha}M + {nu}T is an n x n symmetric positive definite matrix, in which M is the mass matrix, T is the discrete Laplace operator, {alpha} and {nu} are positive constants proportional to the inverses of the time-step {Delta}t and the Reynolds number Re respectively, and B is the discrete gradient operator of size n x k (k < n). Even though the matrix A is symmetric and positive definite, the system is indefinite due to the incompressibility constraint (B{sup T}u = 0). This causes difficulties both for iterative methods and commonly used preconditioners. Moreover, depending on the ratio {alpha}/{nu}, A behaves like the mass matrix M at one extreme and the Laplace operator T at the other, thus complicating the issue of preconditioning.
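
    The displayed linear system did not survive in this record. A standard saddle-point form that is consistent with the definitions of A, B and the constraint given in the abstract would be the following; the pressure unknown p and the right-hand side f are names assumed here, and the original paper may arrange the blocks differently:

```latex
\begin{bmatrix} A & B \\ B^{T} & 0 \end{bmatrix}
\begin{bmatrix} u \\ p \end{bmatrix}
=
\begin{bmatrix} f \\ 0 \end{bmatrix},
\qquad A = \alpha M + \nu T
```

    The zero diagonal block is what makes the full system indefinite even though A itself is symmetric positive definite, and the second block row is the incompressibility constraint B^T u = 0.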

  5. Efficient Parallel Engineering Computing on Linux Workstations

    Science.gov (United States)

    Lou, John Z.

    2010-01-01

    A C software module has been developed that creates lightweight processes (LWPs) dynamically to achieve parallel computing performance in a variety of engineering simulation and analysis applications to support NASA and DoD project tasks. The required interface between the module and the application it supports is simple, minimal and almost completely transparent to the user applications, and it can achieve nearly ideal computing speed-up on multi-CPU engineering workstations of all operating system platforms. The module can be integrated into an existing application (C, C++, Fortran and others) either as part of a compiled module or as a dynamically linked library (DLL).

  6. Improved Iterative Parallel Interference Cancellation Receiver for Future Wireless DS-CDMA Systems

    Directory of Open Access Journals (Sweden)

    Andrea Bernacchioni

    2005-04-01

    We present a new turbo multiuser detector for turbo-coded direct sequence code division multiple access (DS-CDMA) systems. The proposed detector is based on the utilization of a parallel interference cancellation (PIC) and a bank of turbo decoders. The PIC is broken up in order to perform interference cancellation after each constituent decoder of the turbo decoding scheme. Moreover, in the paper we propose a new enhanced algorithm that provides a more accurate estimation of the signal-to-noise-plus-interference-ratio used in the tentative decision device and in the MAP decoding algorithm. The performance of the proposed receiver is evaluated by means of computer simulations for medium to very high system loads, in AWGN and multipath fading channel, and compared to recently proposed interference cancellation-based iterative MUD, by taking into account the number of iterations and the complexity involved. We will see that the proposed receiver outperforms the others especially for highly loaded systems.

  7. Efficient relaxed-Jacobi smoothers for multigrid on parallel computers

    Science.gov (United States)

    Yang, Xiang; Mittal, Rajat

    2017-03-01

    In this Technical Note, we present a family of Jacobi-based multigrid smoothers suitable for the solution of discretized elliptic equations. These smoothers are based on the idea of scheduled-relaxation Jacobi proposed recently by Yang & Mittal (2014) [18] and employ two or three successive relaxed Jacobi iterations with relaxation factors derived so as to maximize the smoothing property of these iterations. The performance of these new smoothers measured in terms of convergence acceleration and computational workload, is assessed for multi-domain implementations typical of parallelized solvers, and compared to the lexicographic point Gauss-Seidel smoother. The tests include the geometric multigrid method on structured grids as well as the algebraic grid method on unstructured grids. The tests demonstrate that unlike Gauss-Seidel, the convergence of these Jacobi-based smoothers is unaffected by domain decomposition, and furthermore, they outperform the lexicographic Gauss-Seidel by factors that increase with domain partition count.
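
    A smoother built from a short sequence of relaxed Jacobi sweeps can be sketched on the 1D Poisson problem -u'' = f with zero Dirichlet boundaries. The model problem, the function names, and in particular the relaxation factors (0.8, 0.5) are placeholders chosen for illustration; they are not the optimized factors derived in the Technical Note.

```python
import math


def relaxed_jacobi_sweep(u, f, h, omega):
    """One relaxed (weighted) Jacobi sweep for -u'' = f on a uniform grid.

    Every new value depends only on the previous iterate, so all points
    (and all subdomains) can be updated at the same time.
    """
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0          # zero Dirichlet boundary
        right = u[i + 1] if i < n - 1 else 0.0
        jacobi = 0.5 * (left + right + h * h * f[i])
        new[i] = (1.0 - omega) * u[i] + omega * jacobi
    return new


def smooth(u, f, h, omegas=(0.8, 0.5)):
    """Apply successive relaxed Jacobi sweeps, one per relaxation factor."""
    for omega in omegas:
        u = relaxed_jacobi_sweep(u, f, h, omega)
    return u


# Usage: damp a highly oscillatory error mode (the exact solution is zero here).
n = 15
h = 1.0 / (n + 1)
start = [math.sin(n * math.pi * (i + 1) / (n + 1)) for i in range(n)]
smoothed = smooth(start, [0.0] * n, h)
```

    Because each sweep reads only the old iterate, splitting the grid across processors does not change the result, which is the property the record contrasts with lexicographic Gauss-Seidel.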

  8. Implementation of a cell-wise block-Gauss-Seidel iterative method for SN transport on a hybrid parallel computer architecture

    International Nuclear Information System (INIS)

    Rosa, Massimiliano; Warsa, James S.; Perks, Michael

    2011-01-01

    We have implemented a cell-wise, block-Gauss-Seidel (bGS) iterative algorithm for the solution of the S_n transport equations on the Roadrunner hybrid, parallel computer architecture. A compute node of this massively parallel machine comprises AMD Opteron cores that are linked to a Cell Broadband Engine™ (Cell/B.E.). LAPACK routines have been ported to the Cell/B.E. in order to make use of its parallel Synergistic Processing Elements (SPEs). The bGS algorithm is based on the LU factorization and solution of a linear system that couples the fluxes for all S_n angles and energy groups on a mesh cell. For every cell of a mesh that has been parallel decomposed on the higher-level Opteron processors, a linear system is transferred to the Cell/B.E. and the parallel LAPACK routines are used to compute a solution, which is then transferred back to the Opteron, where the rest of the computations for the S_n transport problem take place. Compared to standard parallel machines, a hundred-fold speedup of the bGS was observed on the hybrid Roadrunner architecture. Numerical experiments with strong and weak parallel scaling demonstrate the bGS method is viable and compares favorably to full parallel sweeps (FPS) on two-dimensional, unstructured meshes when it is applied to optically thick, multi-material problems. As expected, however, it is not as efficient as FPS in optically thin problems. (author)

  9. Efficient approach to simulate EM loads on massive structures in ITER machine

    Energy Technology Data Exchange (ETDEWEB)

    Alekseev, A. [ITER Organization, Route de Vinon sur Verdon, 13115 St. Paul-Lez-Durance (France); Andreeva, Z.; Belov, A.; Belyakov, V.; Filatov, O. [D.V. Efremov Scientific Research Institute, 196641 St. Petersburg (Russian Federation); Gribov, Yu.; Ioki, K. [ITER Organization, Route de Vinon sur Verdon, 13115 St. Paul-Lez-Durance (France); Kukhtin, V.; Labusov, A.; Lamzin, E.; Lyublin, B.; Malkov, A.; Mazul, I. [D.V. Efremov Scientific Research Institute, 196641 St. Petersburg (Russian Federation); Rozov, V.; Sugihara, M. [ITER Organization, Route de Vinon sur Verdon, 13115 St. Paul-Lez-Durance (France); Sychevsky, S., E-mail: sytch@sintez.niiefa.spb.su [D.V. Efremov Scientific Research Institute, 196641 St. Petersburg (Russian Federation)

    2013-10-15

    Highlights: ► A modelling technique to predict EM loads in ITER conducting structures is presented. ► The technique provides low computational cost and parallel computations. ► Detailed models were built for the system “vacuum vessel, cryostat, thermal shields”. ► EM loads on massive in-vessel structures were simulated with the use of local models. ► A flexible combination of models enables desired accuracy of load distributions. -- Abstract: Operation of the ITER machine is associated with high electromagnetic (EM) loads. An essential contributor to EM loads is eddy currents induced in passive conductive structures. Reasoning from the ITER construction, a modelling technique has been developed and applied in computations to efficiently predict anticipated loads. The technique allows us to avoid building a global 3D finite-element (FE) model that requires meshing of the conducting structures and their vacuum environment into 3D solid elements that leads to high computational cost. The key features of the proposed technique are: (i) the use of an existing shell model for the system “vacuum vessel (VV), cryostat, and thermal shields (TS)” implementing the magnetic shell approach. A solution is obtained in terms of a single-component, in this case, vector electric potential taken within the conducting shells of the “VV + cryostat + TS” system. (ii) EM loads on in-vessel conducting structures are simulated with the use of local FE models. The local models use either the 3D solid body or shell approximations. Reasoning from the simulation efficiency, the local boundary conditions are put with respect to the total field or an external field. The use of an integral-differential formulation and special procedures ensures smooth and accurate simulated distributions of fields from current sources of any geometry. The local FE models have been developed and applied for EM analyses of a variety of the ITER components including the diagnostic systems

  10. Iter

    Science.gov (United States)

    Iotti, Robert

    2015-04-01

    ITER is an international experimental facility being built by seven Parties to demonstrate the long term potential of fusion energy. The ITER Joint Implementation Agreement (JIA) defines the structure and governance model of such cooperation. There are a number of necessary conditions for such international projects to be successful: a complete design, strong systems engineering working with an agreed set of requirements, an experienced organization with systems and plans in place to manage the project, a cost estimate backed by industry, and someone in charge. Unfortunately for ITER many of these conditions were not present. The paper discusses the priorities in the JIA which led to setting up the project with a Central Integrating Organization (IO) in Cadarache, France as the ITER HQ, and seven Domestic Agencies (DAs) located in the countries of the Parties, responsible for delivering 90%+ of the project hardware as Contributions-in-Kind and also financial contributions to the IO, as "Contributions-in-Cash." Theoretically the Director General (DG) is responsible for everything. In practice the DG does not have the power to control the work of the DAs, and there is not an effective management structure enabling the IO and the DAs to arbitrate disputes, so the project is not really managed, but is a loose collaboration of competing interests. Any DA can effectively block a decision reached by the DG. Inefficiencies in completing design while setting up a competent organization from scratch contributed to the delays and cost increases during the initial few years. So did the fact that the original estimate was not developed from industry input. Unforeseen inflation and market demand on certain commodities/materials further exacerbated the cost increases. Since then, improvements are debatable. Does this mean that the governance model of ITER is a wrong model for international scientific cooperation? I do not believe so. Had the necessary conditions for success

  11. Parallel state transfer and efficient quantum routing on quantum networks.

    Science.gov (United States)

    Chudzicki, Christopher; Strauch, Frederick W

    2010-12-31

    We study the routing of quantum information in parallel on multidimensional networks of tunable qubits and oscillators. These theoretical models are inspired by recent experiments in superconducting circuits. We show that perfect parallel state transfer is possible for certain networks of harmonic oscillator modes. We extend this to the distribution of entanglement between every pair of nodes in the network, finding that the routing efficiency of hypercube networks is optimal and robust in the presence of dissipation and finite bandwidth.

  12. An Expert System for the Development of Efficient Parallel Code

    Science.gov (United States)

    Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.

  13. Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

    Science.gov (United States)

    Sun, Xian-He

    1997-01-01

    Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin 2000, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, and 3) incorporate newly developed algorithms into actual simulation packages. The work plan has been well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) adopting a mathematical geometry which has a better capacity to describe the fluid, and (2) using a compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm

  14. High Efficiency EBCOT with Parallel Coding Architecture for JPEG2000

    Directory of Open Access Journals (Sweden)

    Chiang Jen-Shiun

    2006-01-01

    This work presents a parallel context-modeling coding architecture and a matching arithmetic coder (MQ-coder) for the embedded block coding (EBCOT) unit of the JPEG2000 encoder. Tier-1 of the EBCOT consumes most of the computation time in a JPEG2000 encoding system. The proposed parallel architecture can increase the throughput rate of the context modeling. To match the high throughput rate of the parallel context-modeling architecture, an efficient pipelined architecture for the context-based adaptive arithmetic encoder is proposed. This JPEG2000 encoder can work at 180 MHz to encode one symbol each cycle. Compared with previous context-modeling architectures, our parallel architectures can improve the throughput rate by up to 25%.

  15. Development and control towards a parallel water hydraulic weld/cut robot for machining processes in ITER vacuum vessel

    International Nuclear Information System (INIS)

    Wu Huapeng; Handroos, Heikki; Pessi, Pekka; Kilkki, Juha; Jones, Lawrence

    2005-01-01

    This paper presents a special robot, able to carry out welding and machining processes from inside the ITER vacuum vessel (VV), consisting of a five degree-of-freedom parallel mechanism, mounted on a carriage driven by two electric motors on a rack. The kinematic design of the robot has been optimised for ITER access and a hydraulically actuated pre-prototype built. A hybrid controller is designed for the robot, including position, speed and pressure feedback loops to achieve high accuracy and high dynamic performances. Finally, the experimental tests are given and discussed

  16. Discontinuous interleaving of parallel inverters for efficiency improvement

    DEFF Research Database (Denmark)

    Rannestad, Bjørn; Munk-Nielsen, Stig; Gadgaard, Kristian

    2017-01-01

    Interleaved switching of parallel inverters has previously been proposed for efficiency/size improvements of grid connected three-phase inverters. This paper proposes a novel interleaving method which practically eliminates insulated gate bipolar transistor (IGBT) turn-on losses and drastically...... overall power module losses are reduced. The modulation strategy is suited for converters with doubly fed induction generators (DFIG) for wind turbines, but is not limited thereto. Improvements in switching performance are measured and operational efficiency improvements are calculated and verified...

  17. On the efficient parallel computation of Legendre transforms

    NARCIS (Netherlands)

    Inda, M.A.; Bisseling, R.H.; Maslen, D.K.

    2001-01-01

    In this article, we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the

  18. On the efficient parallel computation of Legendre transforms

    NARCIS (Netherlands)

    Inda, M.A.; Bisseling, R.H.; Maslen, D.K.

    1999-01-01

    In this article we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the

  19. Efficient Four-Parametric with-and-without-Memory Iterative Methods Possessing High Efficiency Indices

    Directory of Open Access Journals (Sweden)

    Alicia Cordero

    2018-01-01

    We construct a family of derivative-free optimal iterative methods without memory to approximate a simple zero of a nonlinear function. Error analysis demonstrates that the without-memory class has eighth-order convergence and is extendable to a with-memory class. The extension of the new family to the with-memory one is also presented, which attains the convergence order 15.5156 and a very high efficiency index 15.5156^(1/4) ≈ 1.9847. Some particular schemes of the with-memory family are also described. Numerical examples and some dynamical aspects of the new schemes are given to support theoretical results.
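
    The efficiency index quoted in this abstract can be checked with a few lines of Python. This is only a sketch: it assumes the usual definition of the index as order ** (1 / evaluations), and it assumes four function evaluations per step, which is what the exponent 1/4 suggests. The function name efficiency_index is illustrative and does not come from the paper.

```python
def efficiency_index(order: float, evaluations: int) -> float:
    """Efficiency index of an iterative method: order ** (1 / evaluations)."""
    if evaluations <= 0:
        raise ValueError("evaluations must be a positive integer")
    return order ** (1.0 / evaluations)


# Values taken from the abstract; four evaluations per step is an assumption.
with_memory = efficiency_index(15.5156, 4)
without_memory = efficiency_index(8.0, 4)

print(f"with memory:    {with_memory:.4f}")
print(f"without memory: {without_memory:.4f}")
```

    Under the same assumption, the eighth-order without-memory class has a lower index, which is the gain the with-memory extension is reported to deliver.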

  20. MICADO: Parallel implementation of a 2D-1D iterative algorithm for the 3D neutron transport problem in prismatic geometries

    International Nuclear Information System (INIS)

    Fevotte, F.; Lathuiliere, B.

    2013-01-01

    The large increase in computing power over the past few years now makes it possible to consider developing 3D full-core heterogeneous deterministic neutron transport solvers for reference calculations. Among all approaches presented in the literature, the method first introduced in [1] seems very promising. It consists in iterating over resolutions of 2D and 1D MOC problems by taking advantage of prismatic geometries without introducing approximations of a low order operator such as diffusion. However, before developing a solver with all industrial options at EDF, several points needed to be clarified. In this work, we first prove the convergence of this iterative process, under some assumptions. We then present our high-performance, parallel implementation of this algorithm in the MICADO solver. Benchmarking the solver against the Takeda case shows that the 2D-1D coupling algorithm does not seem to affect the spatial convergence order of the MOC solver. As for performance issues, our study shows that even though the data distribution is suited to the 2D solver part, the efficiency of the 1D part is sufficient to ensure a good parallel efficiency of the global algorithm. After this study, the main remaining difficulty implementation-wise is about the memory requirement of a vector used for initialization. An efficient acceleration operator will also need to be developed. (authors)

  1. Decomposition based parallel processing technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2000-01-01

    In practical design studies, most designers solve multidisciplinary problems with a complex design structure. These multidisciplinary problems have hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize total cost and time. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems to raise design efficiency by using a genetic algorithm and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology

  2. Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows

    Science.gov (United States)

    Moitra, Stuti; Gatski, Thomas B.

    1997-01-01

    A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
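
    For context on the tridiagonal systems mentioned in this abstract, the sketch below shows the standard serial Thomas algorithm. It is not the authors' method: it is included only to illustrate why such solves are awkward to parallelize, since each step of the forward and backward sweeps depends on the previous one. The function name and the argument layout (all four lists of length n, with lower[0] and upper[n-1] unused) are assumptions made for this example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the serial Thomas algorithm.

    lower[i], diag[i], upper[i] are the sub-, main and super-diagonal
    entries of row i; lower[0] and upper[-1] are ignored.
    No pivoting is done, so the matrix should be diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]

    # Forward sweep: row i needs the result of row i - 1 (loop-carried dependence).
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Backward substitution: x[i] needs x[i + 1].
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Small check system: [[2,-1,0],[-1,2,-1],[0,-1,2]] x = [1,0,1]
solution = solve_tridiagonal([0.0, -1.0, -1.0], [2.0, 2.0, 2.0],
                             [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0])
print(solution)
```

    Both loops are inherently sequential along the line being solved, which is one reason a distributed DNS code may prefer to restructure the algorithm rather than parallelize the solver itself.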

  3. Direct and iterative algorithms for the parallel solution of the one-dimensional macroscopic Navier-Stokes equations

    International Nuclear Information System (INIS)

    Doster, J.M.; Sills, E.D.

    1986-01-01

    Current efforts are under way to develop and evaluate numerical algorithms for the parallel solution of the large sparse matrix equations associated with the finite difference representation of the macroscopic Navier-Stokes equations. Previous work has shown that these equations can be cast into smaller coupled matrix equations suitable for solution utilizing multiple computer processors operating in parallel. The individual processors themselves may exhibit parallelism through the use of vector pipelines. This work has concentrated on the one-dimensional drift flux form of the Navier-Stokes equations. Direct and iterative algorithms that may be suitable for implementation on parallel computer architectures are evaluated in terms of accuracy and overall execution speed. This work has application to engineering and training simulations, on-line process control systems, and engineering workstations where increased computational speeds are required

  4. A reduced complexity highly power/bandwidth efficient coded FQPSK system with iterative decoding

    Science.gov (United States)

    Simon, M. K.; Divsalar, D.

    2001-01-01

    Based on a representation of FQPSK as a trellis-coded modulation, this paper investigates the potential improvement in power efficiency obtained from the application of simple outer codes to form a concatenated coding arrangement with iterative decoding.

  5. High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

    Directory of Open Access Journals (Sweden)

    H. Y. Su

    2012-04-01

    This article presents two highly efficient parallel realizations of context-based adaptive variable length coding (CAVLC) on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weakened, including the context-based data dependence, the memory accessing dependence and the control dependence. The CAVLC pipeline is divided into three stages: two scans, coding, and lag packing, and is implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on the multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on a massively parallel GPU architecture. Both of them exploit rich data-level parallelism. Experimental results show that, compared with the CPU version, a speedup of more than 70 times can be obtained for STORM and over 50 times for GPU. The encoder implementation on STORM achieves real-time processing for 1080p @30fps, and the GPU-based version satisfies the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms.

  6. Efficient multitasking: parallel versus serial processing of multiple tasks.

    Science.gov (United States)

    Fischer, Rico; Plessow, Franziska

    2015-01-01

    In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling.

  7. Efficient parallel simulation of CO2 geologic sequestration in saline aquifers

    International Nuclear Information System (INIS)

    Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten

    2007-01-01

    An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2

  8. 2D-RBUC for efficient parallel compression of residuals

    Science.gov (United States)

    Đurđević, Đorđe M.; Tartalja, Igor I.

    2018-02-01

    In this paper, we present a method for lossless compression of residuals with an efficient SIMD parallel decompression. The residuals originate from lossy or near lossless compression of height fields, which are commonly used to represent models of terrains. The algorithm is founded on the existing RBUC method for compression of non-uniform data sources. We have adapted the method to capture 2D spatial locality of height fields, and developed the data decompression algorithm for modern GPU architectures already present even in home computers. In combination with the point-level SIMD-parallel lossless/lossy height field compression method HFPaC, characterized by fast progressive decompression and seamlessly reconstructed surface, the newly proposed method trades off a small efficiency degradation for a non-negligible compression ratio benefit (measured up to 91%).

  9. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    International Nuclear Information System (INIS)

    Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.

    2009-01-01

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel

  10. Parallel processing based decomposition technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2001-01-01

    In practical design studies, most designers solve multidisciplinary problems with large and complex design systems. These multidisciplinary problems have hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize total computational cost. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems to raise design efficiency by using a genetic algorithm and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology

  11. Picard Trajectory Approximation Iteration for Efficient Orbit Propagation

    Science.gov (United States)

    2015-07-21

    computing language developed by NVIDIA for use upon their Graphics Processing Units (GPUs); effectively it allows lightweight parallel computation at...Computation Toolbox, and require Matlab 2010 or newer (2011 or newer recommended), and an NVIDIA GPU with compute capability of 1.3 or greater. 3...and Resonances, pp. 216–227, Dordrecht, Holland, 1970. D. Reidel Publishing Company . [4] Zadunaisky, P. E., On the Estimation of Errors Propagated in

  12. Iterating skeletons

    DEFF Research Database (Denmark)

    Dieterle, Mischa; Horstmeyer, Thomas; Berthold, Jost

    2012-01-01

    a particular skeleton ad-hoc for repeated execution turns out to be considerably complicated, and raises general questions about introducing state into a stateless parallel computation. In addition, one would strongly prefer an approach which leaves the original skeleton intact, and only uses it as a building...... block inside a bigger structure. In this work, we present a general framework for skeleton iteration and discuss requirements and variations of iteration control and iteration body. Skeleton iteration is expressed by synchronising a parallel iteration body skeleton with a (likewise parallel) state......Skeleton-based programming is an area of increasing relevance with upcoming highly parallel hardware, since it substantially facilitates parallel programming and separates concerns. When parallel algorithms expressed by skeletons involve iterations – applying the same algorithm repeatedly...

  13. Efficient Parallel Statistical Model Checking of Biochemical Networks

    Directory of Open Access Journals (Sweden)

    Paolo Ballarini

    2009-12-01

    We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic terms. Exact probabilistic verification approaches such as, for example, CSL/PCTL model checking, are undermined by a huge computational demand which rules them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions out of the stochastic model. We propose a methodology for efficiently estimating the likelihood that an LTL property P holds of a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology we propose uses a stochastic simulation algorithm for generating execution samples; however, there are three key aspects that improve the efficiency. First, the sample generation is driven by on-the-fly verification of P, which results in optimal overall simulation time. Second, the confidence interval estimation for the probability of P to hold is based on an efficient variant of the Wilson method which ensures a faster convergence. Third, the whole methodology is designed in a parallel fashion, and a prototype software tool has been implemented that performs the sampling/verification process in parallel over an HPC architecture.
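
    The confidence-interval step mentioned in this abstract can be illustrated with the textbook Wilson score interval. This is a sketch under stated assumptions: the paper uses "an efficient variant of the Wilson method" whose details are not given here, so the code below is the standard formula rather than the authors' variant. The function name wilson_interval and the default z = 1.96 (roughly 95% confidence) are choices made for this example.

```python
import math


def wilson_interval(successes: int, samples: int, z: float = 1.96):
    """Standard Wilson score interval for a binomial proportion.

    successes: number of sampled executions that satisfied the property
    samples:   total number of sampled executions
    z:         normal quantile (1.96 is roughly a 95% confidence level)
    """
    if samples <= 0:
        raise ValueError("samples must be positive")
    if not 0 <= successes <= samples:
        raise ValueError("successes must lie between 0 and samples")

    p_hat = successes / samples
    z2 = z * z
    denom = 1.0 + z2 / samples
    center = (p_hat + z2 / (2.0 * samples)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / samples + z2 / (4.0 * samples * samples)
    ) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Example: the property held in 50 of 100 simulated runs.
low, high = wilson_interval(50, 100)
print(f"[{low:.4f}, {high:.4f}]")
```

    In a statistical model checker, sampling would typically continue until the interval width falls below a user-chosen tolerance; that stopping rule is not shown here.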

  14. Biomedical applications on the GRID efficient management of parallel jobs

    CERN Document Server

    Moscicki, Jakub T; Lee Hurng Chun; Lin, S C; Pia, Maria Grazia

    2004-01-01

    Distributed computing based on the Master-Worker and PULL interaction model is applicable to a number of applications in high energy physics, medical physics and bio-informatics. We demonstrate a realistic medical physics use-case of a dosimetric system for brachytherapy using distributed Grid resources. We present efficient techniques for running parallel jobs in the case of BLAST, a gene sequencing application, as well as for Monte Carlo simulation based on Geant4. We present a strategy for improving the runtime performance and robustness of the jobs as well as for minimizing the development time needed to migrate the applications to a distributed environment.

  15. Parallel efficient rate control methods for JPEG 2000

    Science.gov (United States)

    Martínez-del-Amor, Miguel Á.; Bruns, Volker; Sparenberg, Heiko

    2017-09-01

    Since the introduction of JPEG 2000, several rate control methods have been proposed. Among them, post-compression rate-distortion optimization (PCRD-Opt) is the most widely used, and the one recommended by the standard. The approach followed by this method is to first compress the entire image split in code blocks, and subsequently, optimally truncate the set of generated bit streams according to the maximum target bit rate constraint. The literature proposes various strategies on how to estimate ahead of time where a block will get truncated in order to stop the execution prematurely and save time. However, none of them have been defined bearing in mind a parallel implementation. Today, multi-core and many-core architectures are becoming popular for JPEG 2000 codecs implementations. Therefore, in this paper, we analyze how some techniques for efficient rate control can be deployed in GPUs. In order to do that, the design of our GPU-based codec is extended, allowing stopping the process at a given point. This extension also harnesses a higher level of parallelism on the GPU, leading to up to 40% of speedup with 4K test material on a Titan X. In a second step, three selected rate control methods are adapted and implemented in our parallel encoder. A comparison is then carried out, and used to select the best candidate to be deployed in a GPU encoder, which gave an extra 40% of speedup in those situations where it was really employed.

  16. Communications oriented programming of parallel iterative solutions of sparse linear systems

    Science.gov (United States)

    Patrick, M. L.; Pratt, T. W.

    1986-01-01

    Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.

  17. Efficient sequential and parallel algorithms for record linkage.

    Science.gov (United States)

    Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar

    2014-01-01

    Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Our sequential and parallel algorithms have been tested on a real dataset of 1,083,878 records and synthetic datasets ranging in size from 50,000 to 9,000,000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm.
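
    The two key ideas named in this abstract (removing identical records first, then linking similar records in a graph and taking connected components) can be sketched as follows. This is an illustration, not the authors' implementation: sorted(set(...)) stands in for the radix sort, the all-pairs comparison is quadratic, and the similarity test based on difflib with a 0.85 threshold is a hypothetical placeholder for whatever measure the paper uses.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical similarity test; not the measure used in the paper."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def link_records(records, is_similar=similar):
    """Group records into clusters of (transitively) similar entries."""
    # Step 1: drop identical records (stand-in for the radix sort step).
    unique = sorted(set(records))

    # Union-find structure over the indices of the unique records.
    parent = list(range(len(unique)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 2: add an edge between every pair of similar records.
    for i, j in combinations(range(len(unique)), 2):
        if is_similar(unique[i], unique[j]):
            parent[find(i)] = find(j)

    # Step 3: each connected component becomes one cluster.
    clusters = {}
    for i, record in enumerate(unique):
        clusters.setdefault(find(i), []).append(record)
    return sorted(clusters.values())


names = ["john smith", "jon smith", "john smith", "mary jones"]
print(link_records(names))
```

    Any pairwise predicate can be passed as is_similar, so the clustering logic stays independent of how similarity is measured.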

  18. An efficient iteration strategy for the solution of the Euler equations

    Science.gov (United States)

    Walters, R. W.; Dwoyer, D. L.

    1985-01-01

    A line Gauss-Seidel (LGS) relaxation algorithm in conjunction with a one-parameter family of upwind discretizations of the Euler equations in two dimensions is described. The basic algorithm has the property that convergence to the steady state is quadratic for fully supersonic flows and linear otherwise. This is in contrast to the block ADI methods (either central or upwind differenced) and the upwind biased relaxation schemes, all of which converge linearly, independent of the flow regime. Moreover, the algorithm presented here is easily enhanced to detect regions of subsonic flow embedded in supersonic flow. This allows marching by lines in the supersonic regions, converging each line quadratically, and iterating in the subsonic regions, thus yielding a very efficient iteration strategy. Numerical results are presented for two-dimensional supersonic and transonic flows containing both oblique and normal shock waves which confirm the efficiency of the iteration strategy.

  19. Efficient parallel algorithms for string editing and related problems

    Science.gov (United States)

    Apostolico, Alberto; Atallah, Mikhail J.; Larmore, Lawrence; Mcfaddin, H. S.

    1988-01-01

    The string editing problem for input strings x and y consists of transforming x into y by performing a series of weighted edit operations on x of overall minimum cost. An edit operation on x can be the deletion of a symbol from x, the insertion of a symbol in x, or the substitution of a symbol of x with another symbol. This problem has a well-known O(|x||y|) time sequential solution (25). Efficient parallel random access machine (PRAM) algorithms for the string editing problem are given. If m = min(|x|, |y|) and n = max(|x|, |y|), then the CREW bound is O(log m log n) time with O(mn/log m) processors. In all algorithms, space is O(mn).
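
    For orientation, the well-known sequential solution mentioned in this abstract is the standard dynamic program over a table with one cell per pair of prefixes. The sketch below illustrates that sequential baseline only, not the parallel algorithms of the paper; the function name `string_edit_cost` and the uniform per-operation weights are assumptions made for the example (the general problem allows symbol-dependent weights).

```python
def string_edit_cost(x, y, delete_cost=1, insert_cost=1, substitute_cost=1):
    """Minimum total cost of editing x into y with the given operation weights."""
    m, n = len(x), len(y)
    # cost[i][j] = cheapest way to turn x[:i] into y[:j]
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = cost[i - 1][0] + delete_cost   # delete every symbol of x[:i]
    for j in range(1, n + 1):
        cost[0][j] = cost[0][j - 1] + insert_cost   # insert every symbol of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if x[i - 1] == y[j - 1] else substitute_cost
            cost[i][j] = min(cost[i - 1][j] + delete_cost,   # delete x[i-1]
                             cost[i][j - 1] + insert_cost,   # insert y[j-1]
                             cost[i - 1][j - 1] + sub)       # match or substitute
    return cost[m][n]


unit_cost = string_edit_cost("kitten", "sitting")
```

    Both time and space are proportional to the table size, which matches the O(|x||y|) sequential bound quoted in the abstract.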

  20. Efficient sequential and parallel algorithms for planted motif search.

    Science.gov (United States)

    Nicolae, Marius; Rajasekaran, Sanguthevar

    2014-01-31

    Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.
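
    The (l,d) problem statement in this abstract can be made concrete with a brute-force search that tests every candidate string of length l. This is a hedged sketch of the problem definition only: it is exponential in l and is not PMS8 or any algorithm from the paper. The function names and the default DNA alphabet are assumptions for the example.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq matches motif with at most d mismatches."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motifs(seqs, l, d, alphabet="ACGT"):
    """All length-l strings that occur in every sequence with at most d mismatches."""
    candidates = ("".join(chars) for chars in product(alphabet, repeat=l))
    return [m for m in candidates
            if all(occurs_within(m, s, d) for s in seqs)]


exact = planted_motifs(["GATTACA", "TTAC", "ATTACG"], l=3, d=0)
fuzzy = planted_motifs(["AAAA", "AAAT"], l=4, d=1)
```

    The candidate space has |alphabet|^l strings, which is why instances such as (25,10) require the far more selective neighborhood generation the abstract describes.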

  1. An efficient iterative grand canonical Monte Carlo algorithm to determine individual ionic chemical potentials in electrolytes.

    Science.gov (United States)

    Malasics, Attila; Boda, Dezso

    2010-06-28

    Two iterative procedures have been proposed recently to calculate the chemical potentials corresponding to prescribed concentrations from grand canonical Monte Carlo (GCMC) simulations. Both are based on repeated GCMC simulations with updated excess chemical potentials until the desired concentrations are established. In this paper, we propose combining our robust and fast converging iteration algorithm [Malasics, Gillespie, and Boda, J. Chem. Phys. 128, 124102 (2008)] with the suggestion of Lamperski [Mol. Simul. 33, 1193 (2007)] to average the chemical potentials in the iterations (instead of just using the chemical potentials obtained in the last iteration). We apply the unified method for various electrolyte solutions and show that our algorithm is more efficient if we use the averaging procedure. We discuss the convergence problems arising from violation of charge neutrality when inserting/deleting individual ions instead of neutral groups of ions (salts). We suggest a correction term to the iteration procedure that makes the algorithm efficient to determine the chemical potentials of individual ions too.

  2. Adaptive Iterative Soft-Input Soft-Output Parallel Decision-Feedback Detectors for Asynchronous Coded DS-CDMA Systems

    Directory of Open Access Journals (Sweden)

    Zhang Wei

    2005-01-01

    The optimum and many suboptimum iterative soft-input soft-output (SISO) multiuser detectors require a priori information about the multiuser system, such as the users' transmitted signature waveforms, relative delays, as well as the channel impulse response. In this paper, we employ adaptive algorithms in the SISO multiuser detector in order to avoid the need for this a priori information. First, we derive the optimum SISO parallel decision-feedback detector for asynchronous coded DS-CDMA systems. Then, we propose two adaptive versions of this SISO detector, which are based on the normalized least mean square (NLMS) and recursive least squares (RLS) algorithms. Our SISO adaptive detectors effectively exploit the a priori information of coded symbols, whose soft inputs are obtained from a bank of single-user decoders. Furthermore, we consider how to select practical finite feedforward and feedback filter lengths to obtain a good tradeoff between the performance and computational complexity of the receiver.

  3. Efficiency of thermal outgassing for tritium retention measurement and removal in ITER

    Directory of Open Access Journals (Sweden)

    G. De Temmerman

    2017-08-01

    As a licensed nuclear facility, ITER must limit the in-vessel tritium (T) retention to reduce the risks of potential release during accidents, the inventory limit being set at 1 kg. Simulations and extrapolations from existing experiments indicate that T-retention in ITER will mainly be driven by co-deposition with beryllium (Be) eroded from the first wall, with co-deposits forming mainly in the divertor region but also possibly on the first wall itself. A pulsed Laser-Induced Desorption (LID) system, called Tritium Monitor, is being designed to locally measure the T-retention in co-deposits forming on the inner divertor baffle of ITER. Regarding tritium removal, the baseline strategy is to perform baking of the plasma-facing components, at 513 K for the FW and 623 K for the divertor. Both baking and laser desorption rely on the thermal desorption of tritium from the surface, the efficiency of which remains unclear for thick (and possibly impure) co-deposits. This contribution reports on the results of TMAP7 studies of this efficiency for ITER-relevant deposits.

  4. An iterative reconstruction method of complex images using expectation maximization for radial parallel MRI

    International Nuclear Information System (INIS)

    Choi, Joonsung; Kim, Dongchan; Oh, Changhyun; Han, Yeji; Park, HyunWook

    2013-01-01

    In MRI (magnetic resonance imaging), signal sampling along a radial k-space trajectory is preferred in certain applications due to its distinct advantages such as robustness to motion, and the radial sampling can be beneficial for reconstruction algorithms such as parallel MRI (pMRI) due to the incoherency. For radial MRI, the image is usually reconstructed from projection data using analytic methods such as filtered back-projection or Fourier reconstruction after gridding. However, the quality of the reconstructed image from these analytic methods can be degraded when the number of acquired projection views is insufficient. In this paper, we propose a novel reconstruction method based on the expectation maximization (EM) method, where the EM algorithm is remodeled for MRI so that complex images can be reconstructed. Then, to optimize the proposed method for radial pMRI, a reconstruction method that uses coil sensitivity information of multichannel RF coils is formulated. Experiment results from synthetic and in vivo data show that the proposed method introduces better reconstructed images than the analytic methods, even from highly subsampled data, and provides monotonic convergence properties compared to the conjugate gradient based reconstruction method. (paper)

  5. Fast parallel MR image reconstruction via B1-based, adaptive restart, iterative soft thresholding algorithms (BARISTA).

    Science.gov (United States)

    Muckley, Matthew J; Noll, Douglas C; Fessler, Jeffrey A

    2015-02-01

    Sparsity-promoting regularization is useful for combining compressed sensing assumptions with parallel MRI for reducing scan time while preserving image quality. Variable splitting algorithms are the current state-of-the-art algorithms for SENSE-type MR image reconstruction with sparsity-promoting regularization. These methods are very general and have been observed to work with almost any regularizer; however, the tuning of associated convergence parameters is a commonly-cited hindrance in their adoption. Conversely, majorize-minimize algorithms based on a single Lipschitz constant have been observed to be slow in shift-variant applications such as SENSE-type MR image reconstruction since the associated Lipschitz constants are loose bounds for the shift-variant behavior. This paper bridges the gap between the Lipschitz constant and the shift-variant aspects of SENSE-type MR imaging by introducing majorizing matrices in the range of the regularizer matrix. The proposed majorize-minimize methods (called BARISTA) converge faster than state-of-the-art variable splitting algorithms when combined with momentum acceleration and adaptive momentum restarting. Furthermore, the tuning parameters associated with the proposed methods are unitless convergence tolerances that are easier to choose than the constraint penalty parameters required by variable splitting algorithms.

  6. An efficient parallel algorithm for matrix-vector multiplication

    Energy Technology Data Exchange (ETDEWEB)

    Hendrickson, B.; Leland, R.; Plimpton, S.

    1993-03-01

    The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
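
    One way to see where a per-processor vector length of n/√p can come from is a two-dimensional block decomposition, in which a q x q grid of p = q*q processors each owns one matrix block and touches only a length-n/q slice of the vector. The sketch below simulates that decomposition sequentially in plain Python. It is a generic illustration and an assumption on our part, not the hypercube algorithm of the paper; `block_matvec` and its parameters are hypothetical names.

```python
def block_matvec(A, x, q):
    """Compute y = A @ x by simulating a q x q grid of processors.

    Simulated processor (r, c) owns block (r, c) of A and the c-th slice of x.
    """
    n = len(A)
    assert n % q == 0, "this sketch assumes q divides n"
    b = n // q  # block edge length, i.e. n / sqrt(p)

    # Local phase: each simulated processor multiplies its block by its slice of x.
    partial = {}
    for r in range(q):
        for c in range(q):
            rows = range(r * b, (r + 1) * b)
            cols = range(c * b, (c + 1) * b)
            partial[(r, c)] = [sum(A[i][j] * x[j] for j in cols) for i in rows]

    # Reduction phase: partial results are summed across each processor row.
    y = []
    for r in range(q):
        for k in range(b):
            y.append(sum(partial[(r, c)][k] for c in range(q)))
    return y


A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
y = block_matvec(A, [1, 0, -1, 2], q=2)
```

    In a real distributed setting the reduction phase is where communication occurs, and each message carries b = n/q values rather than the full vector.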

  7. Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

    Science.gov (United States)

    Qin, Cheng-Zhi; Zhan, Lijun

    2012-06-01

    As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU

  8. Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

    KAUST Repository

    Frohne, Jörg

    2015-08-06

    © 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.

  9. Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

    KAUST Repository

    Frohne, Jörg; Heister, Timo; Bangerth, Wolfgang

    2015-01-01

    © 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.

  10. Comparative efficiencies of three parallel algorithms for nonlinear ...

    Indian Academy of Sciences (India)

    R. Narasimhan (Krishtel eMaging) 1461 1996 Oct 15 13:05:22

    This algorithm is better suited for large size problems on coarse ... and reliable time integration algorithms for solving the second-order dynamic equilibrium equations that arise due ... Programming models required to take advantage of the parallel and distributed ..... In addition, MPI added the concept of a 'virtual topology'.

  11. Efficient Simulation of Population Overflow in Parallel Queues

    NARCIS (Netherlands)

    Nicola, V.F.; Zaburnenko, T.S.

    2006-01-01

    In this paper we propose a state-dependent importance sampling heuristic to estimate the probability of population overflow in networks of parallel queues. This heuristic approximates the “optimal” state-dependent change of measure without the need for difficult mathematical analysis or costly

  12. Efficient Heuristics for Simulating Population Overflow in Parallel Networks

    NARCIS (Netherlands)

    Zaburnenko, T.S.; Nicola, V.F.

    2006-01-01

    In this paper we propose a state-dependent importance sampling heuristic to estimate the probability of population overflow in networks of parallel queues. This heuristic approximates the “optimal” state-dependent change of measure without the need for costly optimization involved in other

  13. Multi states electromechanical switch for energy efficient parallel data processing

    KAUST Repository

    Kloub, Hussam

    2011-04-01

    We present the design, simulation results, and fabrication of electromechanical switches enabling parallel data processing and multifunctionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2 μm by 0.5 μm, and it has a pull-in voltage of 5.15 V, which is verified by FEM simulation. © 2011 IEEE.

  14. Multi states electromechanical switch for energy efficient parallel data processing

    KAUST Repository

    Kloub, Hussam; Smith, Casey; Hussain, Muhammad Mustafa

    2011-01-01

    We present the design, simulation results, and fabrication of electromechanical switches enabling parallel data processing and multifunctionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2 μm by 0.5 μm, and it has a pull-in voltage of 5.15 V, which is verified by FEM simulation. © 2011 IEEE.

  15. Efficient assignment of the temperature set for Parallel Tempering

    International Nuclear Information System (INIS)

    Guidetti, M.; Rolando, V.; Tripiccione, R.

    2012-01-01

    We propose a simple algorithm able to identify a set of temperatures for a Parallel Tempering Monte Carlo simulation that maximizes the probability that the configurations drift across all temperature values, from the coldest to the hottest ones, and vice versa. The proposed algorithm starts from data gathered from relatively short Monte Carlo simulations and is straightforward to implement. We assess its effectiveness on a test case simulation of an Edwards–Anderson spin glass on a lattice of 12^3 sites.

  16. Work-Efficient Parallel Skyline Computation for the GPU

    DEFF Research Database (Denmark)

    Bøgh, Kenneth Sejdenfaden; Chester, Sean; Assent, Ira

    2015-01-01

    offers the potential for parallelizing skyline computation across thousands of cores. However, attempts to port skyline algorithms to the GPU have prioritized throughput and failed to outperform sequential algorithms. In this paper, we introduce a new skyline algorithm, designed for the GPU, that uses...... a global, static partitioning scheme. With the partitioning, we can permit controlled branching to exploit transitive relationships and avoid most point-to-point comparisons. The result is a non-traditional GPU algorithm, SkyAlign, that prioritizes work-efficiency and respectable throughput, rather than...
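
    The point-to-point comparisons that this abstract refers to are dominance tests between pairs of points. As background only, the sketch below shows the usual dominance definition and a naive quadratic skyline built from it. It assumes that smaller values are preferred in every dimension, uses hypothetical function names, and is not SkyAlign or its partitioning scheme.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and better in at least one.

    Assumes smaller is better in all dimensions.
    """
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points):
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


pts = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
result = skyline(pts)
```

    This baseline performs a dominance test for every pair of points; work-efficient algorithms aim to skip most of those tests.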

  17. Implementation of the multireference Brillouin-Wigner and Mukherjee’s coupled cluster methods with non-iterative triple excitations utilizing reference-level parallelism

    Energy Technology Data Exchange (ETDEWEB)

    Bhaskaran-Nair, Kiran; Brabec, Jiri; Apra, Edoardo; van Dam, Hubertus JJ; Pittner, Jiri; Kowalski, Karol

    2012-09-07

    In this paper we discuss the performance of the non-iterative State-Specific Multireference Coupled Cluster (SS-MRCC) methods accounting for the effect of triply excited cluster amplitudes. The corrections to the Brillouin-Wigner and Mukherjee MRCC models based on the manifold of singly and doubly excited cluster amplitudes (BW-MRCCSD and Mk-MRCCSD, respectively) are tested and compared with the exact full configuration interaction results (FCI) for small systems (H2O, N2, and Be3). For larger systems (naphthyne isomers and β-carotene), the non-iterative BW-MRCCSD(T) and Mk-MRCCSD(T) methods are compared against the results obtained with the single reference coupled cluster methods. We also report on the parallel performance of the non-iterative implementations based on the use of processor groups.

  18. Energy and fuel efficient parallel mild hybrids for urban roads

    International Nuclear Information System (INIS)

    Babu, Ajay; Ashok, S.

    2016-01-01

    Highlights: • Energy and fuel savings depend on battery charge variations and the vehicle speed parameters. • Indian urban conditions provide lot of scope for energy and fuel savings in mild hybrids. • Energy saving strategy has lower payback periods than the fuel saving one in mild hybrids. • Sensitivity to parameter variations is the least for energy saving strategy in a mild hybrid. - Abstract: Fuel economy improvements and battery energy savings can promote the adoption of parallel mild hybrids for urban driving conditions. The aim of this study is to establish these benefits through two operating modes: an energy saving mode and a fuel saving mode. The performances of a typical parallel mild hybrid using these modes were analysed over urban driving cycles, in the US, Europe, and India, with a particular focus on the Indian urban conditions. The energy pack available from the proposed energy-saving operating mode, in addition to the energy already available from the conventional mode, was observed to be the highest for the representative urban driving cycle of the US. The extra energy pack available was found to be approximately 21.9 times that available from the conventional mode. By employing the proposed fuel saving operating mode, the fuel economy improvement achievable in New York City was observed to be approximately 22.69% of the fuel economy with the conventional strategy. The energy saving strategy was found to possess the lowest payback periods and highest immunity to variations in various cost parameters.

  19. A parallel reconfigurable platform for efficient sequence alignment

    African Journals Online (AJOL)

    SAM

    2014-08-13

    Aug 13, 2014 ... efficient probabilistic data structure that is used to test whether an element ... of given string with the help of hash functions; 4) ... It speeds up the data searching .... International Journal of Automation and Computing, Springer.

  20. A parallel reconfigurable platform for efficient sequence alignment ...

    African Journals Online (AJOL)

    Bioinformatics is one of the emerging trends in today's world. The major part of bioinformatics is dealing with DNA. Analysis of DNA requires more memory and high efficient computations to produce accurate outputs. Researchers use various bioinformatics algorithms for sequencing and pattern detection techniques, but still ...

  1. Study on Parallel Processing for Efficient Flexible Multibody Analysis based on Subsystem Synthesis Method

    Energy Technology Data Exchange (ETDEWEB)

    Han, Jong-Boo; Song, Hajun; Kim, Sung-Soo [Chungnam Nat’l Univ., Daejeon (Korea, Republic of)

    2017-06-15

    Flexible multibody simulations are widely used in the industry to design mechanical systems. In flexible multibody dynamics, deformation coordinates are described either relatively in the body reference frame that is floating in the space or in the inertial reference frame. Moreover, these deformation coordinates are generated based on the discretization of the body according to the finite element approach. Therefore, the formulation of the flexible multibody system always deals with a huge number of degrees of freedom and the numerical solution methods require a substantial amount of computational time. Parallel computational methods are a solution for efficient computation. However, most of the parallel computational methods are focused on the efficient solution of large-sized linear equations. For multibody analysis, we need to develop an efficient formulation that could be suitable for parallel computation. In this paper, we developed a subsystem synthesis method for a flexible multibody system and proposed efficient parallel computational schemes based on the OpenMP API in order to achieve efficient computation. Simulations of a rotating blade system, which consists of three identical blades, were carried out with two different parallel computational schemes. Actual CPU times were measured to investigate the efficiency of the proposed parallel schemes.

  2. On efficiency of fire simulation realization: parallelization with greater number of computational meshes

    Science.gov (United States)

    Valasek, Lukas; Glasa, Jan

    2017-12-01

    Current fire simulation systems are capable of exploiting available high-performance computing (HPC) platforms to model fires efficiently in parallel. In this paper, the efficiency of a corridor fire simulation on an HPC cluster is discussed. The parallel MPI version of Fire Dynamics Simulator is used to test the efficiency of selected strategies for allocating the computational resources of the cluster when a greater number of computational cores is used. Simulation results indicate that if the number of cores used is not equal to a multiple of the total number of cluster node cores, there are allocation strategies which provide more efficient calculations.

  3. Efficient Out of Core Sorting Algorithms for the Parallel Disks Model.

    Science.gov (United States)

    Kundeti, Vamsi; Rajasekaran, Sanguthevar

    2011-11-01

    In this paper we present efficient algorithms for sorting on the Parallel Disks Model (PDM). Numerous asymptotically optimal algorithms have been proposed in the literature. However, many of these merge-based algorithms have large underlying constants in the time bounds, because they suffer from the lack of read parallelism on PDM. The irregular consumption of the runs during the merge affects the read parallelism and contributes to the increased sorting time. In this paper we first introduce a novel idea called dirty sequence accumulation that improves the read parallelism. Secondly, we show analytically that this idea can reduce the number of parallel I/Os required to sort the input close to the lower bound of [Formula: see text]. We experimentally verify our dirty sequence idea with the standard R-Way merge and show that our idea can significantly reduce the number of parallel I/Os needed to sort on PDM.

  4. An efficient implementation of parallel molecular dynamics method on SMP cluster architecture

    International Nuclear Information System (INIS)

    Suzuki, Masaaki; Okuda, Hiroshi; Yagawa, Genki

    2003-01-01

    The authors have applied an MPI/OpenMP hybrid parallel programming model to parallelize a molecular dynamics (MD) method on a symmetric multiprocessor (SMP) cluster architecture. On that architecture, the hybrid parallel programming model, which uses a message passing library such as MPI for inter-SMP node communication and loop directives such as OpenMP for intra-SMP node parallelization, can be expected to be the most effective one. In this study, the parallel performance of the hybrid style has been compared with that of the conventional flat parallel programming style, which uses only MPI, both with and without the fast multipole method (FMM) for computing long-distance interactions. The computing environment used here is the Hitachi SR8000/MPP at the University of Tokyo. The results are as follows. Without FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 90% with the hybrid style and 75% with the flat-MPI style for an MD simulation with 33,402 atoms. With FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 60% with the hybrid style and 48% with the flat-MPI style for an MD simulation with 117,649 atoms. (author)
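
    To put the efficiency figures in this abstract in perspective, they can be converted to implied speedups under the common definition efficiency = speedup / number of processing elements. That definition, and a baseline of one PE, are assumptions here since the abstract does not state them; the names in the sketch are hypothetical.

```python
def implied_speedup(efficiency, processing_elements):
    """Speedup implied by a parallel efficiency, assuming efficiency = speedup / PEs."""
    return efficiency * processing_elements


PES = 128  # 16 SMP nodes, as reported in the abstract

# Parallel efficiencies quoted in the abstract.
reported = {
    ("no FMM", "hybrid"): 0.90,
    ("no FMM", "flat MPI"): 0.75,
    ("FMM", "hybrid"): 0.60,
    ("FMM", "flat MPI"): 0.48,
}

speedups = {case: implied_speedup(eff, PES) for case, eff in reported.items()}

for (method, style), s in speedups.items():
    print(f"{method:7s} {style:9s} implied speedup ~ {s:.1f}")
```

    Under these assumptions the hybrid style corresponds to a speedup of roughly 115 on 128 PEs without FMM, compared with 96 for the flat-MPI style.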

  5. The role of crowding in parallel search: Peripheral pooling is not responsible for logarithmic efficiency in parallel search.

    Science.gov (United States)

    Madison, Anna; Lleras, Alejandro; Buetti, Simona

    2018-02-01

    Recent results from our laboratory showed that, in fixed-target parallel search tasks, reaction times increase in a logarithmic fashion with set size, and the slope of this logarithmic function is modulated by lure-target similarity. These results were interpreted as being consistent with a processing architecture where early vision (stage one) processes elements in the display in exhaustive fashion with unlimited capacity and with a limitation in resolution. Here, we evaluate the contribution of crowding to our recent logarithmic search slope findings, considering the possibility that peripheral pooling of features (as observed in crowding) may be responsible for logarithmic efficiency. Factors known to affect the strength of crowding were varied, specifically item spacing and similarity. The results from three experiments converge on the same pattern: reaction times increased logarithmically with set size and were modulated by lure-target similarity even when crowding was minimized within displays through an inter-item spacing manipulation. Furthermore, we found logarithmic search efficiencies were overall improved in displays where crowding was minimized compared to displays where crowding was possible. The findings from these three experiments suggest logarithmic efficiency in efficient search is not the result of peripheral pooling of features. That said, the presence of crowding does tend to reduce search efficiency, even in "pop-out" search situations.

  6. Efficient methodologies for system matrix modelling in iterative image reconstruction for rotating high-resolution PET

    Energy Technology Data Exchange (ETDEWEB)

    Ortuno, J E; Kontaxakis, G; Rubio, J L; Santos, A [Departamento de Ingenieria Electronica (DIE), Universidad Politecnica de Madrid, Ciudad Universitaria s/n, 28040 Madrid (Spain); Guerra, P [Networking Research Center on Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid (Spain)], E-mail: juanen@die.upm.es

    2010-04-07

    A fully 3D iterative image reconstruction algorithm has been developed for high-resolution PET cameras composed of pixelated scintillator crystal arrays and rotating planar detectors, based on the ordered subsets approach. The associated system matrix is precalculated with Monte Carlo methods that incorporate physical effects not included in analytical models, such as positron range effects and interaction of the incident gammas with the scintillator material. Custom Monte Carlo methodologies have been developed and optimized for modelling of system matrices for fast iterative image reconstruction adapted to specific scanner geometries, without redundant calculations. According to the methodology proposed here, only one-eighth of the voxels within two central transaxial slices need to be modelled in detail. The rest of the system matrix elements can be obtained with the aid of axial symmetries and redundancies, as well as in-plane symmetries within transaxial slices. Sparse matrix techniques for the non-zero system matrix elements are employed, allowing for fast execution of the image reconstruction process. This 3D image reconstruction scheme has been compared in terms of image quality to a 2D fast implementation of the OSEM algorithm combined with Fourier rebinning approaches. This work confirms the superiority of fully 3D OSEM in terms of spatial resolution, contrast recovery and noise reduction as compared to conventional 2D approaches based on rebinning schemes. At the same time it demonstrates that fully 3D methodologies can be efficiently applied to the image reconstruction problem for high-resolution rotational PET cameras by applying accurate pre-calculated system models and taking advantage of the system's symmetries.

  7. Leveraging Cloud Heterogeneity for Cost-Efficient Execution of Parallel Applications

    OpenAIRE

    Roloff, Eduardo; Diener, Matthias; Diaz Carreño, Emmanuell; Gaspary, Luciano Paschoal; Navaux, Philippe O.A.

    2017-01-01

    Public cloud providers offer a wide range of instance types, with different processing and interconnection speeds, as well as varying prices. Furthermore, the tasks of many parallel applications show different computational demands due to load imbalance. These differences can be exploited for improving the cost efficiency of parallel applications in many cloud environments by matching application requirements to instance types. In this paper, we introduce the concept of heterogeneous cloud sy...

  8. An efficient iterative model reduction method for aeroviscoelastic panel flutter analysis in the supersonic regime

    Science.gov (United States)

    Cunha-Filho, A. G.; Briend, Y. P. J.; de Lima, A. M. G.; Donadon, M. V.

    2018-05-01

    The flutter boundary prediction of complex aeroelastic systems is not an easy task. In some cases, these analyses may become prohibitive due to the high computational cost and time associated with the large number of degrees of freedom of the aeroelastic models, particularly when the aeroelastic model incorporates a control strategy with the aim of suppressing the flutter phenomenon, such as the use of viscoelastic treatments. In this situation, the use of a model reduction method is essential. However, the construction of a modal reduction basis for aeroviscoelastic systems is still a challenge, owing to the inherent frequency- and temperature-dependent behavior of the viscoelastic materials. Thus, the main contribution intended for the present study is to propose an efficient and accurate iterative enriched Ritz basis to deal with aeroviscoelastic systems. The main features and capabilities of the proposed model reduction method are illustrated in the prediction of flutter boundary for a thin three-layer sandwich flat panel and a typical aeronautical stiffened panel, both under supersonic flow.

  9. Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization

    KAUST Repository

    Gunnels, John; Lee, Jon; Margulies, Susan

    2010-01-01

    We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.

  10. Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization

    KAUST Repository

    Gunnels, John

    2010-06-01

    We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.

  11. An efficient parallel algorithm for the calculation of canonical MP2 energies.

    Science.gov (United States)

    Baker, Jon; Pulay, Peter

    2002-09-01

    We present the parallel version of a previous serial algorithm for the efficient calculation of canonical MP2 energies (Pulay, P.; Saebo, S.; Wolinski, K. Chem Phys Lett 2001, 344, 543). It is based on the Saebo-Almlöf direct-integral transformation, coupled with an efficient prescreening of the AO integrals. The parallel algorithm avoids synchronization delays by spawning a second set of slaves during the bin-sort prior to the second half-transformation. Results are presented for systems with up to 2000 basis functions. MP2 energies for molecules with 400-500 basis functions can be routinely calculated to microhartree accuracy on a small number of processors (6-8) in a matter of minutes with modern PC-based parallel computers. Copyright 2002 Wiley Periodicals, Inc. J Comput Chem 23: 1150-1156, 2002

  12. Functional efficiency comparison between split- and parallel-hybrid using advanced energy flow analysis methods

    Energy Technology Data Exchange (ETDEWEB)

    Guttenberg, Philipp; Lin, Mengyan [Romax Technology, Nottingham (United Kingdom)]

    2009-07-01

    The following paper presents a comparative efficiency analysis of the Toyota Prius versus the Honda Insight using advanced Energy Flow Analysis methods. The sample study shows that even very different hybrid concepts like a split- and a parallel-hybrid can be compared in a high level of detail and demonstrates the benefit showing exemplary results. (orig.)

  13. Power Efficient Design of Parallel/Serial FIR Filters in RNS

    DEFF Research Database (Denmark)

    Petricca, Massimo; Albicocco, Pietro; Cardarilli, Gian Carlo

    2012-01-01

    It is well known that the Residue Number System (RNS) provides an efficient implementation of parallel FIR filters especially when the filter order and the dynamic range are high. The two main drawbacks of RNS, need of converters and coding overhead, make a serialized implementation of the FIR...

  14. Further comments on the geometrical efficiency of a parallel-disk source and detector system

    International Nuclear Information System (INIS)

    Ruby, L.

    1994-01-01

    A derivation is presented for a previously published formula, which determines the geometrical efficiency of a parallel-disk source and detector system. The formula involves an integral over a product of two Bessel functions. An algebraic approximation to the integral is also discussed. (orig.)

  15. COMPUTATIONAL EFFICIENCY OF A MODIFIED SCATTERING KERNEL FOR FULL-COUPLED PHOTON-ELECTRON TRANSPORT PARALLEL COMPUTING WITH UNSTRUCTURED TETRAHEDRAL MESHES

    Directory of Open Access Journals (Sweden)

    JONG WOON KIM

    2014-04-01

    In this paper, we introduce a modified scattering kernel approach to avoid the unnecessarily repeated calculations involved with the scattering source calculation, and used it with parallel computing to effectively reduce the computation time. Its computational efficiency was tested for three-dimensional full-coupled photon-electron transport problems using our computer program which solves the multi-group discrete ordinates transport equation by using the discontinuous finite element method with unstructured tetrahedral meshes for complicated geometrical problems. The numerical tests show that we can improve speed up to 17∼42 times for the elapsed time per iteration using the modified scattering kernel, not only in the single CPU calculation but also in the parallel computing with several CPUs.

  16. CONTRIBUTION OF QUADRATIC RESIDUE DIFFUSERS TO EFFICIENCY OF TILTED PROFILE PARALLEL HIGHWAY NOISE BARRIERS

    Directory of Open Access Journals (Sweden)

    M. R. Monazzam, P. Nassiri

    2009-10-01

    This paper presents the results of an investigation on the acoustic performance of tilted profile parallel barriers with quadratic residue diffuser (QRD) tops and faces. A 2D boundary element method (BEM) is used to predict the barrier insertion loss. Results for rigid barriers and for barriers with absorptive coverage are also calculated for comparison. Using QRD on the top surface and faces of all tilted profile parallel barrier models introduced here is found to improve the efficiency of barriers compared with the equivalent rigid parallel barrier at the examined receiver positions. Applying a QRD with a design frequency of 400 Hz on a 5-degree tilted parallel barrier improves the overall performance of its equivalent rigid barrier by 1.8 dB(A). Increasing the treated surfaces with reactive elements shifts the effective performance toward lower frequencies. It is found that by tilting the barriers from 0 to 10 degrees in a parallel setup, the degradation effects in parallel barriers are reduced, but the absorption effect of fibrous materials and the diffusivity of the quadratic residue diffuser are also reduced significantly. In this case all the designed barriers perform better with 10 degrees of tilt in a parallel setup. The most economical parallel traffic noise barrier that still produces significantly high performance is achieved by covering the top surface of the barrier close to the receiver with just a QRD with a design frequency of 400 Hz and a tilt angle of 10 degrees. The average A-weighted insertion loss of this barrier is predicted to be 16.3 dB(A).

  17. Contribution of diffuser surfaces to efficiency of tilted T shape parallel highway noise barriers

    Directory of Open Access Journals (Sweden)

    N. Javid Rouzi

    2009-04-01

    Background and aims: The paper presents the results of an investigation on the acoustic performance of tilted profile parallel barriers with quadratic residue diffuser tops and faces. Methods: A 2D boundary element method (BEM) is used to predict the barrier insertion loss. Results for rigid barriers and for barriers with absorptive coverage are also calculated for comparison. Using QRD on the top surface and faces of all tilted profile parallel barrier models introduced here is found to improve the efficiency of barriers compared with the equivalent rigid parallel barrier at the examined receiver positions. Results: Applying a QRD with a design frequency of 400 Hz on a 5-degree tilted parallel barrier improves the overall performance of its equivalent rigid barrier by 1.8 dB(A). Increasing the treated surfaces with reactive elements shifts the effective performance toward lower frequencies. It is found that by tilting the barriers from 0 to 10 degrees in a parallel setup, the degradation effects in parallel barriers are reduced, but the absorption effect of fibrous materials and the diffusivity of the quadratic residue diffuser are also reduced significantly. In this case all the designed barriers perform better with 10 degrees of tilt in a parallel setup. Conclusion: The most economical parallel traffic noise barrier that still produces significantly high performance is achieved by covering the top surface of the barrier close to the receiver with just a QRD with a design frequency of 400 Hz and a tilt angle of 10 degrees. The average A-weighted insertion loss of this barrier is predicted to be 16.3 dB(A).

  18. An efficient parallel algorithm for the solution of a tridiagonal linear system of equations

    Science.gov (United States)

    Stone, H. S.

    1971-01-01

    Tridiagonal linear systems of equations are solved on conventional serial machines in a time proportional to N, where N is the number of equations. The conventional algorithms do not lend themselves directly to parallel computations on computers of the ILLIAC IV class, in the sense that they appear to be inherently serial. An efficient parallel algorithm is presented in which computation time grows as log_2 N. The algorithm is based on recursive doubling solutions of linear recurrence relations, and can be used to solve recurrence relations of all orders.
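
The recursive doubling idea mentioned in this abstract can be illustrated on the simplest case, a first-order linear recurrence x[i] = a[i]*x[i-1] + b[i] with x[0] = b[0]. The following is a minimal sketch under that assumption and is not the full tridiagonal solver from the paper; the function names are hypothetical. Each step of the recurrence is treated as an affine map, and maps are composed pairwise at strides 1, 2, 4, ..., so that about log_2 N sweeps suffice. The updates within one sweep are independent of each other, which is what a parallel machine would exploit; here they are simulated serially.

```python
def solve_recurrence_serial(a, b):
    """Reference solution: x[i] = a[i] * x[i-1] + b[i], with x[-1] taken as 0."""
    x = []
    prev = 0
    for ai, bi in zip(a, b):
        prev = ai * prev + bi
        x.append(prev)
    return x


def solve_recurrence_doubling(a, b):
    """Same recurrence via recursive doubling. Returns (x, number_of_sweeps)."""
    n = len(a)
    A = list(a)  # A[i], B[i] describe the affine map x -> A[i] * x + B[i]
    B = list(b)
    stride = 1
    sweeps = 0
    while stride < n:
        new_A = A[:]
        new_B = B[:]
        # Every index i in this loop is independent of the others,
        # so one sweep could run as a single parallel step.
        for i in range(stride, n):
            # Compose map i (applied last) with map i - stride (applied first).
            new_A[i] = A[i] * A[i - stride]
            new_B[i] = A[i] * B[i - stride] + B[i]
        A, B = new_A, new_B
        stride *= 2
        sweeps += 1
    # After the last sweep, map i covers steps 0..i, so B[i] equals x[i].
    return B, sweeps
```

For N = 8 the loop runs three sweeps (strides 1, 2 and 4), compared with eight dependent steps in the serial version.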

  19. Efficient parallel implementation of active appearance model fitting algorithm on GPU.

    Science.gov (United States)

    Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine-grained parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.

  20. Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU

    Directory of Open Access Journals (Sweden)

    Jinwei Wang

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine-grained parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on Nvidia’s GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.

  1. OS and Runtime Support for Efficiently Managing Cores in Parallel Applications

    OpenAIRE

    Klues, Kevin Alan

    2015-01-01

    Parallel applications can benefit from the ability to explicitly control their thread scheduling policies in user-space. However, modern operating systems lack the interfaces necessary to make this type of “user-level” scheduling efficient. The key component missing is the ability for applications to gain direct access to cores and keep control of those cores even when making I/O operations that traditionally block in the kernel. A number of former systems provided limited support for these c...

  2. An efficient numerical scheme for the simulation of parallel-plate active magnetic regenerators

    DEFF Research Database (Denmark)

    Torregrosa-Jaime, Bárbara; Corberán, José M.; Payá, Jorge

    2015-01-01

    A one-dimensional model of a parallel-plate active magnetic regenerator (AMR) is presented in this work. The model is based on an efficient numerical scheme which has been developed after analysing the heat transfer mechanisms in the regenerator bed. The new finite difference scheme optimally com...... to the fully implicit scheme, the proposed scheme achieves more accurate results, prevents numerical errors and requires less computational effort. In AMR simulations the new scheme can reduce the computational time by 88%....

  3. Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs

    Directory of Open Access Journals (Sweden)

    Vaughn Matthew

    2010-11-01

    Background: Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories, based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However, with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). Results: In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ((n log(n/B)) / (B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster, both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. Conclusions: The bi

  4. Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.

    Science.gov (United States)

    Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal

    2010-11-15

    Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories, based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However, with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ((n log(n/B)) / (B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster, both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for
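
To make the data structure concrete, the sketch below builds the edge set of a bi-directed de Bruijn graph from a few reads by sorting. It is a small serial illustration only and is not the parallel or out-of-core algorithm of the paper; the function names and the choice to represent each edge by its canonical (k+1)-mer are assumptions made for this example. A k-mer and its reverse complement are treated as the same node, identified by the lexicographically smaller of the two strings, and duplicates are removed after one sort, which is the step a parallel sort would replace.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq):
    """Lexicographically smaller of a sequence and its reverse complement."""
    return min(seq, reverse_complement(seq))


def bidirected_edges(reads, k):
    """Return the sorted, de-duplicated edges as pairs of canonical k-mers."""
    # Every (k+1)-mer in a read joins its k-length prefix to its k-length suffix.
    edge_mers = []
    for read in reads:
        for i in range(len(read) - k):
            edge_mers.append(canonical(read[i:i + k + 1]))

    # Sorting groups identical edges, including those seen on opposite strands.
    edge_mers.sort()

    edges = []
    previous = None
    for mer in edge_mers:
        if mer != previous:
            edges.append((canonical(mer[:k]), canonical(mer[1:])))
            previous = mer
    return edges
```

For example, the reads "AAC" and "GTT" are reverse complements of each other, so with k = 2 they contribute a single edge between the canonical nodes "AA" and "AC".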

  5. An Efficient Parallel Multi-Scale Segmentation Method for Remote Sensing Imagery

    Directory of Open Access Journals (Sweden)

    Haiyan Gu

    2018-04-01

    Remote sensing (RS) image segmentation is an essential step in geographic object-based image analysis (GEOBIA) to ultimately derive “meaningful objects”. While many segmentation methods exist, most of them are not efficient for large data sets. Thus, the goal of this research is to develop an efficient parallel multi-scale segmentation method for RS imagery by combining graph theory and the fractal net evolution approach (FNEA). Specifically, a minimum spanning tree (MST) algorithm in graph theory is proposed to be combined with a minimum heterogeneity rule (MHR) algorithm that is used in FNEA. The MST algorithm is used for the initial segmentation while the MHR algorithm is used for object merging. An efficient implementation of the segmentation strategy is presented using data partition and the “reverse searching-forward processing” chain based on message passing interface (MPI) parallel technology. Segmentation results of the proposed method using images from multiple sensors (airborne, SPECIM AISA EAGLE II, WorldView-2, RADARSAT-2) and different selected landscapes (residential/industrial, residential/agriculture) covering four test sites indicated its efficiency in accuracy and speed. We conclude that the proposed method is applicable and efficient for the segmentation of a variety of RS imagery (airborne optical, satellite optical, SAR, high-spectral), while the accuracy is comparable with that of the FNEA method.

  6. An efficient parallel stochastic simulation method for analysis of nonviral gene delivery systems

    KAUST Repository

    Kuwahara, Hiroyuki

    2011-01-01

    Gene therapy has great potential to become an effective treatment for a wide variety of diseases. One of the main challenges in making gene therapy practical in clinical settings is the development of efficient and safe mechanisms to deliver foreign DNA molecules into the nucleus of target cells. Several computational and experimental studies have shown that the design process of synthetic gene transfer vectors can be greatly enhanced by computational modeling and simulation. This paper proposes a novel, effective parallelization of the stochastic simulation algorithm (SSA) for pharmacokinetic models that characterize the rate-limiting, multi-step processes of intracellular gene delivery. While efficient parallelization of the SSA is still an open problem in a general setting, the proposed parallel simulation method is able to substantially accelerate the next reaction selection scheme and the reaction update scheme in the SSA by exploiting and decomposing the structures of stochastic gene delivery models. This makes computationally intensive analysis, such as parameter optimizations and gene dosage control for specific cell types, gene vectors, and transgene expression stability, substantially more practical than it would otherwise be with the standard SSA. Here, we translated the nonviral gene delivery model based on mass-action kinetics by Varga et al. [Molecular Therapy, 4(5), 2001] into a more realistic model that captures intracellular fluctuations based on stochastic chemical kinetics, and as a case study we applied our parallel simulation to this stochastic model. Our results show that our simulation method is able to increase the efficiency of statistical analysis by at least 50% in various settings. © 2011 ACM.
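
For orientation, the two steps this abstract says are accelerated, next reaction selection and reaction update, can be seen in a plain serial version of the SSA. The sketch below is the standard direct method in its textbook form, not the parallel scheme proposed in the paper; the function name, the dictionary-based state and the toy reaction in the usage note are assumptions made for illustration.

```python
import random


def ssa_direct(state, reactions, t_end, rng):
    """Serial SSA (direct method).

    state:     dict mapping species name to molecule count
    reactions: list of (propensity_function, change_dict) pairs
    Returns (final_time, final_state).
    """
    t = 0.0
    state = dict(state)
    while True:
        propensities = [propensity(state) for propensity, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more

        # Waiting time to the next reaction is exponential with rate `total`.
        t_next = t + rng.expovariate(total)
        if t_next > t_end:
            break
        t = t_next

        # Next reaction selection: pick j with probability propensities[j] / total.
        threshold = rng.random() * total
        cumulative = 0.0
        for value, (_, change) in zip(propensities, reactions):
            cumulative += value
            if threshold < cumulative:
                break

        # Reaction update: apply the stoichiometric change of the chosen reaction.
        for species, delta in change.items():
            state[species] += delta
    return t, state
```

As a usage example, a single decay reaction A -> B with propensity equal to the count of A, started from ten molecules of A and run with a long enough `t_end`, ends with all ten molecules converted to B regardless of the random seed.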

  7. Improving the efficiency of molecular replacement by utilizing a new iterative transform phasing algorithm

    Energy Technology Data Exchange (ETDEWEB)

    He, Hongxing; Fang, Hengrui [Department of Physics and Texas Center for Superconductivity, University of Houston, Houston, Texas 77204 (United States); Miller, Mitchell D. [Department of BioSciences, Rice University, Houston, Texas 77005 (United States); Phillips, George N. Jr [Department of BioSciences, Rice University, Houston, Texas 77005 (United States); Department of Chemistry, Rice University, Houston, Texas 77005 (United States); Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706 (United States); Su, Wu-Pei, E-mail: wpsu@uh.edu [Department of Physics and Texas Center for Superconductivity, University of Houston, Houston, Texas 77204 (United States)

    2016-07-15

    An iterative transform algorithm is proposed to improve the conventional molecular-replacement method for solving the phase problem in X-ray crystallography. Several examples of successful trial calculations carried out with real diffraction data are presented. An iterative transform method proposed previously for direct phasing of high-solvent-content protein crystals is employed for enhancing the molecular-replacement (MR) algorithm in protein crystallography. Target structures that are resistant to conventional MR due to insufficient similarity between the template and target structures might be tractable with this modified phasing method. Trial calculations involving three different structures are described to test and illustrate the methodology. The relationship of the approach to PHENIX Phaser-MR and MR-Rosetta is discussed.

  8. Efficient numerical methods for fluid- and electrodynamics on massively parallel systems

    Energy Technology Data Exchange (ETDEWEB)

    Zudrop, Jens

    2016-07-01

    In the last decade, computer technology has evolved rapidly. Modern high performance computing systems offer a tremendous amount of computing power in the range of a few peta floating point operations per second. In contrast, numerical software development is much slower and most existing simulation codes cannot exploit the full computing power of these systems. Partially, this is due to the numerical methods themselves and partially it is related to bottlenecks within the parallelization concept and its data structures. The goal of the thesis is the development of numerical algorithms and corresponding data structures to remedy both kinds of parallelization bottlenecks. The approach is based on a co-design of the numerical schemes (including numerical analysis) and their realizations in algorithms and software. Various kinds of applications, from multicomponent flows (Lattice Boltzmann Method) to electrodynamics (Discontinuous Galerkin Method) to embedded geometries (Octree), are considered and efficiency of the developed approaches is demonstrated for large scale simulations.

  9. Parallel scalability and efficiency of vortex particle method for aeroelasticity analysis of bluff bodies

    Science.gov (United States)

    Tolba, Khaled Ibrahim; Morgenthal, Guido

    2018-01-01

    This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied for the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using OpenCL framework. The focus of this paper is the analysis of the parallel efficiency and scalability of the method being applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The target is to assess the optimal configuration and the required computer architecture, such that it becomes feasible to efficiently utilise the method within the computational resources available for a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming type computer.

  10. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems.

    Science.gov (United States)

    Barreiros, Willian; Teodoro, George; Kurc, Tahsin; Kong, Jun; Melo, Alba C M A; Saltz, Joel

    2017-09-01

    We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be computationally demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies.

  11. Energy-Efficient FPGA-Based Parallel Quasi-Stochastic Computing

    Directory of Open Access Journals (Sweden)

    Ramu Seva

    2017-11-01

    The high performance of FPGA (Field Programmable Gate Array) in image processing applications is justified by its flexible reconfigurability, its inherent parallel nature and the availability of a large amount of internal memories. Lately, the Stochastic Computing (SC) paradigm has been found to be significantly advantageous in certain application domains including image processing because of its lower hardware complexity and power consumption. However, its viability is deemed to be limited due to its serial bitstream processing and excessive run-time requirement for convergence. To address these issues, a novel approach is proposed in this work where an energy-efficient implementation of SC is accomplished by introducing fast-converging Quasi-Stochastic Number Generators (QSNGs) and parallel stochastic bitstream processing, which are well suited to leverage FPGA’s reconfigurability and abundant internal memory resources. The proposed approach has been tested on the Virtex-4 FPGA, and results have been compared with the serial and parallel implementations of conventional stochastic computation using the well-known SC edge detection and multiplication circuits. The results show that this approach reduces both execution time and power consumption by factors of 3.5 and 4.5 for the edge detection circuit and the multiplication circuit, respectively.
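
    As a software illustration of the multiplication circuit mentioned above, the sketch below assumes the usual unipolar SC encoding, in which a value in [0, 1] is the fraction of ones in a bitstream and a bitwise AND of two uncorrelated streams multiplies the values. The quasi-stochastic generator is modelled with van der Corput sequences in bases 2 and 3; this choice, and all function names, are assumptions for illustration and not the QSNG design of the paper.

```python
import random

def van_der_corput(index: int, base: int) -> float:
    """Radical inverse of `index` in `base`: a low-discrepancy value in [0, 1)."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value

def bitstream(p: float, samples) -> list:
    """Comparator-style encoding: emit 1 whenever the sample falls below p."""
    return [1 if u < p else 0 for u in samples]

def sc_multiply(x: float, y: float, length: int, quasi: bool, seed: int = 0) -> float:
    """Estimate x*y as the fraction of ones in the AND of two bitstreams."""
    if quasi:
        # Different bases keep the two streams (nearly) uncorrelated.
        us = [van_der_corput(i, 2) for i in range(1, length + 1)]
        vs = [van_der_corput(i, 3) for i in range(1, length + 1)]
    else:
        rng = random.Random(seed)
        us = [rng.random() for _ in range(length)]
        vs = [rng.random() for _ in range(length)]
    a, b = bitstream(x, us), bitstream(y, vs)
    return sum(i & j for i, j in zip(a, b)) / length

print(sc_multiply(0.5, 0.75, 1024, quasi=True))   # exact product is 0.375
print(sc_multiply(0.5, 0.75, 1024, quasi=False))
```

    Low-discrepancy streams typically approach the exact product with shorter bitstreams than pseudo-random ones, which is the convergence property a quasi-stochastic generator aims to exploit.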

  12. Development and application of efficient strategies for parallel magnetic resonance imaging

    Energy Technology Data Exchange (ETDEWEB)

    Breuer, F.

    2006-07-01

    artifacts. Unfortunately, parallel imaging is associated with a loss in signal-to-noise ratio (SNR) and therefore is limited to applications which do not already operate at the SNR limit. An additional limitation is the fact that the coil array must provide sufficient sensitivity variations throughout the object under investigation in order to offer enough spatial encoding capacity. This doctoral thesis exhibits an overview of my research on the topic of efficient parallel imaging strategies. Based on existing parallel acquisition and reconstruction strategies, such as SENSE and GRAPPA, new concepts have been developed and transferred to potential clinical applications. (orig.)

  13. Development and application of efficient strategies for parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Breuer, F.

    2006-01-01

    . Unfortunately, parallel imaging is associated with a loss in signal-to-noise ratio (SNR) and therefore is limited to applications which do not already operate at the SNR limit. An additional limitation is the fact that the coil array must provide sufficient sensitivity variations throughout the object under investigation in order to offer enough spatial encoding capacity. This doctoral thesis exhibits an overview of my research on the topic of efficient parallel imaging strategies. Based on existing parallel acquisition and reconstruction strategies, such as SENSE and GRAPPA, new concepts have been developed and transferred to potential clinical applications. (orig.)

  14. Implementing O(N) N-Body Algorithms Efficiently in Data-Parallel Languages

    Directory of Open Access Journals (Sweden)

    Yu Hu

    1996-01-01

    The optimization techniques for hierarchical O(N) N-body algorithms described here focus on managing the data distribution and the data references, both between the memories of different nodes and within the memory hierarchy of each node. We show how the techniques can be expressed in data-parallel languages, such as High Performance Fortran (HPF) and Connection Machine Fortran (CMF). The effectiveness of our techniques is demonstrated on an implementation of Anderson's hierarchical O(N) N-body method for the Connection Machine system CM-5/5E. Communication accounts for about 10–20% of the total execution time, with the average efficiency for arithmetic operations being about 40% and the total efficiency (including communication) being about 35%. For the CM-5E, a performance in excess of 60 Mflop/s per node (peak 160 Mflop/s per node) has been measured.

  15. Efficient method to design RF pulses for parallel excitation MRI using gridding and conjugate gradient.

    Science.gov (United States)

    Feng, Shuo; Ji, Jim

    2014-04-01

    Parallel excitation (pTx) techniques with multiple transmit channels have been widely used in high field MRI to shorten the RF pulse duration and/or reduce the specific absorption rate (SAR). However, the efficiency of pulse design still needs substantial improvement for practical real-time applications. In this paper, we present a detailed description of a fast pulse design method with Fourier domain gridding and a conjugate gradient method. Simulation results show that the proposed method can design pTx pulses at an efficiency 10 times higher than that of the conventional conjugate-gradient based method, without reducing the accuracy of the desirable excitation patterns.

  16. The composite iteration algorithm for finding efficient and financially fair risk-sharing rules

    NARCIS (Netherlands)

    Pazdera, J.; Schumacher, J.M.; Werker, B.J.M.

    2017-01-01

    We consider the problem of finding an efficient and fair ex-ante rule for division of an uncertain monetary outcome among a finite number of von Neumann-Morgenstern agents. Efficiency is understood here, as usual, in the sense of Pareto efficiency subject to the feasibility constraint. Fairness is

  17. The composite iteration algorithm for finding efficient and financially fair risk-sharing rules

    NARCIS (Netherlands)

    Pazdera, Jaroslav; Schumacher, Hans; Werker, Bas

    2017-01-01

    We consider the problem of finding an efficient and fair ex-ante rule for division of an uncertain monetary outcome among a finite number of von Neumann–Morgenstern agents. Efficiency is understood here, as usual, in the sense of Pareto efficiency subject to the feasibility constraint. Fairness is

  18. Efficient fractal-based mutation in evolutionary algorithms from iterated function systems

    Science.gov (United States)

    Salcedo-Sanz, S.; Aybar-Ruíz, A.; Camacho-Gómez, C.; Pereira, E.

    2018-03-01

    In this paper we present a new mutation procedure for Evolutionary Programming (EP) approaches, based on Iterated Function Systems (IFSs). The new mutation procedure proposed consists of considering a set of IFSs which are able to generate fractal structures in a two-dimensional phase space, and using them to modify a current individual of the EP algorithm, instead of using random numbers from different probability density functions. We test this new proposal in a set of benchmark functions for continuous optimization problems. In this case, we compare the proposed mutation against classical Evolutionary Programming approaches, with mutations based on Gaussian, Cauchy and chaotic maps. We also include a discussion on the IFS-based mutation in a real application of Tuned Mass Damper (TMD) location and optimization for vibration cancellation in buildings. In both practical cases, the proposed EP with the IFS-based mutation obtained extremely competitive results compared to alternative classical mutation operators.

  19. Efficient sequential and parallel algorithms for finding edit distance based motifs.

    Science.gov (United States)

    Pal, Soumitra; Xiao, Peng; Rajasekaran, Sanguthevar

    2016-08-18

    Motif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l,d and n biological strings, find all strings of length l that appear in each input string with at most d errors of types substitution, insertion and deletion. One popular technique to solve the problem is to explore for each input string the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those which are common for all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in the neighborhood which are at a distance exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared memory setting uses arrays for storing and a novel modification of radix-sort for sorting the candidate motifs. The algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and in an estimated 3 days solves up to instance (16,3). Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster. Our parallel algorithm has more than 600% scaling performance while using 16 threads. Our algorithms have pushed up the state-of-the-art of EMS solvers and we believe that the techniques introduced in
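
    The (l,d) EMS problem statement can be made concrete with a brute-force reference, sketched below under the assumption of a DNA alphabet. It enumerates every candidate l-mer and checks each input string directly, so it is only usable for tiny instances; it is not the neighborhood-exploration technique of the paper, and all function names are illustrative.

```python
from itertools import product

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitution, insertion, deletion) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def occurs_within(motif: str, text: str, d: int) -> bool:
    """True if some substring of `text` is within edit distance d of `motif`."""
    l = len(motif)
    for length in range(max(1, l - d), l + d + 1):
        for start in range(len(text) - length + 1):
            if edit_distance(motif, text[start:start + length]) <= d:
                return True
    return False

def ems_brute_force(strings, l: int, d: int, alphabet: str = "ACGT"):
    """All l-mers that occur in every input string with at most d edit errors."""
    motifs = []
    for letters in product(alphabet, repeat=l):
        motif = "".join(letters)
        if all(occurs_within(motif, s, d) for s in strings):
            motifs.append(motif)
    return motifs

print(ems_brute_force(["AAGTC", "CACT"], l=4, d=1))
```

    The candidate space grows as 4^l, which is why instances such as (16,3) call for the more selective exploration described in the abstract.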

  20. Efficient Parallel Sorting for Migrating Birds Optimization When Solving Machine-Part Cell Formation Problems

    Directory of Open Access Journals (Sweden)

    Ricardo Soto

    2016-01-01

    The Machine-Part Cell Formation Problem (MPCFP) is an NP-hard optimization problem that consists in grouping machines and parts in a set of cells, so that each cell can operate independently and the intercell movements are minimized. This problem has largely been tackled in the literature by using different techniques ranging from classic methods such as linear programming to more modern nature-inspired metaheuristics. In this paper, we present an efficient parallel version of the Migrating Birds Optimization metaheuristic for solving the MPCFP. Migrating Birds Optimization is a population metaheuristic based on the V-Flight formation of the migrating birds, which is proven to be an effective formation in energy saving. This approach is enhanced by the smart incorporation of parallel procedures that notably improve performance of the several sorting processes performed by the metaheuristic. We perform computational experiments on 1080 benchmarks resulting from the combination of 90 well-known MPCFP instances with 12 sorting configurations with and without threads. We illustrate promising results where the proposal is able to reach the global optimum in all instances, while the solving time with respect to a nonparallel approach is notably reduced.

  1. An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division

    Directory of Open Access Journals (Sweden)

    Dawen Xia

    2015-01-01

    Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM) algorithm for solving the traffic subarea division problem on a widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means and then employ a MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identifying method to connect the borders of clustering results for each cluster. Finally, we divide the traffic subarea of Beijing based on real-world trajectory data sets generated by 12,000 taxis in a period of one month using the proposed approach. Experimental evaluation results indicate that when compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, more accuracy, and better scalability and can effectively divide traffic subarea with big taxi trajectory data.
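
    How a K-Means iteration maps onto the MapReduce paradigm can be sketched in a few lines. The example below assumes plain K-Means with squared Euclidean distance on 2-D points and runs both phases in a single process; it does not include the modified distance metric, the initialization strategy, or the boundary identification of Par3PKM, and the function names are illustrative only.

```python
from collections import defaultdict

def map_phase(points, centroids):
    """Map: emit (nearest centroid index, (x, y, 1)) for every point."""
    for x, y in points:
        nearest = min(
            range(len(centroids)),
            key=lambda k: (x - centroids[k][0]) ** 2 + (y - centroids[k][1]) ** 2,
        )
        yield nearest, (x, y, 1)

def reduce_phase(pairs, centroids):
    """Reduce: sum coordinates and counts per key, then average into new centroids."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for key, (x, y, n) in pairs:
        acc = sums[key]
        acc[0] += x
        acc[1] += y
        acc[2] += n
    updated = list(centroids)  # clusters that received no points keep their centroid
    for key, (sx, sy, n) in sums.items():
        updated[key] = (sx / n, sy / n)
    return updated

def kmeans(points, centroids, iterations=10):
    """Repeat map and reduce for a fixed number of iterations."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids

points = [(0, 0), (0, 2), (10, 10), (10, 12)]
print(kmeans(points, centroids=[(0, 0), (10, 10)]))
```

    Because the reducer only needs per-cluster sums and counts, partial sums from different mappers can be combined before the final average, which is what makes the step suitable for distribution.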

  2. Parallel and series FED microstrip array with high efficiency and low cross polarization

    Science.gov (United States)

    Huang, John (Inventor)

    1995-01-01

    A microstrip array antenna for vertically polarized fan beam (approximately 2 deg x 50 deg) for C-band SAR applications with a physical area of 1.7 m by 0.17 m comprises two rows of patch elements and employs a parallel feed to left- and right-half sections of the rows. Each section is divided into two segments that are fed in parallel with the elements in each segment fed in series through matched transmission lines for high efficiency. The inboard section has half the number of patch elements of the outboard section, and the outboard sections, which have tapered distribution with identical transmission line sections, terminated with half wavelength long open-circuit stubs so that the remaining energy is reflected and radiated in phase. The elements of the two inboard segments of the two left- and right-half sections are provided with tapered transmission lines from element to element for uniform power distribution over the central third of the entire array antenna. The two rows of array elements are excited at opposite patch feed locations with opposite (180 deg difference) phases for reduced cross-polarization.

  3. ERA: Efficient serial and parallel suffix tree construction for very long strings

    KAUST Repository

    Mansour, Essam

    2011-09-01

    The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into the main memory, most existing construction algorithms become very inefficient. This paper presents a disk-based suffix tree construction method, called Elastic Range (ERa), which works efficiently with very long strings that are much larger than the available memory. ERa partitions the tree construction process horizontally and vertically and minimizes I/Os by dynamically adjusting the horizontal partitions independently for each vertical partition, based on the evolving shape of the tree and the available memory. Where appropriate, ERa also groups vertical partitions together to amortize the I/O cost. We developed a serial version; a parallel version for shared-memory and shared-disk multi-core systems; and a parallel version for shared-nothing architectures. ERa indexes the entire human genome in 19 minutes on an ordinary desktop computer. For comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer.

  4. Efficient exact optimization of multi-objective redundancy allocation problems in series-parallel systems

    International Nuclear Information System (INIS)

    Cao, Dingzhou; Murat, Alper; Chinnam, Ratna Babu

    2013-01-01

    This paper proposes a decomposition-based approach to exactly solve the multi-objective Redundancy Allocation Problem for series-parallel systems. The redundancy allocation problem is a form of reliability optimization and has been the subject of many prior studies. The majority of these earlier studies treat the redundancy allocation problem as a single-objective problem, maximizing the system reliability or minimizing the cost given certain constraints. The few studies that treated the redundancy allocation problem as a multi-objective optimization problem relied on meta-heuristic solution approaches. However, meta-heuristic approaches have significant limitations: they do not guarantee that Pareto points are optimal and, more importantly, they may not identify all the Pareto-optimal points. In this paper, we treat the redundancy allocation problem as a multi-objective problem, as is typical in practice. We decompose the original problem into several multi-objective sub-problems, efficiently and exactly solve sub-problems, and then systematically combine the solutions. The decomposition-based approach can efficiently generate all the Pareto-optimal solutions for redundancy allocation problems. Experimental results demonstrate the effectiveness and efficiency of the proposed method over meta-heuristic methods on a numerical example taken from the literature.
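
    To make the notion of a Pareto-optimal redundancy allocation concrete, the sketch below assumes two objectives (maximize reliability, minimize cost), identical components within each subsystem, and the standard series-parallel reliability formula R = prod_i (1 - (1 - r_i)^(n_i)). It enumerates every allocation exhaustively, so it is a small reference baseline and not the decomposition-based approach of the paper; the component data are invented for illustration.

```python
from itertools import product
from math import prod

def system_reliability(r, n):
    """Series system of parallel subsystems with n[i] identical components each."""
    return prod(1.0 - (1.0 - ri) ** ni for ri, ni in zip(r, n))

def system_cost(c, n):
    """Total cost of an allocation with per-component costs c[i]."""
    return sum(ci * ni for ci, ni in zip(c, n))

def pareto_front(r, c, max_units):
    """Enumerate all allocations and keep those not dominated in (cost, reliability)."""
    designs = [
        (system_cost(c, n), system_reliability(r, n), n)
        for n in product(range(1, max_units + 1), repeat=len(r))
    ]
    # Sweep by increasing cost; keep a design only if it beats every cheaper one.
    designs.sort(key=lambda d: (d[0], -d[1]))
    front, best_reliability = [], -1.0
    for cost, reliability, n in designs:
        if reliability > best_reliability:
            front.append((cost, reliability, n))
            best_reliability = reliability
    return front

# Hypothetical data: two subsystems, at most two components each.
for cost, reliability, n in pareto_front(r=[0.9, 0.8], c=[3, 2], max_units=2):
    print(n, cost, round(reliability, 4))
```

    In this toy instance the allocation (2, 1) is dropped because (1, 2) is both cheaper and more reliable; exhaustive enumeration grows exponentially with the number of subsystems, which motivates decomposing the problem.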

  5. An efficient parallel algorithm for the calculation of unrestricted canonical MP2 energies.

    Science.gov (United States)

    Baker, Jon; Wolinski, Krzysztof

    2011-11-30

    We present details of our efficient implementation of full accuracy unrestricted open-shell second-order canonical Møller-Plesset (MP2) energies, both serial and parallel. The algorithm is based on our previous restricted closed-shell MP2 code using the Saebo-Almlöf direct integral transformation. Depending on system details, UMP2 energies take from less than 1.5 to about 3.0 times as long as a closed-shell RMP2 energy on a similar system using the same algorithm. Several examples are given including timings for some large stable radicals with 90+ atoms and over 3600 basis functions. Copyright © 2011 Wiley Periodicals, Inc.

  6. Efficient job handling in the GRID: short deadline, interactivity, fault tolerance and parallelism

    CERN Document Server

    Moscicki, Jakub

    2006-01-01

    The major GRID infrastructures are designed mainly for batch-oriented computing with coarse-grained jobs and relatively high job turnaround time. However, many practical applications in natural and physical sciences may be easily parallelized and run as a set of smaller tasks which require little or no synchronization and which may be scheduled in a more efficient way. The Distributed Analysis Environment Framework (DIANE) is a Master-Worker execution skeleton for applications, which complements the GRID middleware stack. Automatic failure recovery and task dispatching policies enable an easy customization of the behaviour of the framework in a dynamic and non-reliable computing environment. We demonstrate the experience of using the framework with several diverse real-life applications, including Monte Carlo Simulation, Physics Data Analysis and Biotechnology. The interfacing of existing sequential applications from the point of view of non-expert user is made easy, also for legacy applications. We analyze th...

  7. Efficient Serial and Parallel Algorithms for Selection of Unique Oligos in EST Databases.

    Science.gov (United States)

    Mata-Montero, Manrique; Shalaby, Nabil; Sheppard, Bradley

    2013-01-01

    Obtaining unique oligos from an EST database is a problem of great importance in bioinformatics, particularly in the discovery of new genes and the mapping of the human genome. Many algorithms have been developed to find unique oligos, many of which are much less time consuming than the traditional brute force approach. An algorithm was presented by Zheng et al. (2004) which finds the solution of the unique oligos search problem efficiently. We implement this algorithm as well as several new algorithms based on some theorems included in this paper. We demonstrate how, with these new algorithms, we can obtain unique oligos much faster than with previous ones. We parallelize these new algorithms to further improve the time of finding unique oligos. All algorithms are run on ESTs obtained from a Barley EST database.
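
    A direct baseline for the unique oligos search can be sketched as follows, assuming that an oligo of a given length counts as unique when it occurs in exactly one EST of the database. This is a simple dictionary-based reference and not the algorithm of Zheng et al. (2004) or the new algorithms of the paper; the function name and the toy sequences are illustrative.

```python
from collections import defaultdict

def unique_oligos(ests, length: int) -> dict:
    """Map each oligo that occurs in exactly one EST to the index of that EST."""
    owners = defaultdict(set)
    for index, sequence in enumerate(ests):
        for start in range(len(sequence) - length + 1):
            owners[sequence[start:start + length]].add(index)
    return {oligo: next(iter(ids)) for oligo, ids in owners.items() if len(ids) == 1}

print(unique_oligos(["ACGTA", "CGTAC"], length=3))
```

    For the two toy sequences, `CGT` and `GTA` occur in both and are discarded, leaving `ACG` (first EST) and `TAC` (second EST).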

  8. An Efficient, Non-iterative Method of Identifying the Cost-Effectiveness Frontier

    Science.gov (United States)

    Suen, Sze-chuan; Goldhaber-Fiebert, Jeremy D.

    2015-01-01

    Cost-effectiveness analysis aims to identify treatments and policies that maximize benefits subject to resource constraints. However, the conventional process of identifying the efficient frontier (i.e., the set of potentially cost-effective options) can be algorithmically inefficient, especially when considering a policy problem with many alternative options or when performing an extensive suite of sensitivity analyses for which the efficient frontier must be found for each. Here, we describe an alternative one-pass algorithm that is conceptually simple, easier to implement, and potentially faster for situations that challenge the conventional approach. Our algorithm accomplishes this by exploiting the relationship between the net monetary benefit and the cost-effectiveness plane. To facilitate further evaluation and use of this approach, we additionally provide scripts in R and Matlab that implement our method and can be used to identify efficient frontiers for any decision problem. PMID:25926282
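
    The link between net monetary benefit and the efficient frontier can be illustrated with a short sketch. It assumes NMB = lambda * effect - cost for a willingness-to-pay lambda >= 0 and treats a strategy as being on the frontier if it maximizes NMB for some lambda. Starting from the cheapest strategy, it repeatedly moves to the strategy with the smallest incremental cost-effectiveness ratio. This is an illustrative procedure written for this listing, not the one-pass algorithm or the R and Matlab scripts of the paper, and the strategy data are invented.

```python
def efficient_frontier(strategies: dict) -> list:
    """strategies maps name -> (cost, effect); returns frontier names by increasing effect."""
    # At lambda = 0 the NMB is -cost, so start from the cheapest strategy
    # (the most effective one if several share the lowest cost).
    current = min(strategies, key=lambda s: (strategies[s][0], -strategies[s][1]))
    frontier = [current]
    while True:
        cost0, effect0 = strategies[current]
        best, best_icer = None, None
        for name, (cost, effect) in strategies.items():
            if effect <= effect0:
                continue  # cannot overtake the current strategy as lambda grows
            icer = (cost - cost0) / (effect - effect0)  # lambda at which `name` overtakes
            if (best is None or icer < best_icer
                    or (icer == best_icer and effect > strategies[best][1])):
                best, best_icer = name, icer
        if best is None:
            return frontier
        frontier.append(best)
        current = best

# Hypothetical strategies as (cost, effect) pairs.
options = {"A": (0, 0.0), "B": (100, 1.0), "C": (300, 1.5), "D": (250, 2.0), "E": (120, 0.5)}
print(efficient_frontier(options))
```

    In this example C is excluded because D is cheaper and more effective, and E is excluded because B is cheaper and more effective, leaving A, B and D on the frontier.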

  9. How to use MPI communication in highly parallel climate simulations more easily and more efficiently.

    Science.gov (United States)

    Behrens, Jörg; Hanke, Moritz; Jahns, Thomas

    2014-05-01

    In this talk we present a way to facilitate efficient use of MPI communication for developers of climate models. Exploitation of the performance potential of today's highly parallel supercomputers with real world simulations is a complex task. This is partly caused by the low level nature of the MPI communication library which is the dominant communication tool at least for inter-node communication. In order to manage the complexity of the task, climate simulations with non-trivial communication patterns often use an internal abstraction layer above MPI without exploiting the benefits of communication aggregation or MPI-datatypes. The solution for the complexity and performance problem we propose is the communication library YAXT. This library is built on top of MPI and takes high level descriptions of arbitrary domain decompositions and automatically derives an efficient collective data exchange. Several exchanges can be aggregated in order to reduce latency costs. Examples are given which demonstrate the simplicity and the performance gains for selected climate applications.

  10. Active Vibration Suppression of a 3-DOF Flexible Parallel Manipulator Using Efficient Modal Control

    Directory of Open Access Journals (Sweden)

    Quan Zhang

    2014-01-01

    This paper addresses the dynamic modeling and efficient modal control of a planar parallel manipulator (PPM) with three flexible linkages actuated by linear ultrasonic motors (LUSM). To achieve active vibration control, multiple lead zirconate titanate (PZT) transducers are mounted on the flexible links as vibration sensors and actuators. Based on Lagrange’s equations, the dynamic model of the flexible links is derived with the dynamics of PZT actuators incorporated. Using the assumed mode method (AMM), the elastic motion of the flexible links is discretized under the assumptions of pinned-free boundary conditions, and the assumed mode shapes are validated through experimental modal test. Efficient modal control (EMC), in which the feedback forces in different modes are determined according to the vibration amplitude or energy of their own, is employed to control the PZT actuators to realize active vibration suppression. Modal filters are developed to extract the modal displacements and velocities from the vibration sensors. Numerical simulation and vibration control experiments are conducted to verify the proposed dynamic model and controller. The results show that the EMC method has the capability of suppressing multimode vibration simultaneously, and both the structural and residual vibrations of the flexible links are effectively suppressed using the EMC approach.

  11. ITER...ation

    International Nuclear Information System (INIS)

    Troyon, F.

    1997-01-01

    Recurrent attacks against ITER, the new generation of tokamak, are a mix of political and scientific arguments. This short article gives a historical review of the European fusion program. This program has made it possible to build and operate several installations in order to obtain the experimental results needed to move the program forward. ITER will bring together a fusion reactor core with technologies such as materials, superconductive coils, heating devices and instrumentation in order to validate and delimit the operating range. ITER will be a logical and decisive step towards the use of controlled fusion. (A.C.)

  12. Jet formation and equatorial superrotation in Jupiter's atmosphere: Numerical modelling using a new efficient parallel code

    Science.gov (United States)

    Rivier, Leonard Gilles

    Using an efficient parallel code solving the primitive equations of atmospheric dynamics, the jet structure of a Jupiter-like atmosphere is modeled. In the first part of this thesis, a parallel spectral code solving both the shallow water equations and the multi-level primitive equations of atmospheric dynamics is built. The implementation of this code, called BOB, is done so that it runs effectively on an inexpensive cluster of workstations. A one-dimensional decomposition and transposition method ensuring load balancing among processes is used. The Legendre transform is cache-blocked. Computing the Legendre polynomials used in the spectral method on the fly produces a lower memory footprint and enables high resolution runs on relatively small memory machines. Performance studies are done using a cluster of workstations located at the National Center for Atmospheric Research (NCAR). BOB's performance is compared to the parallel benchmark code PSTSWM and the dynamical core of NCAR's CCM3.6.6. In both cases, the comparison favors BOB. In the second part of this thesis, the primitive equation version of the code described in part I is used to study the formation of organized zonal jets and equatorial superrotation in a planetary atmosphere where the parameters are chosen to best model the upper atmosphere of Jupiter. Two levels are used in the vertical and only large scale forcing is present. The model is forced towards a baroclinically unstable flow, so that eddies are generated by baroclinic instability. We consider several types of forcing, acting on either the temperature or the momentum field. We show that only under very specific parametric conditions do zonally elongated structures form and persist, resembling the jet structure observed near the cloud level top (1 bar) on Jupiter. We also study the effect of an equatorial heat source, meant to be a crude representation of the effect of the deep convective planetary interior onto the outer atmospheric layer. We

  13. Automatic analysis (aa): efficient neuroimaging workflows and parallel processing using Matlab and XML

    Directory of Open Access Journals (Sweden)

    Rhodri Cusack

    2015-01-01

    Recent years have seen neuroimaging data becoming richer, with larger cohorts of participants, a greater variety of acquisition techniques, and increasingly complex analyses. These advances have made data analysis pipelines complex to set up and run (increasing the risk of human error) and time consuming to execute (restricting what analyses are attempted). Here we present an open-source framework, automatic analysis (aa), to address these concerns. Human efficiency is increased by making code modular and reusable, and managing its execution with a processing engine that tracks what has been completed and what needs to be (re)done. Analysis is accelerated by optional parallel processing of independent tasks on cluster or cloud computing resources. A pipeline comprises a series of modules that each perform a specific task. The processing engine keeps track of the data, calculating a map of upstream and downstream dependencies for each module. Existing modules are available for many analysis tasks, such as SPM-based fMRI preprocessing, individual and group level statistics, voxel-based morphometry, tractography, and multi-voxel pattern analyses (MVPA). However, aa also allows for full customization, and encourages efficient management of code: new modules may be written with only a small code overhead. aa has been used by more than 50 researchers in hundreds of neuroimaging studies comprising thousands of subjects. It has been found to be robust, fast and efficient, for simple single-subject studies up to multimodal pipelines on hundreds of subjects. It is attractive to both novice and experienced users. aa can reduce the amount of time neuroimaging laboratories spend performing analyses and reduce errors, expanding the range of scientific questions it is practical to address.

  14. Automatic analysis (aa): efficient neuroimaging workflows and parallel processing using Matlab and XML.

    Science.gov (United States)

    Cusack, Rhodri; Vicente-Grabovetsky, Alejandro; Mitchell, Daniel J; Wild, Conor J; Auer, Tibor; Linke, Annika C; Peelle, Jonathan E

    2014-01-01

    Recent years have seen neuroimaging data sets becoming richer, with larger cohorts of participants, a greater variety of acquisition techniques, and increasingly complex analyses. These advances have made data analysis pipelines complicated to set up and run (increasing the risk of human error) and time consuming to execute (restricting what analyses are attempted). Here we present an open-source framework, automatic analysis (aa), to address these concerns. Human efficiency is increased by making code modular and reusable, and managing its execution with a processing engine that tracks what has been completed and what needs to be (re)done. Analysis is accelerated by optional parallel processing of independent tasks on cluster or cloud computing resources. A pipeline comprises a series of modules that each perform a specific task. The processing engine keeps track of the data, calculating a map of upstream and downstream dependencies for each module. Existing modules are available for many analysis tasks, such as SPM-based fMRI preprocessing, individual and group level statistics, voxel-based morphometry, tractography, and multi-voxel pattern analyses (MVPA). However, aa also allows for full customization, and encourages efficient management of code: new modules may be written with only a small code overhead. aa has been used by more than 50 researchers in hundreds of neuroimaging studies comprising thousands of subjects. It has been found to be robust, fast, and efficient, for simple single-subject studies up to multimodal pipelines on hundreds of subjects. It is attractive to both novice and experienced users. aa can reduce the amount of time neuroimaging laboratories spend performing analyses and reduce errors, expanding the range of scientific questions it is practical to address.

  15. ITER council proceedings: 1997

    International Nuclear Information System (INIS)

    1997-01-01

    This volume of the ITER EDA Documentation Series presents records of the 12th ITER Council Meeting, IC-12, which took place on 23-24 July, 1997 in Tampere, Finland. The Council received from the Parties (EU, Japan, Russia, US) positive responses on the Detailed Design Report. The Parties stated their willingness to fulfil their obligations in contributing to the ITER EDA. The summary discussions among the Parties led to the consensus that in July 1998 the ITER activities should proceed for an additional three years with a general intent to enable an efficient start of possible, future ITER construction.

  16. Accurate and Efficient Parallel Implementation of an Effective Linear-Scaling Direct Random Phase Approximation Method.

    Science.gov (United States)

    Graf, Daniel; Beuerle, Matthias; Schurkus, Henry F; Luenser, Arne; Savasci, Gökcen; Ochsenfeld, Christian

    2018-05-08

    An efficient algorithm for calculating the random phase approximation (RPA) correlation energy is presented that is as accurate as the canonical molecular orbital resolution-of-the-identity RPA (RI-RPA) with the important advantage of an effective linear-scaling behavior (instead of quartic) for large systems due to a formulation in the local atomic orbital space. The high accuracy is achieved by utilizing optimized minimax integration schemes and the local Coulomb metric attenuated by the complementary error function for the RI approximation. The memory bottleneck of former atomic orbital (AO)-RI-RPA implementations ( Schurkus, H. F.; Ochsenfeld, C. J. Chem. Phys. 2016 , 144 , 031101 and Luenser, A.; Schurkus, H. F.; Ochsenfeld, C. J. Chem. Theory Comput. 2017 , 13 , 1647 - 1655 ) is addressed by precontraction of the large 3-center integral matrix with the Cholesky factors of the ground state density reducing the memory requirements of that matrix by a factor of [Formula: see text]. Furthermore, we present a parallel implementation of our method, which not only leads to faster RPA correlation energy calculations but also to a scalable decrease in memory requirements, opening the door for investigations of large molecules even on small- to medium-sized computing clusters. Although it is known that AO methods are highly efficient for extended systems, where sparsity allows for reaching the linear-scaling regime, we show that our work also extends the applicability when considering highly delocalized systems for which no linear scaling can be achieved. As an example, the interlayer distance of two covalent organic framework pore fragments (comprising 384 atoms in total) is analyzed.

  17. Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing. Ph.D. Thesis

    Science.gov (United States)

    Liu, Kuojuey Ray

    1990-01-01

    Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on triangular array and another based on rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.

  18. Development of efficient GPU parallelization of WRF Yonsei University planetary boundary layer scheme

    Directory of Open Access Journals (Sweden)

    M. Huang

    2015-09-01

    The planetary boundary layer (PBL) is the lowest part of the atmosphere, and its character is directly affected by its contact with the underlying planetary surface. The PBL is responsible for vertical sub-grid-scale fluxes due to eddy transport in the whole atmospheric column. It determines the flux profiles within the well-mixed boundary layer and the more stable layer above. It thus provides an evolutionary model of atmospheric temperature, moisture (including clouds), and horizontal momentum in the entire atmospheric column. For such purposes, several PBL models have been proposed and employed in the weather research and forecasting (WRF) model, of which the Yonsei University (YSU) scheme is one. To expedite weather research and prediction, we have put tremendous effort into developing an accelerated implementation of the entire WRF model using graphics processing unit (GPU) massively parallel computing architecture whilst maintaining its accuracy as compared to its central processing unit (CPU)-based implementation. This paper presents our efficient GPU-based design on a WRF YSU PBL scheme. Using one NVIDIA Tesla K40 GPU, the GPU-based YSU PBL scheme achieves a speedup of 193× with respect to its CPU counterpart running on one CPU core, whereas the speedup for one CPU socket (4 cores) with respect to 1 CPU core is only 3.5×. We can even boost the speedup to 360× with respect to 1 CPU core when two K40 GPUs are applied.
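    The speedup figures quoted in this abstract can be turned into parallel efficiencies with a little arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the definition used (speedup divided by the number of processing units) is the usual textbook one, not something stated in the abstract.

```python
def parallel_efficiency(speedup: float, units: int) -> float:
    """Textbook parallel efficiency: achieved speedup divided by unit count."""
    if units <= 0:
        raise ValueError("units must be positive")
    return speedup / units


# Figures taken from the abstract above.
cpu_socket_eff = parallel_efficiency(3.5, 4)        # 4 CPU cores vs. 1 core
gpu_scaling = 360.0 / 193.0                         # 2 GPUs relative to 1 GPU
gpu_pair_eff = parallel_efficiency(gpu_scaling, 2)  # scaling efficiency across 2 GPUs

print(f"CPU socket efficiency: {cpu_socket_eff:.3f}")
print(f"Two-GPU scaling factor: {gpu_scaling:.3f}")
print(f"Two-GPU scaling efficiency: {gpu_pair_eff:.3f}")
```

    Under these assumptions, the step from one to two GPUs scales at a somewhat higher efficiency than the step from one to four CPU cores.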

  19. Parallel artificial liquid membrane extraction as an efficient tool for removal of phospholipids from human plasma.

    Science.gov (United States)

    Ask, Kristine Skoglund; Bardakci, Turgay; Parmer, Marthe Petrine; Halvorsen, Trine Grønhaug; Øiestad, Elisabeth Leere; Pedersen-Bjergaard, Stig; Gjelstad, Astrid

    2016-09-10

    Generic Parallel Artificial Liquid Membrane Extraction (PALME) methods for non-polar basic and non-polar acidic drugs from human plasma were investigated with respect to phospholipid removal. In both cases, extractions in 96-well format were performed from plasma (125μL), through 4μL organic solvent used as supported liquid membranes (SLMs), and into 50μL aqueous acceptor solutions. The acceptor solutions were subsequently analysed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) using in-source fragmentation and monitoring the m/z 184→184 transition for investigation of phosphatidylcholines (PC), sphingomyelins (SM), and lysophosphatidylcholines (Lyso-PC). In both generic methods, no phospholipids were detected in the acceptor solutions. Thus, PALME appeared to be highly efficient for phospholipid removal. To further support this, qualitative (post-column infusion) and quantitative matrix effects were investigated with fluoxetine, fluvoxamine, and quetiapine as model analytes. No signs of matrix effects were observed. Finally, PALME was evaluated for the aforementioned drug substances, and data were in accordance with European Medicines Agency (EMA) guidelines. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Efficient graph-based dynamic load-balancing for parallel large-scale agent-based traffic simulation

    NARCIS (Netherlands)

    Xu, Y.; Cai, W.; Aydt, H.; Lees, M.; Tolk, A.; Diallo, S.Y.; Ryzhov, I.O.; Yilmaz, L.; Buckley, S.; Miller, J.A.

    2014-01-01

    One of the issues of parallelizing large-scale agent-based traffic simulations is partitioning and load-balancing. Traffic simulations are dynamic applications where the distribution of workload in the spatial domain constantly changes. Dynamic load-balancing at run-time has shown better efficiency

  1. A simple and efficient parallel FFT algorithm using the BSP model

    NARCIS (Netherlands)

    Bisseling, R.H.; Inda, M.A.

    2000-01-01

    In this paper we present a new parallel radix FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of three in the case

  2. An efficient implementation of a backpropagation learning algorithm on quadrics parallel supercomputer

    International Nuclear Information System (INIS)

    Taraglio, S.; Massaioli, F.

    1995-08-01

    A parallel implementation of a library to build and train Multi Layer Perceptrons via the Back Propagation algorithm is presented. The target machine is the SIMD massively parallel supercomputer Quadrics. Performance measures are provided on three different machines with different number of processors, for two network examples. A sample source code is given

  3. A Generic and Efficient E-field Parallel Imaging Correlator for Next-Generation Radio Telescopes

    Science.gov (United States)

    Thyagarajan, Nithyanandan; Beardsley, Adam P.; Bowman, Judd D.; Morales, Miguel F.

    2017-05-01

    Modern radio telescopes are favouring densely packed array layouts with large numbers of antennas (N_A ≳ 1000). Since the complexity of traditional correlators scales as O(N_A^2), there will be a steep cost for realizing the full imaging potential of these powerful instruments. Through our generic and efficient E-field Parallel Imaging Correlator (epic), we present the first software demonstration of a generalized direct imaging algorithm, namely the Modular Optimal Frequency Fourier imager. Not only does it bring down the cost for dense layouts to O(N_A log_2 N_A) but can also image from irregular layouts and heterogeneous arrays of antennas. epic is highly modular, parallelizable, implemented in object-oriented python, and publicly available. We have verified the images produced to be equivalent to those from traditional techniques to within a precision set by gridding coarseness. We have also validated our implementation on data observed with the Long Wavelength Array (LWA1). We provide a detailed framework for imaging with heterogeneous arrays and show that epic robustly estimates the input sky model for such arrays. Antenna layouts with dense filling factors consisting of a large number of antennas such as LWA, the Square Kilometre Array, Hydrogen Epoch of Reionization Array, and Canadian Hydrogen Intensity Mapping Experiment will gain significant computational advantage by deploying an optimized version of epic. The algorithm is a strong candidate for instruments targeting transient searches of fast radio bursts as well as planetary and exoplanetary phenomena due to the availability of high-speed calibrated time-domain images and low output bandwidth relative to visibility-based systems.
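    To give a feel for the two scaling laws quoted in this abstract, the sketch below compares N_A^2 with N_A log_2 N_A for a given antenna count. It is a rough illustration only: the function names are hypothetical, and all constant factors (which the abstract does not give) are assumed to be 1, so the ratio indicates asymptotic growth rather than real cost.

```python
import math


def correlator_cost(n_antennas: int) -> float:
    """Operation count proportional to N_A^2 (constant factor assumed to be 1)."""
    return float(n_antennas ** 2)


def direct_imaging_cost(n_antennas: int) -> float:
    """Operation count proportional to N_A * log2(N_A) (constant factor assumed to be 1)."""
    return n_antennas * math.log2(n_antennas)


def cost_ratio(n_antennas: int) -> float:
    """How many times larger the N_A^2 count is than the N_A log2 N_A count."""
    return correlator_cost(n_antennas) / direct_imaging_cost(n_antennas)


for n in (256, 1024, 4096):
    print(n, round(cost_ratio(n), 1))
```

    The ratio simplifies to N_A / log_2 N_A, so the gap widens steadily as the array grows.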

  4. A highly efficient parallel algorithm for solving the neutron diffusion nodal equations on shared-memory computers

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1990-01-01

    Modern parallel computer architectures offer an enormous potential for reducing CPU and wall-clock execution times of large-scale computations commonly performed in various applications in science and engineering. Recently, several authors have reported their efforts in developing and implementing parallel algorithms for solving the neutron diffusion equation on a variety of shared- and distributed-memory parallel computers. Testing of these algorithms for a variety of two- and three-dimensional meshes showed significant speedup of the computation. Even for very large problems (i.e., three-dimensional fine meshes) executed concurrently on a few nodes in serial (nonvector) mode, however, the measured computational efficiency is very low (40 to 86%). In this paper, the authors present a highly efficient (∼85 to 99.9%) algorithm for solving the two-dimensional nodal diffusion equations on the Sequent Balance 8000 parallel computer. Also presented is a model for the performance, represented by the efficiency, as a function of problem size and the number of participating processors. The model is validated through several tests and then extrapolated to larger problems and more processors to predict the performance of the algorithm in more computationally demanding situations

  5. Parallel Implicit Algorithms for CFD

    Science.gov (United States)

    Keyes, David E.

    1998-01-01

    The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSC library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSC during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSC framework.

  6. On the mathematic simulation of the energy efficiency for heat exchangers with the systems of impingement plane-parallel jets

    Directory of Open Access Journals (Sweden)

    Haritonova Larisa

    2017-01-01

    The article gives an analytical generalization of the data on the energy efficiency of heat exchangers with a flat heat exchange surface onto which systems of impinging plane-parallel jets are directed. Functional relations for the specific power consumption (per unit of area) for moving a heat carrier, obtained for the first time using the techniques of the similarity law, are shown with regard to design and operation factors. The regression equations representing a mathematical model of the process make it possible to analyse the impact of various factors on the parameter to be determined. The obtained results can be used to optimize or to create calculation techniques for new highly efficient heat exchange devices with plane-parallel jet impingement systems and also to reduce power consumption for moving a heat carrier.

  7. Parallel artificial liquid membrane extraction as an efficient tool for removal of phospholipids from human plasma

    DEFF Research Database (Denmark)

    Ask, Kristine Skoglund; Bardakci, Turgay; Parmer, Marthe Petrine

    2016-01-01

    Generic Parallel Artificial Liquid Membrane Extraction (PALME) methods for non-polar basic and non-polar acidic drugs from human plasma were investigated with respect to phospholipid removal. In both cases, extractions in 96-well format were performed from plasma (125μL), through 4μL organic...

  8. Efficient Heuristics for the Simulation of Buffer Overflow in Series and Parallel Queueing Networks

    NARCIS (Netherlands)

    Nicola, V.F.; Zaburnenko, T.S.

    2006-01-01

    In this paper we propose state-dependent importance sampling heuristics to estimate the probability of population overflow in Markovian networks of series and parallel queues. These heuristics capture state-dependence along the boundaries (when one or more queues are empty) which is critical for

  9. The Treeterbi and Parallel Treeterbi algorithms: efficient, optimal decoding for ordinary, generalized and pair HMMs

    DEFF Research Database (Denmark)

    Keibler, Evan; Arumugam, Manimozhiyan; Brent, Michael R

    2007-01-01

    MOTIVATION: Hidden Markov models (HMMs) and generalized HMMs have been successfully applied to many problems, but the standard Viterbi algorithm for computing the most probable interpretation of an input sequence (known as decoding) requires memory proportional to the length of the sequence, which can...... be prohibitive. Existing approaches to reducing memory usage either sacrifice optimality or trade increased running time for reduced memory. RESULTS: We developed two novel decoding algorithms, Treeterbi and Parallel Treeterbi, and implemented them in the TWINSCAN/N-SCAN gene-prediction system. The worst case...... asymptotic space and time are the same as for standard Viterbi, but in practice, Treeterbi optimally decodes arbitrarily long sequences with generalized HMMs in bounded memory without increasing running time. Parallel Treeterbi uses the same ideas to split optimal decoding across processors, dividing latency...

  10. Parallel Sequential Monte Carlo for Efficient Density Combination: The Deco Matlab Toolbox

    DEFF Research Database (Denmark)

    Casarin, Roberto; Grassi, Stefano; Ravazzolo, Francesco

    This paper presents the Matlab package DeCo (Density Combination) which is based on the paper by Billio et al. (2013) where a constructive Bayesian approach is presented for combining predictive densities originating from different models or other sources of information. The combination weights...... for standard CPU computing and for Graphics Processing Unit (GPU) parallel computing. For the GPU implementation we use the Matlab parallel computing toolbox and show how to use General Purpose GPU computing almost effortlessly. This GPU implementation comes with a speed-up of the execution time of up to seventy...... times compared to a standard CPU Matlab implementation on a multicore CPU. We show the use of the package and the computational gain of the GPU version, through some simulation experiments and empirical applications....

  11. Highly efficient parallel direct solver for solving dense complex matrix equations from method of moments

    Directory of Open Access Journals (Sweden)

    Yan Chen

    2017-03-01

    Based on a vectorised and cache-optimised kernel, a parallel lower-upper (LU) decomposition with a novel communication-avoiding pivoting scheme is developed to solve dense complex matrix equations generated by the method of moments. Fine-grain data rearrangement and assembler instructions are adopted to reduce memory access times and improve CPU cache utilisation, which also facilitates vectorisation of the code. By grouping processes in a binary tree, a parallel pivoting scheme is designed to optimise the communication pattern and thus reduce the solving time of the proposed solver. Two large electromagnetic radiation problems are solved on two supercomputers, respectively, and the numerical results demonstrate that the proposed method outperforms those in open source and commercial libraries.

  12. Angular parallelization of a curvilinear Sn transport theory method

    International Nuclear Information System (INIS)

    Haghighat, A.

    1991-01-01

    In this paper a parallel algorithm for angular domain decomposition (or parallelization) of an r-dependent spherical S_n transport theory method is derived. The parallel formulation is incorporated into TWOTRAN-II using the IBM Parallel Fortran compiler and implemented on an IBM 3090/400 (with four processors). The behavior of the parallel algorithm for different physical problems is studied, and it is concluded that the parallel algorithm behaves differently in the presence of a fission source as opposed to the absence of a fission source; this is attributed to the relative contributions of the source and the angular redistribution terms in the S_n algorithm. Further, the parallel performance of the algorithm is measured for various problem sizes and different combinations of angular subdomains or processors. Poor parallel efficiencies between ∼35 and 50% are achieved in situations where the relative difference of parallel to serial iterations is ∼50%. High parallel efficiencies between ∼60% and 90% are obtained in situations where the relative difference of parallel to serial iterations is <35%

  13. Comments on the parallelization efficiency of the Sunway TaihuLight supercomputer

    OpenAIRE

    Végh, János

    2016-01-01

    In the world of supercomputers, the large number of processors makes it necessary to minimize the inefficiencies of parallelization, which appear as a sequential part of the program from the point of view of Amdahl's law. The recently suggested new figure of merit is applied to the recently presented supercomputer, and the timeline of "Top 500" supercomputers is scrutinized using the metric. It is demonstrated that, in addition to the computing performance and power consumption, the new supercomputer i...
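    The abstract leans on Amdahl's law, which bounds the speedup of a program whose sequential fraction cannot be parallelized. The sketch below shows only the classical law; the function name and the example fractions are illustrative assumptions, and it does not attempt to reproduce the paper's "new figure of merit", which the truncated abstract does not define.

```python
def amdahl_speedup(sequential_fraction: float, processors: int) -> float:
    """Classical Amdahl's law: S = 1 / (s + (1 - s) / N)."""
    if not 0.0 <= sequential_fraction <= 1.0:
        raise ValueError("sequential_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


# Illustrative (made-up) sequential fractions on a very large machine.
for s in (1e-3, 1e-5, 1e-7):
    print(f"s={s:g}: speedup on 10**7 processors = {amdahl_speedup(s, 10**7):,.0f}")
```

    As the processor count grows, the speedup approaches 1/s, which is why even a tiny sequential part dominates on machines with millions of cores.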

  14. Some computational challenges of developing efficient parallel algorithms for data-dependent computations in thermal-hydraulics supercomputer applications

    International Nuclear Information System (INIS)

    Woodruff, S.B.

    1992-01-01

    The Transient Reactor Analysis Code (TRAC), which features a two-fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local, the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, poor load balancing will degrade efficiency on either vector or data parallel architectures when the data are organized according to spatial location. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. This document discusses why developers should consider algorithms, such as a neural net representation, that do not exhibit load-balancing problems

  15. An efficient, interactive, and parallel system for biomedical volume analysis on a standard workstation

    International Nuclear Information System (INIS)

    Rebuffel, V.; Gonon, G.

    1992-01-01

    A software package is presented that can be employed for any 3D imaging modality: X-ray tomography, emission tomography, magnetic resonance imaging. This system uses a hierarchical data structure, named Octree, that naturally allows a multi-resolution approach. The well-known problems of such an indeterministic representation, especially the neighbor finding, have been solved. Several algorithms of volume processing have been developed, using these techniques and an optimal data storage for the Octree. A parallel implementation was chosen that is compatible with the constraints of the Octree base and the various algorithms. (authors) 4 refs., 3 figs., 1 tab

  16. The Treeterbi and Parallel Treeterbi algorithms: efficient, optimal decoding for ordinary, generalized and pair HMMs.

    Science.gov (United States)

    Keibler, Evan; Arumugam, Manimozhiyan; Brent, Michael R

    2007-03-01

    Hidden Markov models (HMMs) and generalized HMMs have been successfully applied to many problems, but the standard Viterbi algorithm for computing the most probable interpretation of an input sequence (known as decoding) requires memory proportional to the length of the sequence, which can be prohibitive. Existing approaches to reducing memory usage either sacrifice optimality or trade increased running time for reduced memory. We developed two novel decoding algorithms, Treeterbi and Parallel Treeterbi, and implemented them in the TWINSCAN/N-SCAN gene-prediction system. The worst case asymptotic space and time are the same as for standard Viterbi, but in practice, Treeterbi optimally decodes arbitrarily long sequences with generalized HMMs in bounded memory without increasing running time. Parallel Treeterbi uses the same ideas to split optimal decoding across processors, dividing latency to completion by approximately the number of available processors with constant average overhead per processor. Using these algorithms, we were able to optimally decode all human chromosomes with N-SCAN, which increased its accuracy relative to heuristic solutions. We also implemented Treeterbi for Pairagon, our pair HMM based cDNA-to-genome aligner. The TWINSCAN/N-SCAN/PAIRAGON open source software package is available from http://genes.cse.wustl.edu.
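    For context, the memory cost the abstract refers to is visible in a plain implementation of the standard Viterbi algorithm, which keeps one table of back-pointers per sequence position. The sketch below is the textbook algorithm for an ordinary HMM, not Treeterbi or Parallel Treeterbi; the function signature and the toy two-state model are assumptions made purely for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Standard Viterbi decoding in log space; all probabilities must be > 0."""
    if not obs:
        return []
    # Best log-probability of any path ending in each state at position 0.
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backptrs = []  # one dict per later position: memory grows with len(obs)
    for symbol in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            # Predecessor that maximizes the score of a path entering state s.
            best_prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            new_score[s] = (score[best_prev] + math.log(trans_p[best_prev][s])
                            + math.log(emit_p[s][symbol]))
            ptr[s] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state through the stored back-pointers.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy model (hypothetical): uniform transitions, each state prefers one symbol.
states = ["A", "B"]
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}
emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}
decoded = viterbi("xxy", states, start_p, trans_p, emit_p)
print(decoded)
```

    The `backptrs` list is the part whose size is proportional to the sequence length; according to the abstract, bounding this memory in practice is what Treeterbi addresses.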

  17. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarria-Miranda, Daniel

    2009-02-15

    We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.

  18. An efficient parallel simulation of unsteady blood flows in patient-specific pulmonary artery.

    Science.gov (United States)

    Kong, Fande; Kheyfets, Vitaly; Finol, Ender; Cai, Xiao-Chuan

    2018-04-01

    Simulation of blood flows in the pulmonary artery provides some insight into certain diseases by examining the relationship between some continuum metrics, eg, the wall shear stress acting on the vascular endothelium, which responds to flow-induced mechanical forces by releasing vasodilators/constrictors. V. Kheyfets, in his previous work, studies numerically a patient-specific pulmonary circulation to show that decreasing wall shear stress is correlated with increasing pulmonary vascular impedance. In this paper, we develop a scalable parallel algorithm based on domain decomposition methods to investigate an unsteady model with patient-specific pulsatile waveforms as the inlet boundary condition. The unsteady model offers tremendously more information about the dynamic behavior of the flow field, but computationally speaking, the simulation is a lot more expensive since a problem which is similar to the steady-state problem has to be solved many times, and therefore, the traditional sequential approach is not suitable anymore. We show computationally that simulations using the proposed parallel approach with up to 10 000 processor cores can be obtained with much reduced compute time. This makes the technology potentially usable for the routine study of the dynamic behavior of blood flows in the pulmonary artery, in particular, the changes of the blood flows and the wall shear stress in the spatial and temporal dimensions. Copyright © 2017 John Wiley & Sons, Ltd.

  19. A Scheduling-Based Framework for Efficient Massively Parallel Execution, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — The barrier to entry creating efficient, scalable applications for heterogeneous supercomputing environments is too high. EM Photonics has found that the majority of...

  20. Simulation of MGI efficiency for plasma energy conversion into Ar radiation in JET and implications for ITER

    Energy Technology Data Exchange (ETDEWEB)

    Pestchanyi, Serguei, E-mail: serguei.pestchanyi@kit.edu [Association EURATOM-KIT, Karlsruhe (Germany); Koslowski, Rudi; Reux, Cedric [JET-EFDA, Culham Science Centre, Abingdon OX14 3DB (United Kingdom); Lehnen, Michael [Route de Vinon-sur-Verdon, CS 90 046, 13067 St. Paul Lez Durance Cedex (France)

    2015-10-15

    Highlights: • We simulated disruption mitigation using massive gas injection with the TOKES code. • Cross-reference analysis of JET experiments on MGI and their simulations has been done. • The analysis suggests a mechanism for the saturation of the radiated energy fraction at 70–80%. • Rough extrapolation of the result to ITER conditions has been done. - Abstract: Effectiveness of massive gas injection (MGI) for mitigation of disruptive wall damage has been investigated. Cross-reference analysis of the available JET experiments on MGI and their simulations with the TOKES code suggests that under JET conditions the electron thermal energy and the plasma current energy can be converted into radiation, but the ion thermal energy is not converted into radiation because of very ineffective excitation of injected noble gas (NG) ions by D ions and the long equipartition time between D ions and electrons. The model assumes a rather high electron temperature during the current quench (CQ), which is inconsistent with the CQ duration. Rough extrapolation of the result to ITER conditions shows that one can expect the total plasma energy to be radiated if the CQ duration in ITER is no shorter than in JET.

  1. Development of an efficient real-time disruption predictor from scratch on JET and implications for ITER

    International Nuclear Information System (INIS)

    Dormido-Canto, S.; Ramírez, J.M.; Vega, J.; Moreno, R.; Pereira, A.; Murari, A.; López, J.M.

    2013-01-01

    Prediction of disruptions from scratch is an ITER-relevant topic. The first operations with the new ITER-like wall constitute a good opportunity to test the development of new predictors from scratch and the related methodologies. These methodologies have been based on the Advanced Predictor Of DISruptions (APODIS) architecture. APODIS is a real-time disruption predictor that is in operation in the JET real-time network. Balanced and unbalanced datasets are used to develop real-time predictors from scratch. The discharges are used in chronological order. Also, different criteria to decide when to re-train a predictor are discussed. The best results are obtained by applying a hybrid method (balanced/unbalanced datasets) for training and with the criterion of re-training after every missed alarm. The predictors are tested off-line with all the discharges (disruptive/non-disruptive) corresponding to the first three JET ITER-like wall campaigns. The results give a success rate of 93.8% and a false alarm rate of 2.8%. It should be considered that these results are obtained from models trained with no more than 42 disruptive discharges. (paper)

  2. Development of a Robust and Efficient Parallel Solver for Unsteady Turbomachinery Flows

    Science.gov (United States)

    West, Jeff; Wright, Jeffrey; Thakur, Siddharth; Luke, Ed; Grinstead, Nathan

    2012-01-01

    The traditional design and analysis practice for advanced propulsion systems relies heavily on expensive full-scale prototype development and testing. Over the past decade, use of high-fidelity analysis and design tools such as CFD early in the product development cycle has been identified as one way to alleviate testing costs and to develop these devices better, faster and cheaper. In the design of advanced propulsion systems, CFD plays a major role in defining the required performance over the entire flight regime, as well as in testing the sensitivity of the design to the different modes of operation. Increased emphasis is being placed on developing and applying CFD models to simulate the flow field environments and performance of advanced propulsion systems. This necessitates the development of next generation computational tools which can be used effectively and reliably in a design environment. The turbomachinery simulation capability presented here is being developed in a computational tool called Loci-STREAM [1]. It integrates proven numerical methods for generalized grids and state-of-the-art physical models in a novel rule-based programming framework called Loci [2] which allows: (a) seamless integration of multidisciplinary physics in a unified manner, and (b) automatic handling of massively parallel computing. The objective is to be able to routinely simulate problems involving complex geometries requiring large unstructured grids and complex multidisciplinary physics. An immediate application of interest is simulation of unsteady flows in rocket turbopumps, particularly in cryogenic liquid rocket engines. 
The key components of the overall methodology presented in this paper are the following: (a) high fidelity unsteady simulation capability based on Detached Eddy Simulation (DES) in conjunction with second-order temporal discretization, (b) compliance with Geometric Conservation Law (GCL) in order to maintain conservative property on moving meshes for

  3. Domain decomposition methods and parallel computing

    International Nuclear Information System (INIS)

    Meurant, G.

    1991-01-01

    In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from the discretization of scientific computing problems described by systems of partial differential equations. We show how to obtain a discrete finite-dimensional system from the continuous problem, and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion floating point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset

  4. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    Science.gov (United States)

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing has resulted in a shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments on large-scale DNA and protein data sets, with files of more than 1 GB, showed that HAlign-II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.

  5. A high efficient integrated planar transformer for primary-parallel isolated boost converters

    DEFF Research Database (Denmark)

    Sen, Gökhan; Ouyang, Ziwei; Thomsen, Ole Cornelius

    2010-01-01

    for a fuel cell fed battery charger application with 50–110 V input and 65–105 V output. Input inductors are coupled for current sharing, eliminating the use of current sharing transformers. An efficiency of 94% is achieved during nominal operating condition where the input is 70-V and the output is 84-V....

  6. 3D-radiative transfer in terrestrial atmosphere: An efficient parallel numerical procedure

    Science.gov (United States)

    Bass, L. P.; Germogenova, T. A.; Nikolaeva, O. V.; Kokhanovsky, A. A.; Kuznetsov, V. S.

    2003-04-01

    Light propagation and scattering in the terrestrial atmosphere is usually studied in the framework of the 1D radiative transfer theory [1]. However, in reality particles (e.g., ice crystals, solid and liquid aerosols, cloud droplets) are randomly distributed in 3D space. In particular, their concentrations vary both in vertical and horizontal directions. Therefore, 3D effects influence modern cloud and aerosol retrieval procedures, which are currently based on the 1D radiative transfer theory. It should be pointed out that the standard radiative transfer equation allows these more complex situations to be studied as well [2]. In recent years the parallel version of the 2D and 3D RADUGA code has been developed. This version is successfully used in gamma and neutron transport problems [3]. Applications of this code to problems of radiative transfer in the atmosphere are contained in [4]. The capabilities of the RADUGA code are presented in [5]. The RADUGA code system is a universal solver of radiative transfer problems for complicated models, including 2D and 3D aerosol and cloud fields with arbitrary scattering anisotropy, light absorption, inhomogeneous underlying surface and topography. Both delta type and distributed light sources can be accounted for in the framework of the algorithm developed. The accurate numerical procedure is based on the new discrete ordinate SWDD scheme [6]. The algorithm is specifically designed for parallel supercomputers. The version RADUGA 5.1(P) can run on MBC1000M [7] (768 processors with 10 Gb of hard disc memory for each processor). The peak productivity is equal to 1 Tfl. The corresponding scalar version RADUGA 5.1 runs on a PC. As a first example of application of the algorithm developed, we have studied the shadowing effects of clouds on the neighboring cloudless atmosphere, depending on the cloud optical thickness, surface albedo, and illumination conditions. This is of importance for the development of modern satellite aerosol retrieval algorithms.
[1] Sobolev

  7. PIXIE3D: An efficient, fully implicit, parallel, 3D extended MHD code for fusion plasma modeling

    International Nuclear Information System (INIS)

    Chacon, L.

    2007-01-01

    PIXIE3D is a modern, parallel, state-of-the-art extended MHD code that employs fully implicit methods for efficiency and accuracy. It features a general geometry formulation, and is therefore suitable for the study of many magnetic fusion configurations of interest. PIXIE3D advances the state of the art in extended MHD modeling in two fundamental ways. Firstly, it employs a novel conservative finite volume scheme which is remarkably robust and stable, and demands very small physical and/or numerical dissipation. This is a fundamental requirement when one wants to study fusion plasmas with realistic conductivities. Secondly, PIXIE3D features fully-implicit time stepping, employing Newton-Krylov methods for inverting the associated nonlinear systems. These methods have been shown to be scalable and efficient when preconditioned properly. Novel preconditioning ideas (so-called physics-based), which were prototyped in the context of reduced MHD, have been adapted for 3D primitive-variable resistive MHD in PIXIE3D, and are currently being extended to Hall MHD. PIXIE3D is fully parallel, employing PETSc for parallelism. PIXIE3D has been thoroughly benchmarked against linear theory and against other available extended MHD codes on nonlinear test problems (such as the GEM reconnection challenge). We are currently in the process of extending such comparisons to fusion-relevant problems in realistic geometries. In this talk, we will describe both the spatial discretization approach and the preconditioning strategy employed for extended MHD in PIXIE3D. We will report on recent benchmarking studies between PIXIE3D and other 3D extended MHD codes, and will demonstrate its usefulness in a variety of fusion-relevant configurations such as Tokamaks and Reversed Field Pinches. (Author)

  8. Parallel analysis of tagged deletion mutants efficiently identifies genes involved in endoplasmic reticulum biogenesis.

    Science.gov (United States)

    Wright, Robin; Parrish, Mark L; Cadera, Emily; Larson, Lynnelle; Matson, Clinton K; Garrett-Engele, Philip; Armour, Chris; Lum, Pek Yee; Shoemaker, Daniel D

    2003-07-30

    Increased levels of HMG-CoA reductase induce cell type- and isozyme-specific proliferation of the endoplasmic reticulum. In yeast, the ER proliferations induced by Hmg1p consist of nuclear-associated stacks of smooth ER membranes known as karmellae. To identify genes required for karmellae assembly, we compared the composition of populations of homozygous diploid S. cerevisiae deletion mutants following 20 generations of growth with and without karmellae. Using an initial population of 1,557 deletion mutants, 120 potential mutants were identified as a result of three independent experiments. Each experiment produced a largely non-overlapping set of potential mutants, suggesting that differences in specific growth conditions could be used to maximize the comprehensiveness of similar parallel analysis screens. Only two genes, UBC7 and YAL011W, were identified in all three experiments. Subsequent analysis of individual mutant strains confirmed that each experiment was identifying valid mutations, based on the mutant's sensitivity to elevated HMG-CoA reductase and inability to assemble normal karmellae. The largest class of HMG-CoA reductase-sensitive mutations was a subset of genes that are involved in chromatin structure and transcriptional regulation, suggesting that karmellae assembly requires changes in transcription or that the presence of karmellae may interfere with normal transcriptional regulation. Copyright 2003 John Wiley & Sons, Ltd.

  9. Efficient parallel implementations of approximation algorithms for guarding 1.5D terrains

    Directory of Open Access Journals (Sweden)

    Goran Martinović

    2015-03-01

    In the 1.5D terrain guarding problem, an x-monotone polygonal line is defined by k vertices, together with a set G of terrain points, i.e. guards, and a set N of terrain points which the guards are to observe (guard). This involves a weighted version of the guarding problem where guards G have weights. The goal is to determine a minimum weight subset of G to cover all the points in N, including a version where points from N have demands. Furthermore, another goal is to determine the smallest subset of G, such that every point in N is observed by the required number of guards. Both problems are NP-hard and have a factor 5 approximation [3, 4]. This paper will show that if a (1+ϵ)-approximate solver for the corresponding linear program is used, for any ϵ > 0, an extra 1+ϵ factor will appear in the final approximation factor for both problems. The parallel implementation based on GPU and CPU threads will be compared with the Gurobi solver, leading to the conclusion that the respective algorithm outperforms the Gurobi solver on large and dense inputs, typically by one order of magnitude.

  10. Scheduling of Iterative Algorithms with Matrix Operations for Efficient FPGA Design—Implementation of Finite Interval Constant Modulus Algorithm

    Czech Academy of Sciences Publication Activity Database

    Šůcha, P.; Hanzálek, Z.; Heřmánek, Antonín; Schier, Jan

    2007-01-01

    Roč. 46, č. 1 (2007), s. 35-53 ISSN 0922-5773 R&D Projects: GA AV ČR(CZ) 1ET300750402; GA MŠk(CZ) 1M0567; GA MPO(CZ) FD-K3/082 Institutional research plan: CEZ:AV0Z10750506 Keywords : high-level synthesis * cyclic scheduling * iterative algorithms * imperfectly nested loops * integer linear programming * FPGA * VLSI design * blind equalization * implementation Subject RIV: BA - General Mathematics Impact factor: 0.449, year: 2007 http://www.springerlink.com/content/t217kg0822538014/fulltext.pdf

  11. Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

    Science.gov (United States)

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-01-01

    We present ℓ1-SPIRiT, a simple algorithm for autocalibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
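
    The joint soft-thresholding step mentioned in the abstract can be illustrated with a small sketch. It assumes the standard group (cross-channel) shrinkage operator, which may differ from the operator the authors actually use; the function name `joint_soft_threshold` and the parameter `lam` are hypothetical.

```python
import math

def joint_soft_threshold(coeffs, lam):
    """Shrink each coefficient position jointly across channels.

    coeffs: list of channels, each a list of real or complex coefficients.
    lam: non-negative threshold (hypothetical tuning parameter).
    """
    n_channels = len(coeffs)
    n_coeffs = len(coeffs[0])
    out = [[0j] * n_coeffs for _ in range(n_channels)]
    for k in range(n_coeffs):
        # Joint magnitude of position k over all channels (l2 norm).
        norm = math.sqrt(sum(abs(ch[k]) ** 2 for ch in coeffs))
        if norm > lam:
            # Scale every channel by the same factor so that the
            # relative channel weights at this position are preserved.
            scale = (norm - lam) / norm
            for c in range(n_channels):
                out[c][k] = coeffs[c][k] * scale
        # Positions with joint magnitude <= lam stay at zero.
    return out
```

    Each coefficient position is processed independently of the others, which is one reason an operation of this kind maps well onto many-core hardware.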

  12. High-efficiency one-dimensional atom localization via two parallel standing-wave fields

    International Nuclear Information System (INIS)

    Wang, Zhiping; Wu, Xuqiang; Lu, Liang; Yu, Benli

    2014-01-01

    We present a new scheme of high-efficiency one-dimensional (1D) atom localization via measurement of upper state population or the probe absorption in a four-level N-type atomic system. By applying two classical standing-wave fields, the localization peak position and number, as well as the conditional position probability, can be easily controlled by the system parameters, and the sub-half-wavelength atom localization is also observed. More importantly, there is 100% detecting probability of the atom in the subwavelength domain when the corresponding conditions are satisfied. The proposed scheme may open up a promising way to achieve high-precision and high-efficiency 1D atom localization. (paper)

  13. Scalability of Parallel Scientific Applications on the Cloud

    Directory of Open Access Journals (Sweden)

    Satish Narayana Srirama

    2011-01-01

    Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications like matrix–vector operations and NAS parallel benchmarks, and DOUG (Domain decomposition On Unstructured Grids) on the cloud. DOUG is an open source software package for parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit a lot and scale reasonably on the cloud. We could also observe the limitations of the cloud and its comparison with a cluster in terms of performance. However, for efficiently running the scientific applications on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit the cloud resources, like the MapReduce framework. Several iterative and embarrassingly parallel algorithms are reduced to the MapReduce model and their performance is measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it is well suited to embarrassingly parallel algorithms. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper raises the necessity for better frameworks or optimizations for MapReduce.
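
    To make the point about iterative methods concrete, here is a minimal in-process sketch of a Jacobi iteration written as repeated map and reduce phases. It is a toy illustration, not Hadoop code and not the paper's implementation; all function names are hypothetical, and the matrix is assumed to be given as (row, column, value) triples with a nonzero diagonal.

```python
from collections import defaultdict

def map_phase(entries, x):
    # Emit (row, partial product) for every off-diagonal nonzero A[i][j].
    for i, j, a_ij in entries:
        yield i, a_ij * x[j]

def reduce_phase(pairs, n):
    # Sum the partial products that share the same row key.
    sums = defaultdict(float)
    for i, value in pairs:
        sums[i] += value
    return [sums[i] for i in range(n)]

def jacobi_mapreduce(entries, b, n, iterations):
    """Jacobi iteration in which every sweep is one map/reduce 'job'."""
    diag = [0.0] * n
    off_diag = []
    for i, j, a_ij in entries:
        if i == j:
            diag[i] = a_ij
        else:
            off_diag.append((i, j, a_ij))
    x = [0.0] * n
    for _ in range(iterations):
        # One complete job per sweep: on a batch framework, this
        # per-iteration job launch is the overhead that accumulates.
        r = reduce_phase(map_phase(off_diag, x), n)
        x = [(b[i] - r[i]) / diag[i] for i in range(n)]
    return x
```

    An embarrassingly parallel task needs only a single pass of this kind, whereas an iterative solver repeats it until convergence, which is consistent with the contrast the abstract reports.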

  14. Efficient implementation of multidimensional fast Fourier transform on a distributed-memory parallel multi-node computer

    Science.gov (United States)

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2012-01-10

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
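
    The sequence of steps in this abstract (1D FFTs along one dimension, a redistribution, then 1D FFTs along the other dimension) can be sketched on a single machine as follows. This is only an illustration of the row/column decomposition: the `transpose` helper stands in for the "all-to-all" exchange between nodes, the random ordering of that exchange is not modelled, and all names are hypothetical.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def transpose(a):
    # Stand-in for the "all-to-all" redistribution between nodes.
    return [list(col) for col in zip(*a)]

def fft2d(a):
    # First 1D FFTs: each row plays the role of data held by one node.
    rows = [fft(r) for r in a]
    # Redistribute so each "node" holds a full column, then transform.
    cols = [fft(c) for c in transpose(rows)]
    # Transpose back to the original layout.
    return transpose(cols)
```

    In a distributed setting the two `fft` passes are purely local work, so the communication cost is concentrated in the redistribution step, which is the part the abstract focuses on.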

  15. Efficient implementation of a multidimensional fast Fourier transform on a distributed-memory parallel multi-node computer

    Science.gov (United States)

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2008-01-01

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.

  16. Some computational challenges of developing efficient parallel algorithms for data-dependent computations in thermal-hydraulics supercomputer applications

    International Nuclear Information System (INIS)

    Woodruff, S.B.

    1994-01-01

    The Transient Reactor Analysis Code (TRAC), which features a two-fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local, the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, a fixed, uniform assignment of nodes to parallel processors will result in degraded computational efficiency due to the poor load balancing. A standard method for treating data-dependent models on vector architectures has been to use gather operations (or indirect addressing) to sort the nodes into subsets that (temporarily) share a common computational model. However, this method is not effective on distributed memory data parallel architectures, where indirect addressing involves expensive communication overhead. Another serious problem with this method involves software engineering challenges in the areas of maintainability and extensibility. For example, an implementation that was hand-tuned to achieve good computational efficiency would have to be rewritten whenever the decision tree governing the sorting was modified. Using an example based on the calculation of the wall-to-liquid and wall-to-vapor heat-transfer coefficients for three nonboiling flow regimes, we describe how the use of the Fortran 90 WHERE construct and automatic inlining of functions can be used to ameliorate this problem while improving both efficiency and software engineering. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. We discuss why developers should either wait for such solutions or consider alternative numerical algorithms, such as a neural network

  17. Colorado Conference on iterative methods. Volume 1

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1994-12-31

    The conference provided a forum on many aspects of iterative methods. Volume I topics were: domain decomposition, nonlinear problems, integral equations and inverse problems, eigenvalue problems, iterative software kernels. Volume II presents nonsymmetric solvers, parallel computation, theory of iterative methods, software and programming environment, ODE solvers, multigrid and multilevel methods, applications, robust iterative methods, preconditioners, Toeplitz and circulant solvers, and saddle point problems. Individual papers are indexed separately on the EDB.

  18. FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

    Science.gov (United States)

    Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

    2009-12-01

    , industrial and physical applications. However, despite recent modelling advances, the accurate numerical solution of the equations governing such problems is still at a relatively early stage. Indeed, recent studies employing a simplifying long-wave approximation have shown that highly efficient numerical methods are necessary to solve the resulting lubrication equations in order to achieve the level of grid resolution required to accurately capture the effects of micro- and nano-scale topographical features. Solution method: A portable parallel multigrid algorithm has been developed for the above purpose, for the particular case of flow over submerged topographical features. Within the multigrid framework adopted, a W-cycle is used to accelerate convergence in respect of the time dependent nature of the problem, with relaxation sweeps performed using a fixed number of pre- and post-Red-Black Gauss-Seidel Newton iterations. In addition, the algorithm incorporates automatic adaptive time-stepping to avoid the computational expense associated with repeated time-step failure. Running time: 1.31 minutes using 128 processors on BlueGene/P with a problem size of over 16.7 million mesh points.
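
    The Red-Black Gauss-Seidel relaxation named in the solution method can be sketched on a much simpler model problem. The example below assumes a linear Poisson equation on a square grid with fixed boundary values, not the nonlinear lubrication equations of the paper (which require Newton iterations inside the smoother); the function name is hypothetical.

```python
def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for -laplace(u) = f.

    u: square 2D list including fixed boundary values; updated in place.
    f: source term on the same grid; h: uniform grid spacing.
    """
    n = len(u)
    for color in (0, 1):  # 0 = "red" points, 1 = "black" points
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == color:
                    # Five-point stencil: all four neighbours have the
                    # opposite colour, so same-colour updates are
                    # independent of one another.
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1]
                                      + h * h * f[i][j])
    return u
```

    Because points of one colour depend only on points of the other colour, each half-sweep can be distributed across processors without ordering constraints, which is what makes this smoother attractive in a parallel multigrid setting.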

  19. Parallel preconditioning techniques for sparse CG solvers

    Energy Technology Data Exchange (ETDEWEB)

    Basermann, A.; Reichel, B.; Schelthoff, C. [Central Institute for Applied Mathematics, Juelich (Germany)

    1996-12-31

    Conjugate gradient (CG) methods to solve sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and the condition of many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques of the CG method. In particular for very ill-conditioned matrices, sophisticated preconditioners are necessary to obtain both acceptable convergence and accuracy of CG. Here, we investigate variants of polynomial and incomplete Cholesky preconditioners that markedly reduce the iterations of the simply diagonally scaled CG and are shown to be well suited for massively parallel machines.

  20. ITER safety

    International Nuclear Information System (INIS)

    Raeder, J.; Piet, S.; Buende, R.

    1991-01-01

    As part of the series of publications by the IAEA that summarize the results of the Conceptual Design Activities for the ITER project, this document describes the ITER safety analyses. It contains an assessment of normal operation effluents, accident scenarios, plasma chamber safety, tritium system safety, magnet system safety, external loss of coolant and coolant flow problems, and a waste management assessment, while it describes the implementation of the safety approach for ITER. The document ends with a list of major conclusions, a set of topical remarks on technical safety issues, and recommendations for the Engineering Design Activities, safety considerations for siting ITER, and recommendations with regard to the safety issues for the R and D for ITER. Refs, figs and tabs

  1. LightForce Photon-Pressure Collision Avoidance: Updated Efficiency Analysis Utilizing a Highly Parallel Simulation Approach

    Science.gov (United States)

    Stupl, Jan; Faber, Nicolas; Foster, Cyrus; Yang, Fan Yang; Nelson, Bron; Aziz, Jonathan; Nuttall, Andrew; Henze, Chris; Levit, Creon

    2014-01-01

    This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has shown that a few ground-based systems consisting of 10 kilowatt class lasers directed by 1.5 meter telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency of the system. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system. Results indicate that utilizing a network of four LightForce stations with 20 kilowatt lasers, 85% of all conjunctions with a

  2. NDPA: A generalized efficient parallel in-place N-Dimensional Permutation Algorithm

    Directory of Open Access Journals (Sweden)

    Muhammad Elsayed Ali

    2015-09-01

    N-dimensional transpose/permutation is a very important operation in many large-scale data-intensive and scientific applications. These applications include, but are not limited to, the oil industry (i.e. seismic data processing), nuclear medicine, media production, digital signal processing and business intelligence. This paper proposes an efficient in-place N-dimensional permutation algorithm. The algorithm is based on a novel 3D transpose algorithm that was published recently. The proposed algorithm has been tested on 3D, 4D, 5D, 6D and 7D data sets as a proof of concept. This is the first contribution, which breaks the dimension limitation of the base algorithm. The suggested algorithm exploits the idea of mixing both logical and physical permutations together. In the logical permutation, the address map is transposed for each data unit access. In the physical permutation, actual data elements are swapped. Both permutation levels exploit the fast on-chip memory bandwidth by transferring large amounts of data and allowing for fine-grain SIMD (Single Instruction, Multiple Data) operations. Thus, the performance is improved, as is evident from the experimental results section. The algorithm is implemented on an NVidia GeForce GTS 250 GPU (Graphics Processing Unit) containing 128 cores. The rapid increase in GPU performance, coupled with the recent and continuous improvements in programmability, has shown that GPUs are the right choice for computationally demanding tasks. The use of GPUs is the second contribution, which reflects how well they fit high performance tasks. The third contribution is improving the proposed algorithm performance to its peak as discussed in the results section.
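
    The distinction between logical and physical permutation can be illustrated with a short sketch. It assumes row-major storage in a flat list, and for clarity the physical step below writes to a new list rather than swapping in place as the paper's algorithm does; all function names are hypothetical.

```python
def strides(shape):
    """Row-major strides for a given shape."""
    s = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        s[d] = s[d + 1] * shape[d + 1]
    return s

def permuted_read(data, shape, axes, index):
    """Logical permutation: read one element of the permuted view.

    axes[d] names the source axis that becomes axis d of the view;
    no data is moved, only the address computation changes.
    """
    src = strides(shape)
    offset = sum(index[d] * src[axes[d]] for d in range(len(axes)))
    return data[offset]

def physical_permute(data, shape, axes):
    """Physical permutation: materialise the permuted array.

    Out-of-place here for readability (an assumption of this sketch).
    """
    new_shape = [shape[a] for a in axes]
    out = []
    index = [0] * len(axes)
    for _ in range(len(data)):
        out.append(permuted_read(data, shape, axes, index))
        # Advance the row-major multi-index of the destination array.
        for d in range(len(index) - 1, -1, -1):
            index[d] += 1
            if index[d] < new_shape[d]:
                break
            index[d] = 0
    return out, new_shape
```

    For example, `physical_permute(list(range(6)), [2, 3], [1, 0])` returns the transposed 3 by 2 array as a flat list together with its new shape, while `permuted_read` gives the same elements one at a time without copying.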

  3. Improved efficiency of multi-criteria IMPT treatment planning using iterative resampling of randomly placed pencil beams

    Science.gov (United States)

    van de Water, S.; Kraan, A. C.; Breedveld, S.; Schillemans, W.; Teguh, D. N.; Kooy, H. M.; Madden, T. M.; Heijmen, B. J. M.; Hoogeman, M. S.

    2013-10-01

    This study investigates whether ‘pencil beam resampling’, i.e. iterative selection and weight optimization of randomly placed pencil beams (PBs), reduces optimization time and improves plan quality for multi-criteria optimization in intensity-modulated proton therapy, compared with traditional modes in which PBs are distributed over a regular grid. Resampling consisted of repeatedly performing: (1) random selection of candidate PBs from a very fine grid, (2) inverse multi-criteria optimization, and (3) exclusion of low-weight PBs. The newly selected candidate PBs were added to the PBs in the existing solution, causing the solution to improve with each iteration. Resampling and traditional regular grid planning were implemented into our in-house developed multi-criteria treatment planning system ‘Erasmus iCycle’. The system optimizes objectives successively according to their priorities as defined in the so-called ‘wish-list’. For five head-and-neck cancer patients and two PB widths (3 and 6 mm sigma at 230 MeV), treatment plans were generated using: (1) resampling, (2) anisotropic regular grids and (3) isotropic regular grids, while using varying sample sizes (resampling) or grid spacings (regular grid). We assessed differences in optimization time (for comparable plan quality) and in plan quality parameters (for comparable optimization time). Resampling reduced optimization time by a factor of 2.8 and 5.6 on average (7.8 and 17.0 at maximum) compared with the use of anisotropic and isotropic grids, respectively. Doses to organs-at-risk were generally reduced when using resampling, with median dose reductions ranging from 0.0 to 3.0 Gy (maximum: 14.3 Gy, relative: 0%-42%) compared with anisotropic grids and from -0.3 to 2.6 Gy (maximum: 11.4 Gy, relative: -4%-19%) compared with isotropic grids. Resampling was especially effective when using thin PBs (3 mm sigma). Resampling plans contained on average fewer PBs, energy layers and protons than anisotropic

  4. A Note on Using Partitioning Techniques for Solving Unconstrained Optimization Problems on Parallel Systems

    Directory of Open Access Journals (Sweden)

    Mehiddin Al-Baali

    2015-12-01

    We deal with the design of parallel algorithms that use variable partitioning techniques to solve nonlinear optimization problems. We propose an iterative solution method that is very efficient for separable functions; our aim is to discuss its performance for general functions. Experimental results on an illustrative example have suggested some useful modifications that, even though they improve the efficiency of our parallel method, leave some questions open for further investigation.

  5. Parallel conjugate gradient algorithms for manipulator dynamic simulation

    Science.gov (United States)

    Fijany, Amir; Scheld, Robert E.

    1989-01-01

    Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n^2) on a serial processor. Conjugate gradient algorithms are presented that provide greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log_2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves a computational time of O(log_2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
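
    The diagonally preconditioned variant described in this abstract can be sketched as follows. This is a minimal serial illustration in plain Python, not the authors' parallel implementation: the function name pcg_diag, the 3 x 3 test matrix and the tolerance are assumptions chosen for the example, and the tridiagonal preconditioner and the O(log_2 n) parallel steps are not reproduced.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product A @ x for lists of lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def pcg_diag(A, b, tol=1e-10, max_iter=None):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner.

    A must be symmetric positive definite. Returns (x, iterations).
    Hypothetical helper written for this illustration only.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = b[:]                                   # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    z = [inv_diag[i] * r[i] for i in range(n)] # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Small SPD test system whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]
x, iterations = pcg_diag(A, b)
```

    In the sketch, applying the preconditioner is an elementwise scaling, which is the part that parallelizes trivially; the dot products are the reductions that a parallel machine would evaluate in a logarithmic number of steps.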

  6. ITER overview

    International Nuclear Information System (INIS)

    Shimomura, Y.; Aymar, R.; Chuyanov, V.; Huguet, M.; Parker, R.R.

    2001-01-01

    This report summarizes six years of technical work done by the ITER Joint Central Team and Home Teams under the terms of the Agreement on the ITER Engineering Design Activities. The major products are as follows: complete and detailed engineering design with supporting assessments, industrial-based cost estimates and schedule, non-site-specific comprehensive safety and environmental assessment, and technology R&D to validate and qualify the design, including proof of technologies and industrial manufacture and testing of full-size or scalable models of key components. The ITER design is at an advanced stage of maturity and contains sufficient technical information for a construction decision. The operation of ITER will demonstrate the availability of a new energy source, fusion. (author)

  7. ITER Overview

    International Nuclear Information System (INIS)

    Shimomura, Y.; Aymar, R.; Chuyanov, V.; Huguet, M.; Parker, R.

    1999-01-01

    This report summarizes six years of technical work done by the ITER Joint Central Team and Home Teams under the terms of the Agreement on the ITER Engineering Design Activities. The major products are as follows: complete and detailed engineering design with supporting assessments, industrial-based cost estimates and schedule, non-site-specific comprehensive safety and environmental assessment, and technology R&D to validate and qualify the design, including proof of technologies and industrial manufacture and testing of full-size or scalable models of key components. The ITER design is at an advanced stage of maturity and contains sufficient technical information for a construction decision. The operation of ITER will demonstrate the availability of a new energy source, fusion. (author)

  8. GPU Parallel Bundle Block Adjustment

    Directory of Open Access Journals (Sweden)

    ZHENG Maoteng

    2017-09-01

    To deal with massive data in photogrammetry, we introduce GPU parallel computing technology. The preconditioned conjugate gradient and inexact Newton methods are also applied to reduce the number of iterations needed to solve the normal equation. A new bundle adjustment workflow is developed to utilize GPU parallel computing. Our method avoids the storage and inversion of the big normal matrix and computes the normal matrix in real time. The proposed method not only greatly decreases the memory requirement of the normal matrix but also greatly improves the efficiency of bundle adjustment. It also achieves the same accuracy as the conventional method. Preliminary experimental results show that the bundle adjustment of a dataset with about 4500 images and 9 million image points can be done in only 1.5 minutes while achieving sub-pixel accuracy.

  9. A fast iterative scheme for the linearized Boltzmann equation

    Science.gov (United States)

    Wu, Lei; Zhang, Jun; Liu, Haihu; Zhang, Yonghao; Reese, Jason M.

    2017-06-01

    Iterative schemes to find steady-state solutions to the Boltzmann equation are efficient for highly rarefied gas flows, but can be very slow to converge in the near-continuum flow regime. In this paper, a synthetic iterative scheme is developed to speed up the solution of the linearized Boltzmann equation by penalizing the collision operator L into the form L = (L + Nδh) - Nδh, where δ is the gas rarefaction parameter, h is the velocity distribution function, and N is a tuning parameter controlling the convergence rate. The velocity distribution function is first solved by the conventional iterative scheme, then it is corrected such that the macroscopic flow velocity is governed by a diffusion-type equation that is asymptotic-preserving into the Navier-Stokes limit. The efficiency of this new scheme is assessed by calculating the eigenvalue of the iteration, as well as solving for Poiseuille and thermal transpiration flows. We find that the fastest convergence of our synthetic scheme for the linearized Boltzmann equation is achieved when Nδ is close to the average collision frequency. The synthetic iterative scheme is significantly faster than the conventional iterative scheme in both the transition and the near-continuum gas flow regimes. Moreover, due to its asymptotic-preserving properties, the synthetic iterative scheme does not need high spatial resolution in the near-continuum flow regime, which makes it even faster than the conventional iterative scheme. Using this synthetic scheme, with the fast spectral approximation of the linearized Boltzmann collision operator, Poiseuille and thermal transpiration flows between two parallel plates, through channels of circular/rectangular cross sections and various porous media are calculated over the whole range of gas rarefaction. Finally, the flow of a Ne-Ar gas mixture is solved based on the linearized Boltzmann equation with the Lennard-Jones intermolecular potential for the first time, and the difference

  10. Modular and efficient ozone systems based on massively parallel chemical processing in microchannel plasma arrays: performance and commercialization

    Science.gov (United States)

    Kim, M.-H.; Cho, J. H.; Park, S.-J.; Eden, J. G.

    2017-08-01

    Plasmachemical systems based on the production of a specific molecule (O3) in literally thousands of microchannel plasmas simultaneously have been demonstrated, developed and engineered over the past seven years, and commercialized. At the heart of this new plasma technology is the plasma chip, a flat aluminum strip fabricated by photolithographic and wet chemical processes and comprising 24-48 channels, micromachined into nanoporous aluminum oxide, with embedded electrodes. By integrating 4-6 chips into a module, the mass output of an ozone microplasma system is scaled linearly with the number of modules operating in parallel. A 115 g/hr (2.7 kg/day) ozone system, for example, is realized by the combined output of 18 modules comprising 72 chips and 1,800 microchannels. The implications of this plasma processing architecture for scaling ozone production capability, and reducing capital and service costs when introducing redundancy into the system, are profound. In contrast to conventional ozone generator technology, microplasma systems operate reliably (albeit with reduced output) in ambient air and humidity levels up to 90%, a characteristic attributable to the water adsorption/desorption properties and electrical breakdown strength of nanoporous alumina. Extensive testing has documented chip and system lifetimes (MTBF) beyond 5,000 hours, and efficiencies >130 g/kWh when oxygen is the feedstock gas. Furthermore, the weight and volume of microplasma systems are a factor of 3-10 lower than those for conventional ozone systems of comparable output. Massively-parallel plasmachemical processing offers functionality, performance, and commercial value beyond that afforded by conventional technology, and is currently in operation in more than 30 countries worldwide.

  11. Parallel preconditioned conjugate gradient algorithm applied to neutron diffusion problem

    International Nuclear Information System (INIS)

    Majumdar, A.; Martin, W.R.

    1992-01-01

    Numerical solution of the neutron diffusion problem requires solving a linear system of equations such as Ax = b, where A is an n x n symmetric positive definite (SPD) matrix; x and b are vectors with n components. The preconditioned conjugate gradient (PCG) algorithm is an efficient iterative method for solving such a linear system of equations. In this paper, the authors describe the implementation of a parallel PCG algorithm on a shared memory machine (BBN TC2000) and on a distributed workstation (IBM RS6000) environment created by the parallel virtual machine parallelization software

  12. A parallel offline CFD and closed-form approximation strategy for computationally efficient analysis of complex fluid flows

    Science.gov (United States)

    Allphin, Devin

    Computational fluid dynamics (CFD) solution approximations for complex fluid flow problems have become a common and powerful engineering analysis technique. These tools, though qualitatively useful, remain limited in practice by their underlying inverse relationship between simulation accuracy and overall computational expense. While a great volume of research has focused on remedying these issues inherent to CFD, one traditionally overlooked area of resource reduction for engineering analysis concerns the basic definition and determination of functional relationships for the studied fluid flow variables. This artificial relationship-building technique, called meta-modeling or surrogate/offline approximation, uses design of experiments (DOE) theory to efficiently approximate non-physical coupling between the variables of interest in a fluid flow analysis problem. By mathematically approximating these variables, DOE methods can effectively reduce the required quantity of CFD simulations, freeing computational resources for other analytical focuses. An idealized interpretation of a fluid flow problem can also be employed to create suitably accurate approximations of fluid flow variables for the purposes of engineering analysis. When used in parallel with a meta-modeling approximation, a closed-form approximation can provide useful feedback concerning proper construction, suitability, or even necessity of an offline approximation tool. It also provides a short-circuit pathway for further reducing the overall computational demands of a fluid flow analysis, again freeing resources for otherwise unsuitable resource expenditures. To validate these inferences, a design optimization problem was presented requiring the inexpensive estimation of aerodynamic forces applied to a valve operating on a simulated piston-cylinder heat engine. The determination of these forces was to be found using parallel surrogate and exact approximation methods, thus evidencing the comparative

  13. Conceptual design Fusion Experimental Reactor (FER/ITER)

    International Nuclear Information System (INIS)

    Uehara, Kazuya; Nagashima, Takashi; Ikeda, Yoshitaka

    1991-11-01

    This report describes a conceptual design of the Lower Hybrid Wave (LH) system for FER and ITER. At JAERI, the conceptual design of the LH system for FER has been carried out over the past 3 years in parallel with that of ITER. Part of the design must be common to ITER and FER. The physical requirements of the LH system are the saving of volt-seconds in the current start-up phase and current drive in the boundary region. The frequency of 5 GHz is chosen mainly to avoid α-particle absorption and for the availability of electron tube development. Seventy-two klystrons (FER) and one hundred klystrons (ITER) are necessary to inject 30 MW (FER) and 45-50 MW (ITER) of rf power into the plasma, using klystrons of 0.7-0.8 MW per tube. The launching system is of the multi-junction type, and the rf spectrum must be as sharp as possible with high directivity to improve the current drive efficiency. One port (FER) and two ports (ITER) are used and the injection direction is horizontal, for which ray-tracing code analysis and better coupling of the LH wave are considered. The transmission line is an over-sized waveguide with low rf loss. (author)

  14. Non-Cartesian parallel imaging reconstruction.

    Science.gov (United States)

    Wright, Katherine L; Hamilton, Jesse I; Griswold, Mark A; Gulani, Vikas; Seiberlich, Nicole

    2014-11-01

    Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be used to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the nonhomogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian generalized autocalibrating partially parallel acquisition (GRAPPA), and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. © 2014 Wiley Periodicals, Inc.

  15. A homotopy method for solving Riccati equations on a shared memory parallel computer

    International Nuclear Information System (INIS)

    Zigic, D.; Watson, L.T.; Collins, E.G. Jr.; Davis, L.D.

    1993-01-01

    Although there are numerous algorithms for solving Riccati equations, there still remains a need for algorithms which can operate efficiently on large problems and on parallel machines. This paper gives a new homotopy-based algorithm for solving Riccati equations on a shared memory parallel computer. The central part of the algorithm is the computation of the kernel of the Jacobian matrix, which is essential for the corrector iterations along the homotopy zero curve. Using a Schur decomposition, the tensor product structure of various matrices can be efficiently exploited. The algorithm allows for efficient parallelization on shared memory machines.

  16. ITER licensing

    International Nuclear Information System (INIS)

    Gordon, C.W.

    2005-01-01

    ITER was fortunate to have four countries interested in ITER siting to the point where licensing discussions were initiated. This experience uncovered the challenges of licensing a first-of-a-kind fusion machine under different licensing regimes and helped prepare the way for the site-specific licensing process. These initial steps in licensing ITER have allowed for refining the safety case and provide confidence that the design and safety approach will be licensable. With site-specific licensing underway, the necessary regulatory submissions have been defined and are well on the way to being completed. Of course, there is still work to be done and details to be sorted out. However, the informal international discussions to bring both the proponent and regulatory authority up to a common level of understanding have laid the foundation for a licensing process that should proceed smoothly. This paper provides observations from the perspective of the International Team. (author)

  17. ITER Safety and Licensing

    International Nuclear Information System (INIS)

    Girard, J-.P; Taylor, N.; Garin, P.; Uzan-Elbez, J.; GULDEN, W.; Rodriguez-Rodrigo, L.

    2006-01-01

    The site for the construction of ITER was chosen in June 2005. The facility will be built in Europe, in the south of France close to Marseille. The generic safety scheme is now under revision to adapt the design to the host country regulation. Even though ITER will be an international organization, it will have to comply with the French requirements in the fields of public and occupational health and safety, nuclear safety, radiation protection, licensing, nuclear substances and environmental protection. The organization of the central team together with its partners organized in domestic agencies for the in-kind procurement of components is a key issue for the success of the experimentation. ITER is the first facility that will achieve sustained nuclear fusion. A good understanding of the key safety issues of this potential new source of energy production is important both for the experimental one-of-a-kind device, ITER itself, and for the future of fusion power plants. The main safety concern is confinement of the tritium, activated dust in the vacuum vessel and activated corrosion products in the coolant of the plasma-facing components. This is achieved in the design through multiple confinement barriers to implement the defence in depth approach. It will be demonstrated in documents submitted to the French regulator that these barriers maintain their function in all postulated incident and accident conditions. The licensing process started with examination of the safety options. This step was performed by Europe during the candidature phase in 2002. In parallel to the final design, and taking into account the local regulations, the Preliminary Safety Report (RPrS) will be drafted with support of the European partner and others in the framework of ITER Task Agreements. Together with the license application, the RPrS will be forwarded to the regulatory bodies, which will launch public hearings and a safety review. Both processes must succeed in order to

  18. Improved parallel solution techniques for the integral transport matrix method

    Energy Technology Data Exchange (ETDEWEB)

    Zerr, R. Joseph, E-mail: rjz116@psu.edu [Department of Mechanical and Nuclear Engineering, The Pennsylvania State University, University Park, PA (United States); Azmy, Yousry Y., E-mail: yyazmy@ncsu.edu [Department of Nuclear Engineering, North Carolina State University, Burlington Engineering Laboratories, Raleigh, NC (United States)

    2011-07-01

    Alternative solution strategies to the parallel block Jacobi (PBJ) method for the solution of the global problem with the integral transport matrix method operators have been designed and tested. The most straightforward improvement to the Jacobi iterative method is the Gauss-Seidel alternative. The parallel red-black Gauss-Seidel (PGS) algorithm can improve on the number of iterations and reduce work per iteration by applying an alternating red-black color-set to the sub-domains and assigning multiple sub-domains per processor. A parallel GMRES(m) method was implemented as an alternative to stationary iterations. Computational results show that the PGS method can improve on the PBJ method execution time by up to 10× when eight sub-domains per processor are used. However, compared to traditional source iterations with diffusion synthetic acceleration, it is still approximately an order of magnitude slower. The best-performing cases are optically thick because sub-domains decouple, yielding faster convergence. Further tests revealed that 64 sub-domains per processor was the best performing level of sub-domain division. An acceleration technique that improves the convergence rate would greatly improve the ITMM. The GMRES(m) method with a diagonal block preconditioner consumes approximately the same time as the PBJ solver but could be improved by an as yet undeveloped, more efficient preconditioner. (author)
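
    The contrast between Jacobi and red-black Gauss-Seidel sweeps can be illustrated on a much simpler model problem than the one in this record. The sketch below applies both to the 1D Poisson matrix tridiag(-1, 2, -1) in plain Python; it colors individual unknowns rather than transport sub-domains, and the problem size, tolerance and function names are assumptions made for the example only.

```python
def neighbours(u, i):
    """Left and right neighbour values with zero boundary conditions."""
    left = u[i - 1] if i > 0 else 0.0
    right = u[i + 1] if i < len(u) - 1 else 0.0
    return left, right

def residual_norm(u, b):
    """Max-norm of b - A u for A = tridiag(-1, 2, -1)."""
    worst = 0.0
    for i in range(len(u)):
        left, right = neighbours(u, i)
        worst = max(worst, abs(b[i] - (2.0 * u[i] - left - right)))
    return worst

def jacobi_sweep(u, b):
    """Every unknown is updated from the previous iterate only."""
    new = []
    for i in range(len(u)):
        left, right = neighbours(u, i)
        new.append((b[i] + left + right) / 2.0)
    return new

def red_black_sweep(u, b):
    """Update even ('red') points, then odd ('black') points.

    Points of one colour depend only on points of the other colour,
    so each half-sweep could be executed fully in parallel.
    """
    u = u[:]
    for colour in (0, 1):
        for i in range(colour, len(u), 2):
            left, right = neighbours(u, i)
            u[i] = (b[i] + left + right) / 2.0
    return u

def solve(sweep, b, tol=1e-8, max_sweeps=100000):
    """Repeat a sweep until the residual max-norm drops below tol."""
    u = [0.0] * len(b)
    for k in range(max_sweeps):
        if residual_norm(u, b) <= tol:
            return u, k
        u = sweep(u, b)
    return u, max_sweeps

b = [1.0] * 15
u_jacobi, sweeps_jacobi = solve(jacobi_sweep, b)
u_rb, sweeps_rb = solve(red_black_sweep, b)
```

    On this model problem the red-black ordering needs noticeably fewer sweeps than Jacobi while keeping each half-sweep free of data dependencies, which is the property the coloring is meant to provide.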

  19. Improved parallel solution techniques for the integral transport matrix method

    International Nuclear Information System (INIS)

    Zerr, R. Joseph; Azmy, Yousry Y.

    2011-01-01

    Alternative solution strategies to the parallel block Jacobi (PBJ) method for the solution of the global problem with the integral transport matrix method operators have been designed and tested. The most straightforward improvement to the Jacobi iterative method is the Gauss-Seidel alternative. The parallel red-black Gauss-Seidel (PGS) algorithm can improve on the number of iterations and reduce work per iteration by applying an alternating red-black color-set to the sub-domains and assigning multiple sub-domains per processor. A parallel GMRES(m) method was implemented as an alternative to stationary iterations. Computational results show that the PGS method can improve on the PBJ method execution time by up to 10× when eight sub-domains per processor are used. However, compared to traditional source iterations with diffusion synthetic acceleration, it is still approximately an order of magnitude slower. The best-performing cases are optically thick because sub-domains decouple, yielding faster convergence. Further tests revealed that 64 sub-domains per processor was the best performing level of sub-domain division. An acceleration technique that improves the convergence rate would greatly improve the ITMM. The GMRES(m) method with a diagonal block preconditioner consumes approximately the same time as the PBJ solver but could be improved by an as yet undeveloped, more efficient preconditioner. (author)

  20. Dynamical behaviour of neuronal networks iterated with memory

    International Nuclear Information System (INIS)

    Melatagia, P.M.; Ndoundam, R.; Tchuente, M.

    2005-11-01

    We study memory iteration, where the updating considers a longer history of each site and the set of interaction matrices is palindromic. We analyze two different ways of updating the networks: parallel iteration with memory, and sequential iteration with memory, which we introduce in this paper. For parallel iteration, we define a Lyapunov functional which permits us to characterize the period behaviour and to bound explicitly the transient lengths of neural networks iterated with memory. For sequential iteration, we use an algebraic invariant to characterize the period behaviour of the studied model of neural computation. (author)

  1. ITER shielding blanket

    Energy Technology Data Exchange (ETDEWEB)

    Strebkov, Yu [ENTEK, Moscow (Russian Federation); Avsjannikov, A [ENTEK, Moscow (Russian Federation); Baryshev, M [NIAT, Moscow (Russian Federation); Blinov, Yu [ENTEK, Moscow (Russian Federation); Shatalov, G [KIAE, Moscow (Russian Federation); Vasiliev, N [KIAE, Moscow (Russian Federation); Vinnikov, A [ENTEK, Moscow (Russian Federation); Chernjagin, A [DYNAMICA, Moscow (Russian Federation)

    1995-03-01

    A reference non-breeding blanket is under development now for the ITER Basic Performance Phase for the purpose of high reliability during the first stage of ITER operation. More severe operation modes are expected in this stage with first wall (FW) local heat loads up to 100-300 W cm^-2. Integration of a blanket design with protective and start limiters requires new solutions to achieve high reliability, and possible use of beryllium as a protective material leads to technologies. The rigid shielding blanket concept was developed in Russia to satisfy the above-mentioned requirements. The concept is based on a copper alloy FW, austenitic stainless steel blanket structure, water cooling. Beryllium protection is integrated in the FW design. Fabrication technology and assembly procedure are described in parallel with the equipment used. (orig.).

  2. High-speed parallel solution of the neutron diffusion equation with the hierarchical domain decomposition boundary element method incorporating parallel communications

    International Nuclear Information System (INIS)

    Tsuji, Masashi; Chiba, Gou

    2000-01-01

    A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to accomplish a high parallel efficiency on distributed memory message passing parallel computers. Data exchanges between node processors that are repeated during iteration processes of HDD-BEM are implemented, without any intervention of the host processor that was used to supervise parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, the parallel processing can be executed with only cooperative operations of node processors. The communication overhead was even the dominant time consuming part in the conventional P-HDD-BEM, and the parallelization efficiency decreased steeply with the increase of the number of processors. With the parallel data communication, the efficiency is affected only by the number of boundary elements assigned to decomposed subregions, and the communication overhead can be drastically reduced. This feature can be particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the deterioration problem of parallel efficiency and opens a new path to parallel computations of NDEs on distributed memory message passing parallel computers. (author)

  3. An accurate and efficient system model of iterative image reconstruction in high-resolution pinhole SPECT for small animal research

    Energy Technology Data Exchange (ETDEWEB)

    Huang, P-C; Hsu, C-H [Department of Biomedical Engineering and Environmental Sciences, National Tsing Hua University, Hsinchu, Taiwan (China); Hsiao, I-T [Department Medical Imaging and Radiological Sciences, Chang Gung University, Tao-Yuan, Taiwan (China); Lin, K M [Medical Engineering Research Division, National Health Research Institutes, Zhunan Town, Miaoli County, Taiwan (China)], E-mail: cghsu@mx.nthu.edu.tw

    2009-06-15

    Accurate modeling of the photon acquisition process in pinhole SPECT is essential for optimizing resolution. In this work, the authors develop an accurate system model in which pinhole finite aperture and depth-dependent geometric sensitivity are explicitly included. To achieve high-resolution pinhole SPECT, the voxel size is usually set in the sub-millimeter range, so that the total number of image voxels increases accordingly. Inevitably, a system matrix that models a variety of favorable physical factors will become extremely sophisticated. An efficient implementation for such an accurate system model is proposed in this research. We first use the geometric symmetries to reduce redundant entries in the matrix. Due to the sparseness of the matrix, only non-zero terms are stored. A novel center-to-radius recording rule is also developed to effectively describe the relation between a voxel and its related detectors at every projection angle. The proposed system matrix is also suitable for multi-threaded computing. Finally, the accuracy and effectiveness of the proposed system model are evaluated on a workstation equipped with two Quad-Core Intel Xeon processors.

  4. Parallel multigrid smoothing: polynomial versus Gauss-Seidel

    International Nuclear Information System (INIS)

    Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

    2003-01-01

    Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines
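
    How a Chebyshev polynomial smoother damps high-frequency error can be sketched on the 1D Poisson matrix tridiag(-1, 2, -1). The code below uses the standard three-term Chebyshev iteration in plain Python; it is not the implementation evaluated in this record, and the target interval [lam_max/4, lam_max], the grid size and the number of steps are assumptions chosen for illustration.

```python
import math

def apply_poisson(x):
    """y = A x for the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def chebyshev_smooth(x, b, lam_min, lam_max, steps):
    """Chebyshev iteration targeting eigenvalues in [lam_min, lam_max].

    Only matrix-vector products and vector updates are used, so every
    step parallelizes like a Jacobi sweep (no sequential dependencies).
    """
    theta = 0.5 * (lam_max + lam_min)   # centre of the target interval
    delta = 0.5 * (lam_max - lam_min)   # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = [bi - ai for bi, ai in zip(b, apply_poisson(x))]
    d = [ri / theta for ri in r]
    x = x[:]
    for _ in range(steps):
        x = [xi + di for xi, di in zip(x, d)]
        r = [ri - adi for ri, adi in zip(r, apply_poisson(d))]
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = [rho_new * rho * di + (2.0 * rho_new / delta) * ri
             for di, ri in zip(d, r)]
        rho = rho_new
    return x

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

n = 31
b = [0.0] * n   # exact solution is zero, so the iterate equals the error

def mode(k):
    """k-th eigenvector of the matrix (a discrete sine wave)."""
    return [math.sin(k * math.pi * (i + 1) / (n + 1)) for i in range(n)]

high, low = mode(n), mode(1)
# Eigenvalues lie in (0, 4); the smoother targets the upper part [1, 4].
ratio_high = norm(chebyshev_smooth(high, b, 1.0, 4.0, 3)) / norm(high)
ratio_low = norm(chebyshev_smooth(low, b, 1.0, 4.0, 3)) / norm(low)
```

    Three steps reduce the most oscillatory error mode strongly while leaving the smoothest mode almost untouched, which is the behaviour a multigrid smoother needs; the smooth components are left to the coarse-grid correction.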

  5. Parallel multigrid smoothing: polynomial versus Gauss-Seidel

    Science.gov (United States)

    Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

    2003-07-01

    Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.

  6. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  7. A Novel Parallel Algorithm for Edit Distance Computation

    Directory of Open Access Journals (Sweden)

    Muhammad Murtaza Yousaf

    2018-01-01

    The edit distance between two sequences is the minimum number of weighted transformation-operations that are required to transform one string into the other. The weighted transformation-operations are insert, remove, and substitute. A dynamic programming solution for finding the edit distance exists, but it becomes computationally intensive when the strings become very long. This work presents a novel parallel algorithm to solve the edit distance problem of string matching. The algorithm is based on resolving dependencies in the dynamic programming solution of the problem and it is able to compute each row of the edit distance table in parallel. In this way, it becomes possible to compute the complete table in min(m,n) iterations for strings of size m and n, whereas the state-of-the-art parallel algorithm solves the problem in max(m,n) iterations. The proposed algorithm also increases the amount of parallelism in each of its iterations. The algorithm is also capable of exploiting spatial locality in its implementation. Additionally, the algorithm works in a load-balanced way that further improves its performance. The algorithm is implemented for multicore systems having shared memory. Implementation of the algorithm in OpenMP shows linear speedup and better execution time as compared to the state-of-the-art parallel approach. The efficiency of the algorithm is also shown to be better than that of its competitor.
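
    For reference, the sequential dynamic programming baseline that the abstract refers to can be sketched as follows. This is the classic row-by-row table fill with unit costs (an assumption, since the abstract speaks of weighted operations); it is not the parallel algorithm proposed in the paper.

```python
def edit_distance(source, target):
    """Classic dynamic programming edit distance with unit costs for
    insert, remove and substitute. Only two rows of the table are kept."""
    previous = list(range(len(target) + 1))   # row for the empty prefix of source
    for i, s_char in enumerate(source, start=1):
        current = [i]                          # cost of deleting i characters
        for j, t_char in enumerate(target, start=1):
            substitute = previous[j - 1] + (0 if s_char == t_char else 1)
            remove = previous[j] + 1
            insert = current[j - 1] + 1
            current.append(min(substitute, remove, insert))
        previous = current
    return previous[-1]


distance = edit_distance("kitten", "sitting")
```

    Each cell depends on its left neighbour in the same row as well as on the row above. That left-neighbour dependency is what a parallel formulation has to resolve before a whole row can be computed at once.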

  8. Progress and Achievements on the R&D Activities for ITER Vacuum Vessel

    Energy Technology Data Exchange (ETDEWEB)

    Nakahira, M. [Japan Atomic Energy Research Institute (JAERI); Koizumi, K. [Japan Atomic Energy Research Institute (JAERI); Takahashi, H. [Japan Atomic Energy Research Institute (JAERI); Onozuka, M. [ITER Joint Central Team, Garching, Germany; Ioki, K. [ITER Joint Central Team, Garching, Germany; Kuzumin, E. [D.V. Efremov Scientific Research Institute, St. Petersburg, Russia; Krylov, V. [D.V. Efremov Scientific Research Institute, St. Petersburg, Russia; Maslakowski, J. [Oak Ridge National Laboratory (ORNL); Nelson, Brad E [ORNL; Jones, L. [Max-Planck Institute, Garching, Germany; Danner, W. [Max-Planck Institute, Garching, Germany; Maisonnier, D. [Max-Planck Institute, Garching, Germany]

    2001-01-01

    The ITER vacuum vessel (VV) is designed to be a large double-walled structure with a D-shaped cross-section. The achievable fabrication tolerance of this structure was unknown due to its size and the complexity of its shape. The Full-scale Sector Model of the ITER Vacuum Vessel, which was 15 m in height, was fabricated and tested to obtain the fabrication and assembly tolerances. The model was fabricated within the target tolerance of 5 mm, and the welding deformation during the assembly operation was obtained. The port structure was also connected using remotized welding tools to demonstrate the basic maintenance activity. In parallel, tests of advanced welding, cutting and inspection systems were performed to improve the efficiency of fabrication and maintenance of the Vacuum Vessel. These activities demonstrate the feasibility of the ITER Vacuum Vessel in a realistic way. This paper describes the major progress, achievements and latest status of the R&D activities on the ITER vacuum vessel.

  9. A cryogenic system design for the international thermonuclear experimental reactor (ITER)

    International Nuclear Information System (INIS)

    Slack, D.S.

    1991-01-01

    A conceptual design for ITER was completed last year. The author developed a suitable cryogenic system for ITER as part of this conceptual design effort. An overview of the design is reported. Emphasis is on the fact that cryogenics is a mature science, and a system supporting ITER needs can be made from time-proven components without loss of efficiency or reliability. Because of the large size of the ITER cryogenic system, large numbers of compressors and expanders must be used. Very high reliability is assured by arranging these components in parallel banks where servicing of individual components can be done without interruption of operations. This and other ideas based on the author's experience with Mirror Fusion Test Facility (MFTF) operations are described. 5 refs., 3 figs

  10. A multi-level surface rebalancing approach for efficient convergence acceleration of 3D full core multi-group fine grid nodal diffusion iterations

    International Nuclear Information System (INIS)

    Geemert, René van

    2014-01-01

    Highlights: • New type of multi-level rebalancing approach for nodal transport. • Generally improved and more mesh-independent convergence behavior. • Importance for intended regime of 3D pin-by-pin core computations. - Abstract: A new multi-level surface rebalancing (MLSR) approach has been developed, aimed at enabling an improved non-linear acceleration of nodal flux iteration convergence in 3D steady-state and transient reactor simulation. This development is meant specifically for anticipating computational needs for solving envisaged multi-group diffusion-like SP_N calculations with enhanced mesh resolution (i.e. 3D multi-box up to 3D pin-by-pin grid). For the latter grid refinement regime, the previously available multi-level coarse mesh rebalancing (MLCMR) strategy has been observed to become increasingly inefficient with increasing 3D mesh resolution. Furthermore, for very fine 3D grids that feature a very fine axial mesh as well, non-convergence phenomena have been observed to emerge. In the verifications pursued up to now, these problems have been resolved by the new approach. The novelty arises from taking the interface current balance equations defined over all Cartesian box edges, instead of the nodal volume-integrated process-rate balance equation, as an appropriate restriction basis for setting up multi-level acceleration of fine grid interface current iterations. The new restriction strategy calls for the use of a newly derived set of adjoint spectral equations that are needed for computing a limited set of spectral response vectors per node. This enables a straightforward determination of group-condensed interface current spectral coupling operators that are of crucial relevance in the new rebalancing setup. Another novelty in the approach is a new variational method for computing the neutronic eigenvalue. Within this context, the latter is treated as a control parameter for driving another, newly defined and numerically more fundamental

  11. Conformable variational iteration method

    Directory of Open Access Journals (Sweden)

    Omer Acan

    2017-02-01

    In this study, we introduce the conformable variational iteration method, based on a newly defined fractional derivative called the conformable fractional derivative. This new method is applied to two fractional-order ordinary differential equations. To examine the solutions produced by this method, linear homogeneous and non-linear non-homogeneous fractional ordinary differential equations are selected. The results obtained are compared with the exact solutions, and their graphs are plotted to demonstrate the efficiency and accuracy of the method.

  12. High-Bandwidth, High-Efficiency Envelope Tracking Power Supply for 40W RF Power Amplifier Using Paralleled Bandpass Current Sources

    DEFF Research Database (Denmark)

    Høyerby, Mikkel Christian Wendelboe; Andersen, Michael Andreas E.

    2005-01-01

    This paper presents a high-performance power conversion scheme for power supply applications that require very high output voltage slew rates (dV/dt). The concept is to parallel 2 switching bandpass current sources, each optimized for its passband frequency space and the expected load current....... The principle is demonstrated with a power supply, designed for supplying a 40 W linear RF power amplifier for efficient amplification of a 16-QAM modulated data stream...

  13. Totally parallel multilevel algorithms

    Science.gov (United States)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  14. Efficient multi-objective calibration of a computationally intensive hydrologic model with parallel computing software in Python

    Science.gov (United States)

    With enhanced data availability, distributed watershed models for large areas with high spatial and temporal resolution are increasingly used to understand water budgets and examine effects of human activities and climate change/variability on water resources. Developing parallel computing software...

  15. Eigenvalues calculation algorithms for λ-modes determination. Parallelization approach

    Energy Technology Data Exchange (ETDEWEB)

    Vidal, V. [Universidad Politecnica de Valencia (Spain). Departamento de Sistemas Informaticos y Computacion; Verdu, G.; Munoz-Cobo, J.L. [Universidad Politecnica de Valencia (Spain). Departamento de Ingenieria Quimica y Nuclear; Ginestart, D. [Universidad Politecnica de Valencia (Spain). Departamento de Matematica Aplicada

    1997-03-01

    In this paper, we review two methods to obtain the λ-modes of a nuclear reactor, the Subspace Iteration method and Arnoldi's method, which are popular methods to solve the partial eigenvalue problem for a given matrix. In the developed application for the neutron diffusion equation we include improved acceleration techniques for both methods. Also, we propose two parallelization approaches for these methods, a coarse grain parallelization and a fine grain one. We have tested the developed algorithms with two realistic problems, focusing on the efficiency of the methods according to the CPU times. (author).
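
    A minimal sketch of plain subspace (orthogonal) iteration for a small symmetric matrix is given below to make the first of the two methods concrete. The 3x3 test matrix, the start vectors and the fixed iteration count are illustrative assumptions. The acceleration techniques and the neutron diffusion operators of the paper are not reproduced.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors):
    """Modified Gram-Schmidt; keeps the block of vectors linearly independent."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        length = math.sqrt(dot(w, w))
        basis.append([wi / length for wi in w])
    return basis


def subspace_iteration(A, start_vectors, iterations=200):
    """Repeatedly apply A to a block of vectors and re-orthonormalize.
    For symmetric A the Rayleigh quotients approach the dominant eigenvalues."""
    V = orthonormalize(start_vectors)
    for _ in range(iterations):
        V = orthonormalize([matvec(A, v) for v in V])
    eigenvalues = [dot(v, matvec(A, v)) for v in V]
    return eigenvalues, V


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
values, vectors = subspace_iteration(A, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

    The products of A with the individual block vectors are independent of each other, which is one natural place for coarse-grain parallelism. A fine-grain approach would instead parallelize inside each matrix-vector product.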

  16. Parallel 3-D method of characteristics in MPACT

    International Nuclear Information System (INIS)

    Kochunas, B.; Downar, T. J.; Liu, Z.

    2013-01-01

    A new parallel 3-D MOC kernel has been developed and implemented in MPACT which makes use of the modular ray tracing technique to reduce computational requirements and to facilitate parallel decomposition. The parallel model makes use of both distributed and shared memory parallelism which are implemented with the MPI and OpenMP standards, respectively. The kernel is capable of parallel decomposition of problems in space, angle, and by characteristic rays up to O(10^4) processors. Initial verification of the parallel 3-D MOC kernel was performed using the Takeda 3-D transport benchmark problems. The eigenvalues computed by MPACT are within the statistical uncertainty of the benchmark reference and agree well with the averages of other participants. The MPACT k_eff differs from the benchmark results for rodded and un-rodded cases by 11 and -40 pcm, respectively. The calculations were performed for various numbers of processors and parallel decompositions up to 15625 processors, all producing the same result at convergence. The parallel efficiency of the worst case was 60%, while very good efficiency (>95%) was observed for cases using 500 processors. The overall run time for the 500 processor case was 231 seconds and 19 seconds for the case with 15625 processors. Ongoing work is focused on developing theoretical performance models and the implementation of acceleration techniques to minimize the number of iterations to converge. (authors)

  17. Existence test for asynchronous interval iterations

    DEFF Research Database (Denmark)

    Madsen, Kaj; Caprani, O.; Stauning, Ole

    1997-01-01

    In the search for regions that contain fixed points of a real function of several variables, tests based on interval calculations can be used to establish existence or non-existence of fixed points in regions that are examined in the course of the search. The search can e.g. be performed...... as a synchronous (sequential) interval iteration: In each iteration step all components of the iterate are calculated based on the previous iterate. In this case it is straightforward to base simple interval existence and non-existence tests on the calculations done in each step of the iteration. The search can also...... on the componentwise calculations done in the course of the iteration. These componentwise tests are useful for parallel implementation of the search, since the tests can then be performed local to each processor and only when a test is successful does a processor communicate this result to other processors....

  18. Parallelized preconditioned BiCGStab solution of sparse linear system equations in F-COBRA-TF

    International Nuclear Information System (INIS)

    Geemert, Rene van; Glück, Markus; Riedmann, Michael; Gabriel, Harry

    2011-01-01

    Recently, the in-house development of a preconditioned and parallelized BiCGStab solver has been pursued successfully in AREVA’s advanced sub-channel code F-COBRA-TF. This solver can be run either in a sequential computation mode on a single CPU, or in a parallel computation mode on multiple parallel CPUs. The developed procedure enables the computation of several thousands of successive sparse linear system solutions in F-COBRA-TF with acceptable wall clock run times. The current paper provides general information about F-COBRA-TF in terms of modeling capabilities and application areas, and points out where the relevance arises for the efficient iterative solution of sparse linear systems. Furthermore, the preconditioning and parallelization strategies in the developed BiCGStab iterative solution approach are discussed. The paper is concluded with a number of verification examples. (author)
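
    To make the solver family concrete, here is a minimal sequential sketch of the textbook right-preconditioned BiCGStab iteration with a Jacobi (diagonal) preconditioner. The abstract does not describe the preconditioner or the parallelization used in F-COBRA-TF, so the Jacobi choice, the dense 3x3 test matrix and the tolerances are assumptions for illustration only.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicgstab_jacobi(A, b, tol=1e-12, max_iter=50):
    """Right-preconditioned BiCGStab starting from x = 0, with M = diag(A)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner
    x = [0.0] * n
    r = list(b)                 # residual b - A x for the zero start vector
    r_hat = list(r)             # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            break
        rho_next = dot(r_hat, r)
        beta = (rho_next / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = [di * pi for di, pi in zip(inv_diag, p)]      # y = M^-1 p
        v = matvec(A, y)
        alpha = rho_next / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            # The half step already converged; skip the stabilizing step.
            x = [xi + alpha * yi for xi, yi in zip(x, y)]
            break
        z = [di * si for di, si in zip(inv_diag, s)]      # z = M^-1 s
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_next
    return x


A = [[4.0, 1.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
b = [6.0, 15.0, 11.0]           # equals A applied to [1, 2, 3]
solution = bicgstab_jacobi(A, b)
```

    Each iteration consists of two matrix-vector products, two preconditioner applications and a handful of inner products. In a parallel setting the inner products are the global reduction points, while the remaining work can be distributed by rows.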

  19. ITER council proceedings: 2001

    International Nuclear Information System (INIS)

    2001-01-01

    Continuing the ITER EDA, two further ITER Council Meetings were held since the publication of ITER EDA documentation series no. 20, namely the ITER Council Meeting on 27-28 February 2001 in Toronto, and the ITER Council Meeting on 18-19 July in Vienna. That Meeting was the last one during the ITER EDA. This volume contains records of these Meetings, including: Records of decisions; List of attendees; ITER EDA status report; ITER EDA technical activities report; MAC report and advice; Final report of ITER EDA; and Press release

  20. On the adequacy of message-passing parallel supercomputers for solving neutron transport problems

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1990-01-01

    A coarse-grained, static-scheduling parallelization of the standard iterative scheme used for solving the discrete-ordinates approximation of the neutron transport equation is described. The parallel algorithm is based on a decomposition of the angular domain along the discrete ordinates, thus naturally producing a set of completely uncoupled systems of equations in each iteration. Implementation of the parallel code on Intel's iPSC/2 hypercube, and solutions to test problems are presented as evidence of the high speedup and efficiency of the parallel code. The performance of the parallel code on the iPSC/2 is analyzed, and a model for the CPU time as a function of the problem size (order of angular quadrature) and the number of participating processors is developed and validated against measured CPU times. The performance model is used to speculate on the potential of massively parallel computers for significantly speeding up real-life transport calculations at acceptable efficiencies. We conclude that parallel computers with a few hundred processors are capable of producing large speedups at very high efficiencies in very large three-dimensional problems. 10 refs., 8 figs

  1. An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

    Science.gov (United States)

    Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

    2018-02-01

    De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.

  2. Parallel keyed hash function construction based on chaotic maps

    International Nuclear Information System (INIS)

    Xiao Di; Liao Xiaofeng; Deng Shaojiang

    2008-01-01

    Recently, a variety of chaos-based hash functions have been proposed. Nevertheless, none of them works efficiently in a parallel computing environment. In this Letter, an algorithm for parallel keyed hash function construction is proposed, whose structure can ensure the uniform sensitivity of the hash value to the message. By means of the mechanism of both changeable-parameter and self-synchronization, the keystream establishes a close relation with the algorithm key, the content and the order of each message block. The entire message is modulated into the chaotic iteration orbit, and the coarse-graining trajectory is extracted as the hash value. Theoretical analysis and computer simulation indicate that the proposed algorithm can satisfy the performance requirements of a hash function. It is simple, efficient, practicable, and reliable. These properties make it a good choice for hashing on a parallel computing platform

  3. An efficient heuristic versus a robust hybrid meta-heuristic for general framework of serial-parallel redundancy problem

    International Nuclear Information System (INIS)

    Sadjadi, Seyed Jafar; Soltani, R.

    2009-01-01

    We present a heuristic approach to solve a general framework of the serial-parallel redundancy problem, where the reliability of the system is maximized subject to some general linear constraints. The complexity of the redundancy problem is generally considered to be NP-Hard and the optimal solution is not normally available. Therefore, to evaluate the performance of the proposed method, a hybrid genetic algorithm is also implemented whose parameters are calibrated via Taguchi's robust design method. Then, various test problems are solved, and the computational results indicate that the proposed heuristic approach yields promising reliabilities that are fairly close to the optimal solutions in a reasonable amount of time.
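
    The objective being maximized can be illustrated with the standard reliability formula for subsystems in series, each built from redundant components in parallel. The sketch assumes independent, identical components with active redundancy, which the abstract does not state. The heuristic and the hybrid genetic algorithm themselves are not shown.

```python
def system_reliability(component_reliabilities, redundancy_levels):
    """Reliability of subsystems connected in series, where subsystem i has
    redundancy_levels[i] identical components in parallel, each working
    independently with probability component_reliabilities[i]."""
    reliability = 1.0
    for r, n in zip(component_reliabilities, redundancy_levels):
        subsystem = 1.0 - (1.0 - r) ** n   # works unless all n components fail
        reliability *= subsystem           # series: every subsystem must work
    return reliability


# Two subsystems: the first has two redundant components, the second has one.
example = system_reliability([0.9, 0.8], [2, 1])
```

    Adding a component always raises the reliability of its subsystem, so the optimization problem lies in deciding where to spend a limited budget of cost, weight or volume.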

  4. Efficiency Analysis of the access method with the cascading Bloom filter to the data warehouse on the parallel computing platform

    Science.gov (United States)

    Grigoriev, Yu A.; Proletarskaya, V. A.; Ermakov, E. Yu; Ermakov, O. Yu

    2017-10-01

    A new method using a cascading Bloom filter (CBF) was developed for executing SQL queries in the Apache Spark parallel computing environment. It includes the representation of the original query in the form of several subqueries, the development of a connection graph and the transformation of subqueries, the definition of connections where it is necessary to use Bloom filters, and the representation of the graph in terms of Spark. Using query Q3 of the TPC-H benchmark as an example, full-scale experiments were carried out, which confirmed the effectiveness of the developed method.
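
    The building block of the method is the ordinary Bloom filter, sketched below in plain Python. The bit-array size, the number of hash functions and the SHA-256-based hashing are assumptions for illustration. The cascading construction and the Spark integration described in the abstract are not shown.

```python
import hashlib


class BloomFilter:
    """Probabilistic set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Build a filter over the join keys of a small table, then use it to
# discard rows of a larger table before the actual join.
small_table_keys = [101, 205, 333]
key_filter = BloomFilter()
for key in small_table_keys:
    key_filter.add(key)

large_table = [(101, "a"), (102, "b"), (205, "c"), (999, "d")]
candidates = [row for row in large_table if row[0] in key_filter]
```

    Rows that pass the filter still have to be checked by the real join, because a Bloom filter can report false positives. The saving comes from rows that are rejected early and never shuffled.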

  5. Parallel assembling and equation solving via graph algorithms with an application to the FE simulation of metal extrusion processes

    CERN Document Server

    Unterkircher, A

    2005-01-01

    We propose methods for parallel assembling and iterative equation solving based on graph algorithms. The assembling technique is independent of dimension, element type and model shape. As a parallel solving technique we construct a multiplicative symmetric Schwarz preconditioner for the conjugate gradient method. Both methods have been incorporated into a non-linear FE code to simulate 3D metal extrusion processes. We illustrate the efficiency of these methods on shared memory computers by realistic examples.

  6. The ITER divertor concept

    International Nuclear Information System (INIS)

    Janeschitz, G.; Borrass, K.; Federici, G.; Igitkhanov, Y.; Kukushkin, A.; Pacher, H.D.; Pacher, G.W.; Sugihara, M.

    1995-01-01

    The ITER divertor must exhaust most of the alpha particle power and the He ash at acceptable erosion rates. The high recycling regime of the ITER-CDA for present parameters would yield high power loads and erosion rates on conventional targets. Improvement by radiation in the SOL at constant pressure is limited in principle. To permit a higher radiation fraction, the plasma pressure along the field must be reduced by more than a factor 10, reducing also the target ion flux. This pressure reduction can be obtained by strong plasma-neutral interaction below the X-point. Under these conditions T_e in the divertor can be reduced to <5 eV along a flame-like ionisation front by impurity radiation and CX losses. Downstream of the front, neutrals undergo more CX or i-n collisions than ionisation events, resulting in significant momentum loss via neutrals to the divertor chamber wall. The pressure reduction by this mechanism depends on the along-field length for neutral-plasma interaction, the parallel power flux, the neutral density, the ratio of neutral-neutral collision length to the plasma-wall distance and on the Mach number of ions and neutrals. A supersonic transition in the main plasma-neutral interaction region, expected to occur near the ionisation front, would be beneficial for momentum removal. The momentum transfer fraction to the side walls is calculated: low Knudsen number is beneficial. The impact of the different physics effects on the chosen geometry and on the ITER divertor design and the lifetime of the various divertor components are discussed. (orig.)

  7. Efficient operator splitting algorithm for joint sparsity-regularized SPIRiT-based parallel MR imaging reconstruction.

    Science.gov (United States)

    Duan, Jizhong; Liu, Yu; Jing, Peiguang

    2018-02-01

    Self-consistent parallel imaging (SPIRiT) is an auto-calibrating model for the reconstruction of parallel magnetic resonance imaging, which can be formulated as a regularized SPIRiT problem. The Projection Over Convex Sets (POCS) method was used to solve the formulated regularized SPIRiT problem. However, the quality of the reconstructed image still needs to be improved. Though methods such as NonLinear Conjugate Gradients (NLCG) can achieve higher spatial resolution, these methods always demand very complex computation and converge slowly. In this paper, we propose a new algorithm to solve the formulated Cartesian SPIRiT problem with the JTV and JL1 regularization terms. The proposed algorithm uses the operator splitting (OS) technique to decompose the problem into a gradient problem and a denoising problem with two regularization terms, which is solved by our proposed split Bregman based denoising algorithm, and adopts the Barzilai and Borwein method to update step size. Simulation experiments on two in vivo data sets demonstrate that the proposed algorithm is 1.3 times faster than ADMM for datasets with 8 channels. Especially, our proposal is 2 times faster than ADMM for the dataset with 32 channels. Copyright © 2017 Elsevier Inc. All rights reserved.
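
    The Barzilai and Borwein step-size rule mentioned in the abstract can be illustrated on a small quadratic problem. The sketch below minimizes 0.5 x^T A x - b^T x by gradient descent with the first BB step, (s^T s) / (s^T y). The 2x2 matrix, the initial step and the tolerances are assumptions. The SPIRiT model, the regularizers and the operator splitting are not reproduced.

```python
import math


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bb_gradient_descent(A, b, x, initial_step=0.1, iterations=100, tol=1e-10):
    """Gradient descent on 0.5 x^T A x - b^T x with Barzilai-Borwein steps."""
    gradient = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    step = initial_step
    for _ in range(iterations):
        if math.sqrt(dot(gradient, gradient)) < tol:
            break
        x_next = [xi - step * gi for xi, gi in zip(x, gradient)]
        g_next = [ax - bi for ax, bi in zip(matvec(A, x_next), b)]
        s = [a - c for a, c in zip(x_next, x)]            # change in iterate
        y = [a - c for a, c in zip(g_next, gradient)]     # change in gradient
        x, gradient = x_next, g_next
        sy = dot(s, y)
        if sy <= 0.0:
            break            # no usable curvature information; stop early
        step = dot(s, s) / sy
    return x


A = [[3.0, 1.0],
     [1.0, 2.0]]
b = [5.0, 5.0]               # the minimizer solves A x = b, i.e. x = [1, 2]
minimizer = bb_gradient_descent(A, b, [0.0, 0.0])
```

    The rule needs no line search: the step is computed from the two most recent iterates and gradients, at the cost of a few inner products.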

  8. Contribution to the algorithmic and efficient programming of new parallel architectures including accelerators for neutron physics and shielding computations

    International Nuclear Information System (INIS)

    Dubois, J.

    2011-01-01

    In science, simulation is a key process for research or validation. Modern computer technology allows faster numerical experiments, which are cheaper than real models. In the field of neutron simulation, the calculation of eigenvalues is one of the key challenges. The complexity of these problems is such that a lot of computing power may be necessary. The work of this thesis is first the evaluation of new computing hardware such as graphics cards or massively multi-core chips, and their application to eigenvalue problems for neutron simulation. Then, in order to address the massive parallelism of national supercomputers, we also study the use of asynchronous hybrid methods for solving eigenvalue problems with this very high level of parallelism. We then test this work on several national supercomputers such as the Titane hybrid machine of the Computing Center, Research and Technology (CCRT), the Curie machine of the Very Large Computing Centre (TGCC), currently being installed, and the Hopper machine at the Lawrence Berkeley National Laboratory (LBNL). We also run our experiments on local workstations to illustrate the interest of this research in everyday use with local computing resources. (author)

  9. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    Energy Technology Data Exchange (ETDEWEB)

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  10. Parallel algorithms for nuclear reactor analysis via domain decomposition method

    International Nuclear Information System (INIS)

    Kim, Yong Hee

    1995-02-01

    the number of inner level iterations are limited. The analysis shows that mixed pseudo-boundary conditions have superior convergence properties if the pseudo-boundary parameters are optimally chosen. DN(or ND) conditions can be efficiently accelerated via under-relaxation concept, where DN(or ND) means that Dirichlet and Neumann conditions are independently imposed on neighbouring pseudo-boundaries. However, exact realization of such schemes is not practical since complete inner iteration is required. It is shown that limiting the number of inner iterations is equivalent to the under-relaxation concept, however, limiting the number of inner level iterations in MM scheme requires more outer iterations. Consequently, DN (or ND) algorithm with under-relaxation and MM algorithm may provide similar parallel performance in practical implementation, if the numerical solver used is not extraordinarily efficient. The parallel Schwarz algorithm is applied to two types of reactor benchmark problems: fixed source problems and eigenvalue problems. Several results of parallel computation for the problems are reported and compared with those of sequential computations. The results show that very high speedup can be achieved in fixed source problems in spite of the small problem size and that relatively high speedup, although lower than that of fixed source problems, can be obtained in eigenvalue problems

  11. ITER EDA status

    International Nuclear Information System (INIS)

    Aymar, R.

    2001-01-01

    The Project has focused on drafting the Plant Description Document (PDD), which will be published as the Technical Basis for the ITER Final Design Report (FDR), and its related documentation in time for the ITER review process. The preparations have involved continued intensive detailed design work, analyses and assessments by the Home Teams and the Joint Central Team, who have co-operated closely and efficiently. The main technical document has been completed in time for circulation, as planned, to TAC members for their review at TAC-17 (19-22 February 2001). Some of the supporting documents, such as the Plant Design Specification (PDS), Design Requirements and Guidelines (DRG1 and DRG2), and the Plant Safety Requirement (PSR) are also available for reference in draft form. A summary paper of the PDD for the Council's information is available as a separate document. A new documentation structure for the Project has been established. This hierarchical structure for documentation facilitates the entire organization in a way that allows better change control and avoids duplications. The initiative was intended to make this documentation system valid for the construction and operation phases of ITER. As requested, the Director and the JCT have been assisting the Explorations to plan for future joint technical activities during the Negotiations, and to consider technical issues important for ITER construction and operation for their introduction in the draft of a future joint implementation agreement. As charged by the Explorers, the Director has held discussions with the Home Team Leaders in order to prepare for the staffing of the International Team and Participants Teams during the Negotiations (Co-ordinated Technical Activities, CTA) and also in view of informing all ITER staff about their future directions in a timely fashion. One important element of the work was the completion by the Parties' industries of costing studies of about 83 ''procurement packages

  12. Energy efficiency for the multiport power converters architectures of series and parallel hybrid power source type used in plug-in/V2G fuel cell vehicles

    International Nuclear Information System (INIS)

    Bizon, Nicu

    2013-01-01

    Highlights: ► The series and parallel Hybrid Power Source (HPS) topologies for the plug-in Fuel Cell Vehicle (PFCV) are analyzed. ► An energy efficiency analysis of the Multiport Power Converter (MPC) of both HPSs is performed. ► The MPC energy efficiency features are shown by analytical computation in all PFCV regimes. -- Abstract: This paper presents a mathematical analysis of the energy efficiency of the Multiport Power Converter (MPC) used in series and parallel Hybrid Power Source (HPS) architectures in plug-in Fuel Cell Vehicles (PFCVs). The aim of the analysis is to provide general conclusions for a wide range of PFCV operating regimes that are chosen for efficient use of the MPC architecture on each particular drive cycle. In relation to the FC system of the PFCV, the Energy Storage System (ESS) can operate in the following regimes: (1) Charge-Sustaining (CS), (2) Charge-Depleting (CD), and (3) Charge-Increasing (CI). Considering the imposed window for the ESS State-Of-Charge (SOC), the MPC can be connected to renewable plug-in Charging Stations (PCSs) to exchange power with the Electric Power (EP) system when this is necessary for both. The Energy Management Unit (EMU) that communicates with the EP system will establish the moments to match the PFCV power demand with the supply availability of the EP grid, stabilizing it. The MPC energy efficiency of the PFCVs is studied when the ESS is charged (discharged) from (to) the home/PCS/EP system. Comparative results are shown for both PFCV architectures through the analytical calculations performed and the appropriate Matlab/Simulink® simulations presented.

  13. Improving the temperature characteristics and catalytic efficiency of a mesophilic xylanase from Aspergillus oryzae, AoXyn11A, by iterative mutagenesis based on in silico design.

    Science.gov (United States)

    Li, Xue-Qing; Wu, Qin; Hu, Die; Wang, Rui; Liu, Yan; Wu, Min-Chen; Li, Jian-Fang

    2017-12-01

    To improve the temperature characteristics and catalytic efficiency of a glycoside hydrolase family (GHF) 11 xylanase from Aspergillus oryzae (AoXyn11A), its variants were predicted based on in silico design. Firstly, Gly21 with the maximum B-factor value, which was confirmed by molecular dynamics (MD) simulation on the three-dimensional structure of AoXyn11A, was subjected to site-saturation mutagenesis. Thus, one variant with the highest thermostability, AoXyn11A^G21I, was selected from the mutagenesis library, E. coli/Aoxyn11A^G21X (X: any one of 20 amino acids). Secondly, based on the primary structure multiple alignment of AoXyn11A with seven thermophilic GHF11 xylanases, AoXyn11A^Y13F or AoXyn11A^G21I-Y13F was designed by replacing Tyr13 in AoXyn11A or AoXyn11A^G21I with Phe. Finally, three variant-encoding genes, Aoxyn11A^G21I, Aoxyn11A^Y13F and Aoxyn11A^G21I-Y13F, were constructed by two-stage whole-plasmid PCR method, and expressed in Pichia pastoris GS115, respectively. The temperature optimum (T_opt) of recombinant (re) AoXyn11A^G21I-Y13F was 60 °C, being 5 °C higher than that of reAoXyn11A^G21I or reAoXyn11A^Y13F, and 10 °C higher than that of reAoXyn11A. The thermal inactivation half-life (t_1/2) of reAoXyn11A^G21I-Y13F at 50 °C was 240 min, being 40-, 3.4- and 2.5-fold longer than those of reAoXyn11A, reAoXyn11A^G21I and reAoXyn11A^Y13F. The melting temperature (T_m) values of reAoXyn11A, reAoXyn11A^G21I, reAoXyn11A^Y13F and reAoXyn11A^G21I-Y13F were 52.3, 56.5, 58.6 and 61.3 °C, respectively. These findings indicated that the iterative mutagenesis of both Gly21Ile and Tyr13Phe improved the temperature characteristics of AoXyn11A in a synergistic mode. Besides those, the catalytic efficiency (k_cat/K_m) of reAoXyn11A^G21I-Y13F was 473.1 mL mg^-1 s^-1, which was 1.65-fold higher than that of reAoXyn11A.

  14. ITER Central Solenoid Module Fabrication

    Energy Technology Data Exchange (ETDEWEB)

    Smith, John [General Atomics, San Diego, CA (United States)]

    2016-09-23

    The fabrication of the modules for the ITER Central Solenoid (CS) has started in a dedicated production facility located in Poway, California, USA. The necessary tools have been designed, built, installed, and tested in the facility to enable the start of production. The current schedule has fabrication of the first module completed in 2017, followed by testing and subsequent shipment to ITER. The Central Solenoid is a key component of the ITER tokamak, providing the inductive voltage to initiate and sustain the plasma current and to position and shape the plasma. The design of the CS has been a collaborative effort between the US ITER Project Office (US ITER), the international ITER Organization (IO) and General Atomics (GA). GA's responsibility includes completing the fabrication design, developing and qualifying the fabrication processes and tools, and then completing the fabrication of the seven 110 tonne CS modules. The modules will be shipped separately to the ITER site, and then stacked and aligned in the Assembly Hall prior to insertion in the core of the ITER tokamak. A dedicated facility in Poway, California, USA has been established by GA to complete the fabrication of the seven modules. Infrastructure improvements included thick reinforced concrete floors, a diesel generator for backup power, and cranes for moving the tooling within the facility. The fabrication process for a single module requires approximately 22 months followed by five months of testing, which includes preliminary electrical testing followed by high current (48.5 kA) tests at 4.7 K. The production of the seven modules is completed in a parallel fashion through ten process stations. The process stations have been designed and built, with most stations having completed testing and qualification for carrying out the required fabrication processes. The final qualification step for each process station is achieved by the successful production of a prototype coil.
Fabrication of the first

  15. Iterative Splitting Methods for Differential Equations

    CERN Document Server

    Geiser, Juergen

    2011-01-01

    Iterative Splitting Methods for Differential Equations explains how to solve evolution equations via novel iterative-based splitting methods that efficiently use computational and memory resources. It focuses on systems of parabolic and hyperbolic equations, including convection-diffusion-reaction equations, heat equations, and wave equations. In the theoretical part of the book, the author discusses the main theorems and results of the stability and consistency analysis for ordinary differential equations. He then presents extensions of the iterative splitting methods to partial differential

  16. ITER EDA newsletter. V. 10, no. 1

    International Nuclear Information System (INIS)

    2001-01-01

    This article provides a summary of results of the ITER Physics Committee Meeting, which was held on 14 October 2000 at the ITER Garching Joint Work Site, Germany. The ITER Physics Committee is the body responsible for overseeing, through the seven specialized Expert Groups, the R and D activities contributed voluntarily by the ITER Parties. The Parties' Physics Designated Persons, the Chairs and Co-Chairs of ITER Physics Expert Groups and the JCT members involved attended the Meeting. As usual, the meeting was chaired by the ITER Director, Dr. R. Aymar, who reported on the status of the ITER EDA. Dr. Aymar described the steps being taken in preparing the ITER-FEAT Final Design Report (FDR), and further stated that the Report would be available in time to be of benefit to the Negotiations on the ITER Joint Implementation, expected to start around May 2001. All Parties recognize that the ITER Physics Expert Group structure has been useful in focusing the tokamak physics activity on the ITER-relevant issues and provides an efficient worldwide collaboration on confirming innovative solutions. The concept of an international workshop to be organized as a pre-meeting of each Expert Group meeting, in order to involve U.S. scientists in the discussion of generic tokamak physics issues, was introduced in 2000, with some success, and its goal should be pursued

  17. Solving binary-state multi-objective reliability redundancy allocation series-parallel problem using efficient epsilon-constraint, multi-start partial bound enumeration algorithm, and DEA

    International Nuclear Information System (INIS)

    Khalili-Damghani, Kaveh; Amiri, Maghsoud

    2012-01-01

    In this paper, a procedure based on an efficient epsilon-constraint method and data envelopment analysis (DEA) is proposed for solving the binary-state multi-objective reliability redundancy allocation series-parallel problem (MORAP). In the first module, a set of qualified non-dominated solutions on the Pareto front of the binary-state MORAP is generated using an efficient epsilon-constraint method. In order to test the quality of the non-dominated solutions generated in this module, a multi-start partial bound enumeration algorithm is also proposed for MORAP. The performance of both procedures is compared using different metrics on a well-known benchmark instance. The statistical analysis shows that the proposed efficient epsilon-constraint method not only outperforms the multi-start partial bound enumeration algorithm but also improves the upper bound previously found for the benchmark instance. Then, in the second module, a DEA model is supplied to prune the non-dominated solutions generated by the efficient epsilon-constraint method. This reduces the number of non-dominated solutions in a systematic manner and eases the decision-making process for practical implementations. - Highlights: ► A procedure based on an efficient epsilon-constraint method and DEA was proposed for solving MORAP. ► The performance of the proposed procedure was compared with a multi-start PBEA. ► Methods were statistically compared using multi-objective metrics.

  18. Intramolecular Parallel [4+3] Cycloadditions of Cyclopropane 1,1-Diesters with [3]Dendralenes: Efficient Construction of [5.3.0]Decane and Corresponding Polycyclic Skeletons.

    Science.gov (United States)

    Zhang, Chi; Tian, Jun; Ren, Jun; Wang, Zhongwen

    2017-01-26

    Aiming to develop efficient and general strategies for construction of complex and diverse polycyclic skeletons, we have successfully developed [4+3]IMPC (intramolecular parallel cycloaddition) of cyclopropane 1,1-diesters with [3]dendralenes. With a combination of the [4+3]IMPC and subsequent [4+n] cycloadditions, trans-[5.3.0]decane skeleton and its corresponding structurally complex and diverse polycyclic variants could be constructed efficiently. This novel [4+3] cycloaddition reaction mode of donor-acceptor cyclopropanes proceeds as a result of the ring-strain relief of a trans-[3.3.0]octane. We strongly believe that the developed methods will demonstrate potential applications in natural products synthesis and drug discovery. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS

    Science.gov (United States)

    Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin

    2011-04-01

    To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper proposes and designs a specific parallel IDW interpolation algorithm, incorporating both single program, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are output. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.
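
    The four steps listed in this abstract can be illustrated with a minimal Python sketch. It is not the GRASS GIS or OP-GIS implementation: the function names (`idw`, `interpolate_row`, `parallel_idw`), the power parameter of 2, and the use of a thread pool as a stand-in for slave nodes are all assumptions made for illustration. Each worker interpolates one grid row with the serial formula, and the caller gathers the rows in order.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def idw(samples, x, y, power=2.0):
    """Serial inverse distance weighted estimate at (x, y).

    samples is a list of (sx, sy, value) tuples (hypothetical layout).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # query point coincides with a sample
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


def interpolate_row(samples, y, xs, power=2.0):
    """Work unit for one worker: a full grid row at height y."""
    return [idw(samples, x, y, power) for x in xs]


def parallel_idw(samples, xs, ys, workers=4):
    """Hand one row to each worker and gather the rows in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: interpolate_row(samples, y, xs), ys))


# Two samples on the x axis, interpolated on a 3 x 2 grid.
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
grid = parallel_idw(samples, xs=[0.0, 1.0, 2.0], ys=[0.0, 1.0])
```

    Because every row depends only on the shared sample set, rows can be computed in any order; on a real cluster the thread pool would presumably be replaced by message passing between the master and slave processes.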

  20. Parallel experimental design and multivariate analysis provides efficient screening of cell culture media supplements to improve biosimilar product quality.

    Science.gov (United States)

    Brühlmann, David; Sokolov, Michael; Butté, Alessandro; Sauer, Markus; Hemberger, Jürgen; Souquet, Jonathan; Broly, Hervé; Jordan, Martin

    2017-07-01

    Rational and high-throughput optimization of mammalian cell culture media has a great potential to modulate recombinant protein product quality. We present a process design method based on parallel design-of-experiment (DoE) of CHO fed-batch cultures in 96-deepwell plates to modulate monoclonal antibody (mAb) glycosylation using medium supplements. To reduce the risk of losing valuable information in an intricate joint screening, 17 compounds were separated into five different groups, considering their mode of biological action. The concentration ranges of the medium supplements were defined according to information encountered in the literature and in-house experience. The screening experiments produced wide glycosylation pattern ranges. Multivariate analysis including principal component analysis and decision trees was used to select the best performing glycosylation modulators. Subsequent D-optimal quadratic design with four factors (three promising compounds and temperature shift) in shake tubes confirmed the outcome of the selection process and provided a solid basis for sequential process development at a larger scale. The glycosylation profile with respect to the specifications for biosimilarity was greatly improved in shake tube experiments: 75% of the conditions were equally close or closer to the specifications for biosimilarity than the best 25% in 96-deepwell plates. Biotechnol. Bioeng. 2017;114: 1448-1458. © 2017 Wiley Periodicals, Inc.

  1. Large-Scale Parallel Finite Element Analysis of the Stress Singular Problems

    International Nuclear Information System (INIS)

    Noriyuki Kushida; Hiroshi Okuda; Genki Yagawa

    2002-01-01

    In this paper, the convergence behavior of the large-scale parallel finite element method for stress singular problems was investigated. The convergence behavior of iterative solvers depends on the efficiency of the preconditioners. However, the efficiency of preconditioners may be influenced by the domain decomposition that is necessary for parallel FEM. In this study the following results were obtained: as expected, the conjugate gradient method without preconditioning and the diagonal-scaling preconditioned conjugate gradient method were not influenced by the domain decomposition. The symmetric successive over-relaxation (SSOR) preconditioned conjugate gradient method converged up to 6% faster if the stress singular area was contained in one sub-domain. (authors)
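
    As an illustration of why diagonal scaling is insensitive to how the mesh is partitioned, the following Python sketch implements a conjugate gradient solver with a diagonal (Jacobi) preconditioner. It is a generic textbook formulation, not the authors' FEM code; the function name `pcg_diagonal`, the dense list-of-lists matrix, and the small test system are assumptions. The preconditioner multiplies each residual entry by the reciprocal of its own diagonal entry, so no information from other sub-domains is needed.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_diagonal(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient with diagonal scaling for a symmetric
    positive definite matrix A. Returns (solution, iterations)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # the whole preconditioner
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small symmetric positive definite test system (illustrative values).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = pcg_diagonal(A, b)
```

    An SSOR preconditioner, by contrast, involves triangular sweeps that couple neighbouring unknowns, which is presumably why its behavior in the abstract depends on where sub-domain boundaries fall.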

  2. Parallel solutions of the two-group neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, K.S.; Turinsky, P.J.

    1987-01-01

    Recent efforts to adapt various numerical solution algorithms to parallel computer architectures have addressed the possibility of substantially reducing the running time of few-group neutron diffusion calculations. The authors have developed an efficient iterative parallel algorithm and an associated computer code for the rapid solution of the finite difference method representation of the two-group neutron diffusion equations on the CRAY X/MP-48 supercomputer, which has multiple CPUs and vector pipelines. For realistic simulation of light water reactor cores, the code employs a macroscopic depletion model with trace capability for selected fission product transients and critical boron. In addition, moderator and fuel temperature feedback models are incorporated into the code. The physics models used in the code were benchmarked against qualified codes and proved accurate. This work is an extension of previous work in that various feedback effects are accounted for in the system; the entire code is structured to accommodate extensive vectorization; and additional parallelism by multitasking is achieved not only for the solution of the matrix equations associated with the inner iterations but also for the other segments of the code, e.g., outer iterations

  3. ITER council proceedings: 1998

    International Nuclear Information System (INIS)

    1999-01-01

    This volume contains documents of the 13th and the 14th ITER council meeting as well as of the 1st extraordinary ITER council meeting. Documents of the ITER meetings held in Vienna and Yokohama during 1998 are also included. The contents include an outline of the ITER objectives, the ITER parameters and design overview as well as operating scenarios and plasma performance. Furthermore, design features, safety and environmental characteristics are given

  4. Iterative solution of general sparse linear systems on clusters of workstations

    Energy Technology Data Exchange (ETDEWEB)

    Lo, Gen-Ching; Saad, Y. [Univ. of Minnesota, Minneapolis, MN (United States)]

    1996-12-31

    Solving sparse irregularly structured linear systems on parallel platforms poses several challenges. First, sparsity makes it difficult to exploit data locality, whether in a distributed or shared memory environment. A second, perhaps more serious, challenge is to find efficient ways to precondition the system. Preconditioning techniques which have a large degree of parallelism, such as multicolor SSOR, often have a slower rate of convergence than their sequential counterparts. Finally, a number of other computational kernels such as inner products could erase any gains from parallel speed-ups, and this is especially true on workstation clusters where start-up times may be high. In this paper we discuss these issues and report on our experience with PSPARSLIB, an on-going project for building a library of parallel iterative sparse matrix solvers.

  5. Parallel Atomistic Simulations

    Energy Technology Data Exchange (ETDEWEB)

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories: those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.

  6. A parallel algorithm for solving the integral form of the discrete ordinates equations

    International Nuclear Information System (INIS)

    Zerr, R. J.; Azmy, Y. Y.

    2009-01-01

    The integral form of the discrete ordinates equations involves a system of equations that has a large, dense coefficient matrix. The serial construction methodology is presented and properties that affect the execution times to construct and solve the system are evaluated. Two approaches for massively parallel implementation of the solution algorithm are proposed and the current results of one of these are presented. The system of equations may be solved using two parallel solvers: block Jacobi and conjugate gradient. Results indicate that both methods can reduce overall wall-clock time for execution. The conjugate gradient solver exhibits better performance to compete with the traditional source iteration technique in terms of execution time and scalability. The parallel conjugate gradient method is synchronous, hence it does not increase the number of iterations for convergence compared to serial execution, and the efficiency of the algorithm demonstrates an apparent asymptotic decline. (authors)
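
    For readers unfamiliar with the first of the two solvers named here, a block Jacobi iteration on a small dense system might look like the Python sketch below. It is a generic formulation under stated assumptions, not the authors' transport code: the names `solve_dense` and `block_jacobi`, the index-list representation of blocks, and the 4 x 4 diagonally dominant test matrix are all hypothetical. Each block solves its own diagonal sub-system using only the previous iterate for unknowns outside the block, so the blocks could be assigned to separate processors.

```python
def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense block."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def block_jacobi(A, b, blocks, tol=1e-10, max_iter=500):
    """Block Jacobi iteration; blocks is a list of index lists.

    Returns (solution, iterations)."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        x_new = [0.0] * n
        for idx in blocks:  # blocks are independent within one sweep
            inside = set(idx)
            # Move coupling to unknowns outside the block to the right side,
            # using values from the previous iterate only.
            rhs = [b[i] - sum(A[i][j] * x[j] for j in range(n) if j not in inside)
                   for i in idx]
            local = [[A[i][j] for j in idx] for i in idx]
            for i, v in zip(idx, solve_dense(local, rhs)):
                x_new[i] = v
        change = max(abs(u - v) for u, v in zip(x_new, x))
        x = x_new
        if change < tol:
            return x, it
    return x, max_iter


# Diagonally dominant test system whose exact solution is all ones.
A = [[10.0, 1.0, 2.0, 0.0],
     [1.0, 12.0, 0.0, 3.0],
     [2.0, 0.0, 9.0, 1.0],
     [0.0, 3.0, 1.0, 11.0]]
b = [13.0, 16.0, 12.0, 15.0]
x, iterations = block_jacobi(A, b, blocks=[[0, 1], [2, 3]])
```

    With a single block containing every unknown, the method reduces to one direct solve; with more blocks, more of the coupling is lagged by one sweep.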

  7. DOVIS 2.0: an efficient and easy to use parallel virtual screening tool based on AutoDock 4.0.

    Science.gov (United States)

    Jiang, Xiaohui; Kumar, Kamal; Hu, Xin; Wallqvist, Anders; Reifman, Jaques

    2008-09-08

    Small-molecule docking is an important tool in studying receptor-ligand interactions and in identifying potential drug candidates. Previously, we developed a software tool (DOVIS) to perform large-scale virtual screening of small molecules in parallel on Linux clusters, using AutoDock 3.05 as the docking engine. DOVIS enables the seamless screening of millions of compounds on high-performance computing platforms. In this paper, we report significant advances in the software implementation of DOVIS 2.0, including enhanced screening capability, improved file system efficiency, and extended usability. To keep DOVIS up-to-date, we upgraded the software's docking engine to the more accurate AutoDock 4.0 code. We developed a new parallelization scheme to improve runtime efficiency and modified the AutoDock code to reduce excessive file operations during large-scale virtual screening jobs. We also implemented an algorithm to output docked ligands in an industry standard format, sd-file format, which can be easily interfaced with other modeling programs. Finally, we constructed a wrapper-script interface to enable automatic rescoring of docked ligands by arbitrarily selected third-party scoring programs. The significance of the new DOVIS 2.0 software compared with the previous version lies in its improved performance and usability. The new version makes the computation highly efficient by automating load balancing, significantly reducing excessive file operations by more than 95%, providing outputs that conform to industry standard sd-file format, and providing a general wrapper-script interface for rescoring of docked ligands. The new DOVIS 2.0 package is freely available to the public under the GNU General Public License.

  8. DOVIS 2.0: an efficient and easy to use parallel virtual screening tool based on AutoDock 4.0

    Directory of Open Access Journals (Sweden)

    Wallqvist Anders

    2008-09-01

    Background: Small-molecule docking is an important tool in studying receptor-ligand interactions and in identifying potential drug candidates. Previously, we developed a software tool (DOVIS) to perform large-scale virtual screening of small molecules in parallel on Linux clusters, using AutoDock 3.05 as the docking engine. DOVIS enables the seamless screening of millions of compounds on high-performance computing platforms. In this paper, we report significant advances in the software implementation of DOVIS 2.0, including enhanced screening capability, improved file system efficiency, and extended usability. Implementation: To keep DOVIS up-to-date, we upgraded the software's docking engine to the more accurate AutoDock 4.0 code. We developed a new parallelization scheme to improve runtime efficiency and modified the AutoDock code to reduce excessive file operations during large-scale virtual screening jobs. We also implemented an algorithm to output docked ligands in an industry standard format, sd-file format, which can be easily interfaced with other modeling programs. Finally, we constructed a wrapper-script interface to enable automatic rescoring of docked ligands by arbitrarily selected third-party scoring programs. Conclusion: The significance of the new DOVIS 2.0 software compared with the previous version lies in its improved performance and usability. The new version makes the computation highly efficient by automating load balancing, significantly reducing excessive file operations by more than 95%, providing outputs that conform to industry standard sd-file format, and providing a general wrapper-script interface for rescoring of docked ligands. The new DOVIS 2.0 package is freely available to the public under the GNU General Public License.

  9. ITER Council proceedings: 1993

    International Nuclear Information System (INIS)

    1994-01-01

    Records of the third ITER Council Meeting (IC-3), held on 21-22 April 1993, in Tokyo, Japan, and the fourth ITER Council Meeting (IC-4) held on 29 September - 1 October 1993 in San Diego, USA, are presented, giving essential information on the evolution of the ITER Engineering Design Activities (EDA), such as the text of the draft of Protocol 2 further elaborated in "ITER EDA Agreement and Protocol 2" (ITER EDA Documentation Series No. 5), recommendations on future work programmes: a description of technology R and D tasks; the establishment of a trust fund for the ITER EDA activities; arrangements for Visiting Home Team Personnel; the general framework for the involvement of other countries in the ITER EDA; conditions for the involvement of Canada in the Euratom Contribution to the ITER EDA; and other attachments as parts of the Records of Decision of the aforementioned ITER Council Meetings

  10. ITER council proceedings: 2000

    International Nuclear Information System (INIS)

    2001-01-01

    No ITER Council Meetings were held during 2000. However, two ITER EDA Meetings were held, one in Tokyo, January 19-20, and one in Moscow, June 29-30. The parties participating in these meetings were those that partake in the extended ITER EDA, namely the EU, the Russian Federation, and Japan. This document contains, among other things, the records of these meetings, the list of attendees, the agenda, the ITER EDA Status Reports issued during these meetings, the TAC (Technical Advisory Committee) reports and recommendations, the MAC Reports and Advice (also for the July 1999 Meeting), the ITER-FEAT Outline Design Report, the TAC Reports and Recommendations (both meetings), Site Requirements and Site Design Assumptions, the Tentative Sequence of Technical Activities 2000-2001, Report of the ITER SWG-P2 on Joint Implementation of ITER, EU/ITER Canada Proposal for New ITER Identification

  11. ITER council proceedings: 1993

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1994-12-31

    Records of the third ITER Council Meeting (IC-3), held on 21-22 April 1993, in Tokyo, Japan, and the fourth ITER Council Meeting (IC-4) held on 29 September - 1 October 1993 in San Diego, USA, are presented, giving essential information on the evolution of the ITER Engineering Design Activities (EDA), such as the text of the draft of Protocol 2 further elaborated in "ITER EDA Agreement and Protocol 2" (ITER EDA Documentation Series No. 5), recommendations on future work programmes: a description of technology R and D tasks; the establishment of a trust fund for the ITER EDA activities; arrangements for Visiting Home Team Personnel; the general framework for the involvement of other countries in the ITER EDA; conditions for the involvement of Canada in the Euratom Contribution to the ITER EDA; and other attachments as parts of the Records of Decision of the aforementioned ITER Council Meetings.

  12. Parallel processing of two-dimensional Sn transport calculations

    International Nuclear Information System (INIS)

    Uematsu, M.

    1997-01-01

    A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation
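
    The two speedup figures quoted in this abstract can be turned into parallel efficiencies with the usual definition, efficiency = speedup / number of processing elements. The short Python sketch below does that arithmetic; the function name `parallel_efficiency` is an assumption, and the abstract itself does not report these efficiency values.

```python
def parallel_efficiency(speedup, processing_elements):
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / processing_elements


# Figures quoted in the abstract, both for 16 processing elements.
angular_efficiency = parallel_efficiency(9.0, 16)    # angular decomposition
spatial_efficiency = parallel_efficiency(5.98, 16)   # space decomposition with rescaling
```

    Under this definition the angular decomposition case runs at roughly 56% of ideal and the space decomposition case at roughly 37%, keeping in mind that the second figure is measured against the original DOT3.5 calculation rather than a one-element run of the parallel code.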

  13. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....

  14. IHadoop: Asynchronous iterations for MapReduce

    KAUST Repository

    Elnikety, Eslam Mohamed Ibrahim

    2011-11-01

    MapReduce is a distributed programming framework designed to ease the development of scalable data-intensive applications for large clusters of commodity machines. Most machine learning and data mining applications involve iterative computations over large datasets, such as the Web hyperlink structures and social network graphs. Yet, the MapReduce model does not efficiently support this important class of applications. The architecture of MapReduce, most critically its dataflow techniques and task scheduling, is completely unaware of the nature of iterative applications; tasks are scheduled according to a policy that optimizes the execution for a single iteration, which wastes bandwidth, I/O, and CPU cycles when compared with an optimal execution for a consecutive set of iterations. This work presents iHadoop, a modified MapReduce model, and an associated implementation, optimized for iterative computations. The iHadoop model schedules iterations asynchronously. It connects the output of one iteration to the next, allowing both to process their data concurrently. iHadoop's task scheduler exploits inter-iteration data locality by scheduling tasks that exhibit a producer/consumer relation on the same physical machine, allowing a fast local data transfer. For those iterative applications that require satisfying certain criteria before termination, iHadoop runs the check concurrently during the execution of the subsequent iteration to further reduce the application's latency. This paper also describes our implementation of the iHadoop model, and evaluates its performance against Hadoop, the widely used open source implementation of MapReduce. Experiments using different data analysis applications over real-world and synthetic datasets show that iHadoop performs better than Hadoop for iterative algorithms, reducing execution time of iterative applications by 25% on average. Furthermore, integrating iHadoop with HaLoop, a variant Hadoop implementation that caches

  15. IHadoop: Asynchronous iterations for MapReduce

    KAUST Repository

    Elnikety, Eslam Mohamed Ibrahim; El Sayed, Tamer S.; Ramadan, Hany E.

    2011-01-01

    MapReduce is a distributed programming framework designed to ease the development of scalable data-intensive applications for large clusters of commodity machines. Most machine learning and data mining applications involve iterative computations over large datasets, such as the Web hyperlink structures and social network graphs. Yet, the MapReduce model does not efficiently support this important class of applications. The architecture of MapReduce, most critically its dataflow techniques and task scheduling, is completely unaware of the nature of iterative applications; tasks are scheduled according to a policy that optimizes the execution for a single iteration, which wastes bandwidth, I/O, and CPU cycles when compared with an optimal execution for a consecutive set of iterations. This work presents iHadoop, a modified MapReduce model, and an associated implementation, optimized for iterative computations. The iHadoop model schedules iterations asynchronously. It connects the output of one iteration to the next, allowing both to process their data concurrently. iHadoop's task scheduler exploits inter-iteration data locality by scheduling tasks that exhibit a producer/consumer relation on the same physical machine, allowing a fast local data transfer. For those iterative applications that require satisfying certain criteria before termination, iHadoop runs the check concurrently during the execution of the subsequent iteration to further reduce the application's latency. This paper also describes our implementation of the iHadoop model, and evaluates its performance against Hadoop, the widely used open source implementation of MapReduce. Experiments using different data analysis applications over real-world and synthetic datasets show that iHadoop performs better than Hadoop for iterative algorithms, reducing execution time of iterative applications by 25% on average. Furthermore, integrating iHadoop with HaLoop, a variant Hadoop implementation that caches

  16. An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.

    Science.gov (United States)

    Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun

    2015-12-01

    Deep learning algorithms are widely used for various pattern recognition applications such as text recognition, object recognition and action recognition because of their best-in-class recognition accuracy compared to hand-crafted and shallow-learning-based algorithms. The long learning time caused by their complex structure, however, has so far limited their usage to high-cost servers or many-core GPU platforms. On the other hand, the demand for customized pattern recognition within personal devices will grow gradually as more deep learning applications are developed. This paper presents an SoC implementation to enable deep learning applications to run on low-cost platforms such as mobile or portable devices. Different from conventional works which have adopted a massively parallel architecture, this work adopts a task-flexible architecture and exploits multiple parallelism to cover the complex functions of the convolutional deep belief network, which is one of the popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.

  17. Iterative Adaptive Sampling For Accurate Direct Illumination

    National Research Council Canada - National Science Library

    Donikian, Michael

    2004-01-01

    This thesis introduces a new multipass algorithm, Iterative Adaptive Sampling, for efficiently computing the direct illumination in scenes with many lights, including area lights that cause realistic soft shadows...

  18. Preliminary Study on the Enhancement of Reconstruction Speed for Emission Computed Tomography Using Parallel Processing

    International Nuclear Information System (INIS)

    Park, Min Jae; Lee, Jae Sung; Kim, Soo Mee; Kang, Ji Yeon; Lee, Dong Soo; Park, Kwang Suk

    2009-01-01

    Conventional image reconstruction uses simplified physical models of projection. However, realistic physics, for example 3D reconstruction, takes too long to process all the data in the clinic and cannot be run on a common reconstruction machine because of the large memory needed for complex physical models. We suggest a realistic distributed-memory model of fast reconstruction using parallel processing on personal computers to enable large-scale technologies. Preliminary feasibility tests on virtual machines and various performance tests on the commercial supercomputer Tachyon were performed. The expectation maximization algorithm was tested with common 2D projection and realistic 3D line of response. Since the processing time became slower (by up to 6 times) after a certain iteration, compiler optimization was performed to maximize the efficiency of parallelization. Parallel processing of a program on multiple computers was available on Linux with MPICH and NFS. We verified that differences between the parallel-processed image and the single-processed image at the same iterations were below the significant digits of a floating-point number, about 6 bit. Dual processors showed good efficiency (1.96 times) of parallel computing. The delay phenomenon was solved by vectorization using SSE. Through the study, a realistic parallel computing system for the clinic was established that is able to reconstruct with plenty of memory using the realistic physical models which were impossible to simplify

  19. Development and test of the ITER conductor joints

    Energy Technology Data Exchange (ETDEWEB)

    Martovetsky, N., LLNL

    1998-05-14

    Joints for the ITER superconducting Central Solenoid should perform in rapidly varying magnetic field with low losses and low DC resistance. This paper describes the design of the ITER joint and presents its assembly process. Two joints were built and tested at the PTF facility at MIT. Test results are presented, losses in transverse and parallel field and the DC performance are discussed. The developed joint demonstrates sufficient margin for baseline ITER operating scenarios.

  20. ITER council proceedings: 1995

    International Nuclear Information System (INIS)

    1996-01-01

    Records of the 8. ITER Council Meeting (IC-8), held on 26-27 July 1995, in San Diego, USA, and the 9. ITER Council Meeting (IC-9) held on 12-13 December 1995, in Garching, Germany, are presented, giving essential information on the evolution of the ITER Engineering Design Activities (EDA) and the ITER Interim Design Report Package and Relevant Documents. Figs, tabs

  1. Software abstractions and computational issues in parallel structure adaptive mesh methods for electronic structure calculations

    Energy Technology Data Exchange (ETDEWEB)

    Kohn, S.; Weare, J.; Ong, E.; Baden, S.

    1997-05-01

    We have applied structured adaptive mesh refinement techniques to the solution of the LDA equations for electronic structure calculations. Local spatial refinement concentrates memory resources and numerical effort where it is most needed, near the atomic centers and in regions of rapidly varying charge density. The structured grid representation enables us to employ efficient iterative solver techniques such as conjugate gradient with FAC multigrid preconditioning. We have parallelized our solver using an object- oriented adaptive mesh refinement framework.

  2. Distributed Parallel Endmember Extraction of Hyperspectral Data Based on Spark

    Directory of Open Access Journals (Sweden)

    Zebin Wu

    2016-01-01

    Due to the increasing dimensionality and volume of remotely sensed hyperspectral data, the development of acceleration techniques for massive hyperspectral image analysis approaches is a very important challenge. Cloud computing offers many possibilities of distributed processing of hyperspectral datasets. This paper proposes a novel distributed parallel endmember extraction method based on iterative error analysis that utilizes cloud computing principles to efficiently process massive hyperspectral data. The proposed method takes advantage of technologies including the MapReduce programming model, the Hadoop Distributed File System (HDFS), and Apache Spark to realize a distributed parallel implementation for hyperspectral endmember extraction, which significantly accelerates the computation of hyperspectral processing and provides high throughput access to large hyperspectral data. The experimental results, which are obtained by extracting endmembers of hyperspectral datasets on a cloud computing platform built on a cluster, demonstrate the effectiveness and computational efficiency of the proposed method.

  3. Parallel computing techniques for rotorcraft aerodynamics

    Science.gov (United States)

    Ekici, Kivanc

    The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, major modifications are made to the implicit operator, Lower-Upper Symmetric Gauss Seidel (LU-SGS), originally used in TURNS. Second, an inexact Newton method coupled with a Krylov subspace iterative method (Newton-Krylov method) is applied. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient application of Newton-Krylov methods to the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).

  4. ITER council proceedings: 1999

    International Nuclear Information System (INIS)

    1999-01-01

    In 1999 the ITER meeting in Cadarache (10-11 March 1999) and the Programme Directors Meeting in Grenoble (28-29 July 1999) took place. Both meetings were exclusively devoted to ITER engineering design activities and their agendas covered all issues important for the development of ITER. This volume presents the documents of these two important meetings

  5. ITER council proceedings: 1996

    International Nuclear Information System (INIS)

    1997-01-01

    Records of the 10. ITER Council Meeting (IC-10), held on 26-27 July 1996, in St. Petersburg, Russia, and the 11. ITER Council Meeting (IC-11) held on 17-18 December 1996, in Tokyo, Japan, are presented, giving essential information on the evolution of the ITER Engineering Design Activities (EDA) and the cost review and safety analysis. Figs, tabs

  6. ITER EDA technical activities

    International Nuclear Information System (INIS)

    Aymar, R.

    1998-01-01

    Six years of technical work under the ITER EDA Agreement have resulted in a design which constitutes a complete description of the ITER device and of its auxiliary systems and facilities. The ITER Council commented that the Final Design Report provides the first comprehensive design of a fusion reactor based on well established physics and technology

  7. ITER radio frequency systems

    International Nuclear Information System (INIS)

    Bosia, G.

    1998-01-01

    Neutral Beam Injection and RF heating are two of the methods for heating and current drive in ITER. The three ITER RF systems, which have been developed during the EDA, offer several complementary services and are able to fulfil ITER operational requirements

  8. Fourier analysis of parallel block-Jacobi splitting with transport synthetic acceleration in two-dimensional geometry

    International Nuclear Information System (INIS)

    Rosa, M.; Warsa, J. S.; Chang, J. H.

    2007-01-01

    A Fourier analysis is conducted in two-dimensional (2D) Cartesian geometry for the discrete-ordinates (SN) approximation of the neutron transport problem solved with Richardson iteration (Source Iteration) and Richardson iteration preconditioned with Transport Synthetic Acceleration (TSA), using the Parallel Block-Jacobi (PBJ) algorithm. The results for the un-accelerated algorithm show that convergence of PBJ can degrade, leading in particular to stagnation of GMRES(m) in problems containing optically thin sub-domains. The results for the accelerated algorithm indicate that TSA can be used to efficiently precondition an iterative method in the optically thin case when implemented in the 'modified' version MTSA, in which only the scattering in the low order equations is reduced by some non-negative factor β<1. (authors)
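
    The iteration analysed in this record can be illustrated generically. The following is a minimal sketch of (optionally preconditioned) Richardson iteration on a small dense system, not the transport solver from the paper: the function name `richardson`, the dense-matrix representation, and the `precondition` callable (standing in for an operator such as TSA) are assumptions made for illustration only.

```python
def richardson(a, b, precondition=None, tol=1e-10, max_iter=1000):
    """Solve a x = b by Richardson iteration: x <- x + M (b - a x).

    `precondition` is a hypothetical callable standing in for M; the
    identity is used when it is None. Returns the approximate solution
    and the number of iterations performed.
    """
    n = len(b)
    x = [0.0] * n
    for iteration in range(max_iter):
        # Residual of the current iterate.
        residual = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(r) for r in residual) < tol:
            return x, iteration
        # Unpreconditioned Richardson uses the residual itself as the correction.
        correction = precondition(residual) if precondition else residual
        x = [xi + ci for xi, ci in zip(x, correction)]
    return x, max_iter


# Example system of the form (I - C) x = b with spectral radius of C below 1,
# which is the condition under which the plain iteration converges.
a = [[1.0, -0.5], [-0.3, 1.0]]
b = [1.0, 2.0]
solution, iterations = richardson(a, b)
```

    The closer the spectral radius of the iteration matrix is to 1, the slower this plain iteration converges, which is the situation a preconditioner is meant to improve.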

  9. Recommendations for a cryogenic system for ITER [International Thermonuclear Experimental Reactor]

    International Nuclear Information System (INIS)

    Slack, D.S.

    1989-01-01

    The International Thermonuclear Experimental Reactor (ITER) is a new tokamak design project with joint participation from Japan, the European Community, the Soviet Union, and the United States. ITER will be a large machine requiring up to 100 kW of refrigeration at 4.5 K to cool its superconducting magnets. Unlike earlier fusion experiments, the ITER cryogenic system must handle pulse loads constituting a large percentage of the total load. These come from neutron heating during a fusion burn and from ac losses during ramping of current in the PF (poloidal field) coils. This paper presents a conceptual design for a cryogenic system that meets ITER requirements. It describes a system with the following features: Only time-proven components are used. The system obtains a high efficiency without use of cold pumps or other developmental components. High reliability is achieved by paralleling compressors and expanders and by using adequate isolation valving. The problem of load fluctuations is solved by a simple load-leveling device. The cryogenic system can be housed in a separate building located at a considerable distance from the ITER core, if desired. The paper also summarizes physical plant size, cost estimates, and means of handling vented helium during magnet quench. 4 refs., 4 figs., 3 tabs

  10. ITER-FEAT safety

    International Nuclear Information System (INIS)

    Gordon, C.W.; Bartels, H.-W.; Honda, T.; Raeder, J.; Topilski, L.; Iseli, M.; Moshonas, K.; Taylor, N.; Gulden, W.; Kolbasov, B.; Inabe, T.; Tada, E.

    2001-01-01

    Safety has been an integral part of the design process for ITER since the Conceptual Design Activities of the project. The safety approach adopted in the ITER-FEAT design and the complementary assessments underway, to be documented in the Generic Site Safety Report (GSSR), are expected to help demonstrate the attractiveness of fusion and thereby set a good precedent for future fusion power reactors. The assessments address ITER's radiological hazards taking into account fusion's favourable safety characteristics. The expectation that ITER will need regulatory approval has influenced the entire safety design and assessment approach. This paper summarises the ITER-FEAT safety approach and assessments underway. (author)

  11. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  12. Iterative decomposition of water and fat with echo asymmetry and least-squares estimation (IDEAL) imaging of multiple myeloma: initial clinical efficiency results

    International Nuclear Information System (INIS)

    Takasu, Miyuki; Tani, Chihiro; Sakoda, Yasuko; Ishikawa, Miho; Tanitame, Keizo; Date, Shuji; Akiyama, Yuji; Awai, Kazuo; Sakai, Akira; Asaoku, Hideki; Kajima, Toshio

    2012-01-01

    To evaluate the effectiveness of the iterative decomposition of water and fat with echo asymmetry and least-squares estimation (IDEAL) MRI to quantify tumour infiltration into the lumbar vertebrae in myeloma patients without visible focal lesions. The lumbar spine was examined with 3 T MRI in 24 patients with multiple myeloma and in 26 controls. The fat-signal fraction was calculated as the mean value from three vertebral bodies. A post hoc test was used to compare the fat-signal fraction in controls and patients with monoclonal gammopathy of undetermined significance (MGUS), asymptomatic myeloma or symptomatic myeloma. Differences were considered significant at P 2 -microglobulin-to-albumin ratio were entered into the discriminant analysis. Fat-signal fractions were significantly lower in patients with symptomatic myelomas (43.9 ± 19.7%, P 2 -microglobulin-to-albumin ratio facilitated discrimination of symptomatic myeloma from non-symptomatic myeloma in patients without focal bone lesions. A new magnetic resonance technique (IDEAL) offers new insights in multiple myeloma. (orig.)

  13. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  14. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
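
    To make the decomposition idea concrete, here is a minimal sketch of a blocked Sieve of Eratosthenes. It is not the scattered decomposition studied in the report: it assumes a simpler split of the range into contiguous blocks, and it runs the blocks serially in one process. The function names are hypothetical. Each call to `sieve_block` depends only on the shared list of small primes, so in a parallel setting each block could be handed to a different processing element.

```python
import math


def base_primes(limit):
    """Classic Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def sieve_block(lo, hi, small_primes):
    """Return the primes in the half-open block [lo, hi).

    Only the precomputed small primes are needed, so blocks are
    independent of one another.
    """
    flags = [True] * (hi - lo)
    for p in small_primes:
        if p * p >= hi:
            break
        # First multiple of p inside the block, but never below p*p.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for multiple in range(start, hi, p):
            flags[multiple - lo] = False
    return [lo + i for i, flag in enumerate(flags) if flag and lo + i >= 2]


def blocked_sieve(n, num_blocks):
    """All primes <= n, computed block by block (serially in this sketch)."""
    small = base_primes(math.isqrt(n))
    block_size = -(-(n + 1) // num_blocks)  # ceiling division
    primes = []
    for block in range(num_blocks):
        lo = block * block_size
        hi = min(lo + block_size, n + 1)
        if lo < hi:
            primes.extend(sieve_block(lo, hi, small))
    return primes
```

    With contiguous blocks the work per block is uneven, because small primes strike every block while larger ones strike only some; this kind of load-balance trade-off is one reason alternative decompositions are worth comparing.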

  15. Parallelization of pressure equation solver for incompressible N-S equations

    International Nuclear Information System (INIS)

    Ichihara, Kiyoshi; Yokokawa, Mitsuo; Kaburaki, Hideo.

    1996-03-01

    A pressure equation solver in a code for 3-dimensional incompressible flow analysis has been parallelized by using the red-black SOR method and the PCG method on the Fujitsu VPP500, a vector parallel computer with distributed memory. For comparison of scalability, the solver using the red-black SOR method has also been parallelized on the Intel Paragon, a scalar parallel computer with distributed memory. The scalability of the red-black SOR method on both the VPP500 and the Paragon was lost when the number of processor elements was increased. The reason for the non-scalability on both systems is the increasing communication time between processor elements. In addition, parallelization by DO-loop division lowers the vectorizing efficiency on the VPP500. For an effective implementation on the VPP500, a large-scale problem that keeps very long vectorized DO-loops in the parallel program should be solved. The PCG method with the red-black SOR method applied to incomplete LU factorization (red-black PCG) needs more iteration steps than the normal PCG method with forward and backward substitution, in spite of the same number of floating point operations in a DO-loop of incomplete LU factorization. The parallelized red-black PCG method has fewer merits than the parallelized red-black SOR method when the computational region has fewer grid points, because of the low vectorization efficiency obtained in the red-black PCG method. (author)
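
    The red-black ordering mentioned in this record can be sketched on a small model problem. The example below is an illustration only, assuming a 2D Laplace equation on a square grid with fixed boundary values rather than the 3D pressure equation of the report; the function name and the default relaxation factor are hypothetical choices. Points of one colour depend only on points of the other colour, so every update within a colour sweep is independent and could be vectorized or distributed.

```python
def red_black_sor(u, omega=1.5, sweeps=200):
    """In-place red-black SOR for the 2D Laplace equation.

    `u` is a square list-of-lists grid; its boundary entries are held
    fixed and its interior entries are relaxed.
    """
    n = len(u)
    for _ in range(sweeps):
        for colour in (0, 1):  # 0: "red" points, 1: "black" points
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == colour:
                        # Gauss-Seidel value from the four neighbours,
                        # all of which have the opposite colour.
                        average = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                          + u[i][j - 1] + u[i][j + 1])
                        u[i][j] += omega * (average - u[i][j])
    return u


# Model problem: boundary values taken from the linear function i + 2j,
# which the five-point Laplacian reproduces exactly in the interior.
size = 9
grid = [[float(i + 2 * j) if i in (0, size - 1) or j in (0, size - 1) else 0.0
         for j in range(size)] for i in range(size)]
red_black_sor(grid)
```

    In a distributed-memory setting each colour sweep would be followed by an exchange of subdomain boundary values, which is where the communication cost discussed in the abstract arises.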

  16. ITER Neutral Beam Injection System

    International Nuclear Information System (INIS)

    Ohara, Yoshihiro; Tanaka, Shigeru; Akiba, Masato

    1991-03-01

    A Japanese design proposal of the ITER Neutral Beam Injection System (NBS) which is consistent with the ITER common design requirements is described. The injection system is required to deliver a neutral deuterium beam of 75MW at 1.3MeV to the reactor plasma and utilized not only for plasma heating but also for current drive and current profile control. The injection system is composed of 9 modules, each of which is designed so as to inject a 1.3MeV, 10MW neutral beam. The most important point in the design is that the injection system is based on the utilization of a cesium-seeded volume negative ion source which can produce an intense negative ion beam with high current density at a low source operating pressure. The design value of the source is based on the experimental values achieved at JAERI. The utilization of the cesium-seeded volume source is essential to the design of an efficient and compact neutral beam injection system which satisfies the ITER common design requirements. The critical components to realize this design are the 1.3MeV, 17A electrostatic accelerator and the high voltage DC acceleration power supply, whose performances must be demonstrated prior to the construction of ITER NBI system. (author)

  17. Efficient, approximate and parallel Hartree-Fock and hybrid DFT calculations. A 'chain-of-spheres' algorithm for the Hartree-Fock exchange

    International Nuclear Information System (INIS)

    Neese, Frank; Wennmohs, Frank; Hansen, Andreas; Becker, Ute

    2009-01-01

    In this paper, the possibility is explored to speed up Hartree-Fock and hybrid density functional calculations by forming the Coulomb and exchange parts of the Fock matrix by different approximations. For the Coulomb part the previously introduced Split-RI-J variant (F. Neese, J. Comput. Chem. 24 (2003) 1740) of the well-known 'density fitting' approximation is used. The exchange part is formed by semi-numerical integration techniques that are closely related to Friesner's pioneering pseudo-spectral approach. Our potentially linear scaling realization of this algorithm is called the 'chain-of-spheres exchange' (COSX). A combination of semi-numerical integration and density fitting is also proposed. Both Split-RI-J and COSX scale very well with the highest angular momentum in the basis sets. It is shown that for extended basis sets speed-ups of up to two orders of magnitude compared to traditional implementations can be obtained in this way. Total energies are reproduced with an average error of <0.3 kcal/mol as determined from extended test calculations with various basis sets on a set of 26 molecules with 20-200 atoms and up to 2000 basis functions. Reaction energies agree to within 0.2 kcal/mol (Hartree-Fock) or 0.05 kcal/mol (hybrid DFT) with the canonical values. The COSX algorithm parallelizes with a speedup of 8.6 observed for 10 processes. Minimum energy geometries differ by less than 0.3 pm in the bond distances and 0.5 deg. in the bond angles from their canonical values. These developments enable highly efficient and accurate self-consistent field calculations including nonlocal Hartree-Fock exchange for large molecules. In combination with the RI-MP2 method and large basis sets, second-order many body perturbation energies can be obtained for medium sized molecules with unprecedented efficiency. The algorithms are implemented into the ORCA electronic structure system

  18. Iteration and accelerator dynamics

    International Nuclear Information System (INIS)

    Peggs, S.

    1987-10-01

    Four examples of iteration in accelerator dynamics are studied in this paper. The first three show how iterations of the simplest maps reproduce most of the significant nonlinear behavior in real accelerators. Each of these examples can be easily reproduced by the reader, at the minimal cost of writing only 20 or 40 lines of code. The fourth example outlines a general way to iteratively solve nonlinear difference equations, analytically or numerically
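
    As an illustration of how short such a program can be, here is a minimal sketch of iterating a simple nonlinear one-turn map. The abstract does not say which maps the paper uses, so the choice of a Hénon-type map (a thin quadratic kick followed by a linear rotation), the function names, and all default parameter values are assumptions made for this example.

```python
import math


def one_turn(x, p, tune, kick_strength):
    """Apply a thin quadratic kick, then rotate phase space by 2*pi*tune."""
    p = p + kick_strength * x * x
    c = math.cos(2.0 * math.pi * tune)
    s = math.sin(2.0 * math.pi * tune)
    return c * x + s * p, -s * x + c * p


def track(x0, p0, tune=0.205, kick_strength=1.0, turns=1000, escape_radius=10.0):
    """Iterate the one-turn map and return the visited (x, p) points.

    Tracking stops early once the particle leaves the escape radius,
    which is taken here as a crude proxy for particle loss.
    """
    x, p = x0, p0
    orbit = [(x, p)]
    for _ in range(turns):
        x, p = one_turn(x, p, tune, kick_strength)
        if x * x + p * p > escape_radius ** 2:
            break
        orbit.append((x, p))
    return orbit


# Points from several starting amplitudes, suitable for a phase-space plot.
orbits = [track(0.05 * k, 0.0) for k in range(1, 6)]
```

    With `kick_strength=0.0` the map reduces to a pure rotation and `x*x + p*p` is conserved; switching the kick on distorts the circles, and plotting the returned points for a range of starting amplitudes shows how the phase-space structure changes.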

  19. Fuel cycle design for ITER and its extrapolation to DEMO

    Energy Technology Data Exchange (ETDEWEB)

    Konishi, Satoshi [Institute of Advanced Energy, Kyoto University, Kyoto 611-0011 (Japan)], E-mail: s-konishi@iae.kyoto-u.ac.jp; Glugla, Manfred [Forschungszentrum Karlsruhe, P.O. Box 3640, D 76021 Karlsruhe (Germany)]; Hayashi, Takumi [Japan Atomic Energy Agency, Tokai, Ibaraki 319-0015 (Japan)]

    2008-12-15

    ITER is the first fusion device that continuously processes DT plasma exhaust and supplies recycled fuel in a closed loop. All the tritium and deuterium in the exhaust are recovered, purified and returned to the tokamak with minimal delay, so that extended burn can be sustained with limited inventory. To maintain the safety of the entire facility, plant scale detritiation systems will also continuously run to remove tritium from the effluents at the maximum efficiency. In this entire tritium plant system, an extremely high decontamination factor, that is the ratio of the tritium loss to the processing flow rate, is required for fuel economy and minimized tritium emissions, and the system design based on the state-of-the-art technology is expected to satisfy all the requirements without significant technical challenges. A considerable part of the fusion tritium system will be verified with ITER and its decades of operation experience. Toward the DEMO plant that will actually generate energy and operate its closed fuel cycle, the breeding blanket and the power train that carries high temperature and pressure media from the fusion device to the generation system will be the major additions. For tritium confinement, safety and environmental emission, the blanket, its coolant, and generation systems such as the heat exchanger, steam generator and turbine in particular will be the critical systems, because controlling tritium permeation from the breeder and handling large amounts of high temperature, high pressure coolant will be much more difficult than what is required for ITER. Detritiation of solid waste such as used blanket and divertor will be another issue for both tritium economy and safety. Unlike ITER, which is regarded as an experimental facility, DEMO will be expected to demonstrate safety, reliability and social acceptance, even if economic features are excluded. Fuel and environmental issues to be tested in DEMO will determine the viability of the fusion as a

  20. Fuel cycle design for ITER and its extrapolation to DEMO

    International Nuclear Information System (INIS)

    Konishi, Satoshi; Glugla, Manfred; Hayashi, Takumi

    2008-01-01

    ITER is the first fusion device that continuously processes DT plasma exhaust and supplies recycled fuel in a closed loop. All the tritium and deuterium in the exhaust are recovered, purified and returned to the tokamak with minimal delay, so that extended burn can be sustained with limited inventory. To maintain the safety of the entire facility, plant scale detritiation systems will also continuously run to remove tritium from the effluents at the maximum efficiency. In this entire tritium plant system, an extremely high decontamination factor, that is the ratio of the tritium loss to the processing flow rate, is required for fuel economy and minimized tritium emissions, and the system design based on the state-of-the-art technology is expected to satisfy all the requirements without significant technical challenges. A considerable part of the fusion tritium system will be verified with ITER and its decades of operation experience. Toward the DEMO plant that will actually generate energy and operate its closed fuel cycle, the breeding blanket and the power train that carries high temperature and pressure media from the fusion device to the generation system will be the major additions. For tritium confinement, safety and environmental emission, the blanket, its coolant, and generation systems such as the heat exchanger, steam generator and turbine in particular will be the critical systems, because controlling tritium permeation from the breeder and handling large amounts of high temperature, high pressure coolant will be much more difficult than what is required for ITER. Detritiation of solid waste such as used blanket and divertor will be another issue for both tritium economy and safety. Unlike ITER, which is regarded as an experimental facility, DEMO will be expected to demonstrate safety, reliability and social acceptance, even if economic features are excluded. Fuel and environmental issues to be tested in DEMO will determine the viability of the fusion as a

  1. Future plan of ITER

    International Nuclear Information System (INIS)

    Kitsunezaki, Akio

    1998-01-01

    In cooperation among four countries, Japan, USA, EU and Russia, the ITER plan has proceeded through the "conceptual design activities" from 1988 to 1990 and the "industrial design activities" since 1992. To construct ITER, the legal and working aspects of ITER operation have been investigated by the four countries. However, their economic conditions have worsened, so construction of ITER cannot begin after the end of the industrial design activities in 1998. Accordingly, they decided to continue the industrial design activities for three more years in order to study low-cost options and to test the superconducting model coil. (S.Y.)

  2. ITER test programme

    International Nuclear Information System (INIS)

    Abdou, M.; Baker, C.; Casini, G.

    1991-01-01

    ITER has been designed to operate in two phases. The first phase, which lasts for 6 years, is devoted to machine checkout and physics testing. The second phase lasts for 8 years and is devoted primarily to technology testing. This report describes the technology test program development for ITER, the ancillary equipment outside the torus necessary to support the test modules, the international collaboration aspects of conducting the test program on ITER, the requirements on the machine major parameters and the R and D program required to develop the test modules for testing in ITER. 15 refs, figs and tabs

  3. Massively parallel multicanonical simulations

    Science.gov (United States)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10⁴ parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.

  4. Parallel algorithms

    CERN Document Server

    Casanova, Henri; Robert, Yves

    2008-01-01

    "…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  5. Parallel high-performance grid computing: capabilities and opportunities of a novel demanding service and business class allowing highest resource efficiency

    NARCIS (Netherlands)

    F.N. Kepper (Nick); R. Ettig (Ramona); F. Dickmann (Frank); R. Stehr (Rene); F.G. Grosveld (Frank); G. Wedemann (Gero); T.A. Knoch (Tobias)

    2010-01-01

    The hardware and software requirements for parallel applications depend on the problem size, type and the number of particles / parameters, the degree of parallelization possible, the load balancing over different processors / memory, the calculation type and the input / output and

  6. LHCD and coupling experiments with an ITER-like PAM launcher on the FTU tokamak

    International Nuclear Information System (INIS)

    Pericoli Ridolfini, V.; Apicella, M.L.; Barbato, E.; Buratti, P.; Calabro, G.; Cardinali, A.; Mirizzi, F.; Panaccione, L.; Podda, S.; Tuccillo, A.A.; Bibet, Ph.; Granucci, G.; Sozzi, C.

    2005-01-01

    Successful experimental tests on a PAM (passive active multijunction) prototype antenna for Lower Hybrid (LH) waves, similar to that foreseen for ITER, have been carried out on FTU. The power level routinely achieved without any fault in the transmission lines for the maximum time allowed by the LH power plant, i.e. 0.9 s, is 250 kW versus a design value of 270. It corresponds to 50 MW/m² through the ITER antenna active area if it is scaled for the different LH frequencies (5 GHz in ITER, 8 GHz in FTU) and it is more than 1.4 times the goal of the ITER design (33 MW/m²). The test results validate the main features indicated by the simulation codes, concerning the power handling, the coupling and the launched N∥ spectrum. The power reflection coefficient Rc is always ≤ 2.5%, once the PAM launcher has been properly conditioned, even with the grill mouth retracted 2 mm inside the port shadow, with the density in front of the launcher very close to or even lower than the cut-off value. The current drive efficiency is comparable to that of a conventional grill in similar conditions, once the lower directivity is taken into account. The flexibility in the N∥ spectrum is confirmed by the HXR and ECE spectra. Conditioning the PAM to operate at the ITER equivalent power level has required only one day of RF operation, without a previous baking of the waveguides. (author)

  7. Automatic Loop Parallelization via Compiler Guided Refactoring

    DEFF Research Database (Denmark)

    Larsen, Per; Ladelsky, Razya; Lidman, Jacob

    For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler’s ability to generate loop-parallel code. We use this compilation system to modify two sequential benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should...

  8. Improving computational efficiency of Monte Carlo simulations with variance reduction

    International Nuclear Information System (INIS)

    Turner, A.; Davis, A.

    2013-01-01

    CCFE performs Monte-Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging since streaming and deep penetration effects are equally important. In order to make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that the application of VR techniques in such models significantly reduces the efficiency of parallel computation due to 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows (WW). The weight window represents an 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise to extreme amounts of splitting being performed and a long history. When running on parallel clusters, a long history can have a detrimental effect on the parallel efficiency: if one process is computing the long history, the other CPUs complete their batch of histories and wait idle. Furthermore, some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP which dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing the VR performance, but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and will be beneficial for any geometry combining streaming and deep penetration effects. (authors)
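
    The idle-CPU effect described in this abstract can be illustrated with a small arithmetic sketch. The numbers below (8 processes, 100 histories each, one history costing 400 extra time units) are hypothetical and are not taken from the paper; the function name `parallel_efficiency` is likewise an illustrative choice. Efficiency is taken here as total useful work divided by the work the cluster could have done in the same wall time.

```python
def parallel_efficiency(work_per_process):
    """Useful work divided by (number of processes x wall time).

    Wall time is set by the slowest process, because the others
    finish their batch and then wait idle.
    """
    wall_time = max(work_per_process)
    return sum(work_per_process) / (len(work_per_process) * wall_time)


# Hypothetical batch: 8 processes, each running 100 histories of unit cost.
balanced = [100.0] * 8

# Same batch, but one process meets a single 'long history'
# that costs 400 extra time units (assumed figure).
with_long_history = [100.0] * 7 + [500.0]

print(parallel_efficiency(balanced))
print(parallel_efficiency(with_long_history))
```

    In this toy case a single long history drops the efficiency from 1.0 to 0.3, which is the kind of loss that a less aggressive weight window trades against reduced variance-reduction performance.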

  9. United States rejoin ITER

    International Nuclear Information System (INIS)

    Roberts, M.

    2003-01-01

    Under pressure from the United States Congress, the US Department of Energy had to withdraw from further American participation in the ITER Engineering Design Activities after the end of its commitment to the EDA in July 1998. In the years since that time, changes have taken place in both the ITER activity and the US fusion community's position on burning plasma physics. Reflecting the interest in the United States in pursuing burning plasma physics, the DOE's Office of Science commissioned three studies as part of its examination of the option of entering the Negotiations on the Agreement on the Establishment of the International Fusion Energy Organization for the Joint Implementation of the ITER Project. These were a National Academy Review Panel Report supporting the burning plasma mission; a Fusion Energy Sciences Advisory Committee (FESAC) report confirming the role of ITER in achieving fusion power production; and the Lehman Review of the ITER project costing and project management processes (for the latter, see ITER CTA Newsletter, no. 15, December 2002). All three studies have endorsed the US return to the ITER activities. This historic decision was announced by DOE Secretary Abraham during his remarks to employees of the Department's Princeton Plasma Physics Laboratory. The United States will be working with the other Participants in the ITER Negotiations on the Agreement and is preparing to participate in the ITA

  10. Development of a parallelization strategy for the VARIANT code

    International Nuclear Information System (INIS)

    Hanebutte, U.R.; Khalil, H.S.; Palmiotti, G.; Tatsumi, M.

    1996-01-01

    The VARIANT code solves the multigroup steady-state neutron diffusion and transport equation in three-dimensional Cartesian and hexagonal geometries using the variational nodal method. VARIANT consists of four major parts that must be executed sequentially: input handling, calculation of response matrices, solution algorithm (i.e. inner-outer iteration), and output of results. The objective of the parallelization effort was to reduce the overall computing time by distributing the work of the two computationally intensive (sequential) tasks, the coupling coefficient calculation and the iterative solver, equally among a group of processors. This report describes the code's calculations and gives performance results on one of the benchmark problems used to test the code. The performance analysis on the IBM SPx system shows good efficiency for well-load-balanced programs. Even for relatively small problem sizes, respectable efficiencies are seen for the SPx. An extension to achieve a higher degree of parallelism will be addressed in future work. 7 refs., 1 tab

  11. ITER CTA newsletter. No. 3

    International Nuclear Information System (INIS)

    2001-11-01

    This ITER CTA newsletter comprises reports of Dr. P. Barnard, Iter Canada Chairman and CEO, about the progress of the first formal ITER negotiations and about the demonstration of details of Canada's bid on ITER workshops, and Dr. V. Vlasenkov, Project Board Secretary, about the meeting of the ITER CTA project board

  12. ITER at Cadarache; ITER a Cadarache

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2005-06-15

    This public information document presents the ITER project (International Thermonuclear Experimental Reactor), the definition of fusion, the international cooperation and the advantages of the project. It also presents the Cadarache site, an appropriate scientific and economic environment. The last part of the document recalls the history of the project and the current mobilization of all partners. (A.L.B.)

  13. ITER council proceedings: 1992

    International Nuclear Information System (INIS)

    1994-01-01

    At the signing of the ITER EDA Agreement in July 1992, each of the Parties presented to the Director General the names of their designated members of the ITER Council. Upon receiving those names, the Director General stated that the ITER Engineering Design Activities were "ready to begin". The next step in this process was the convening of the first meeting of the ITER Council. The first meeting of the Council, held in Vienna, was opened by Director General Hans Blix. The second meeting was held in Moscow, the formal seat of the Council. This volume presents records of these first two Council meetings and, together with the previous volumes on the text of the Agreement and Protocol 1 and the preparations for their signing respectively, represents essential information on the evolution of the ITER EDA

  14. Fast iterative censoring CFAR algorithm for ship detection from SAR images

    Science.gov (United States)

    Gu, Dandan; Yue, Hui; Zhang, Yuan; Gao, Pengcheng

    2017-11-01

    Ship detection is one of the essential techniques for ship recognition from synthetic aperture radar (SAR) images. This paper presents a fast iterative detection procedure to eliminate the influence of target returns on the estimation of local sea clutter distributions for constant false alarm rate (CFAR) detectors. A fast block detector is first employed to extract potential target sub-images; then, an iterative censoring CFAR algorithm is used to detect ship candidates from each target block adaptively and efficiently. Parallel detection is available, and the statistical parameters of the G0 distribution, which fits local sea clutter well, can be quickly estimated based on an integral image operator. Experimental results on TerraSAR-X images demonstrate the effectiveness of the proposed technique.
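
    The two ingredients named in this abstract, iterative censoring and integral-image statistics, can be sketched as follows. This is a simplified illustration and not the authors' algorithm: it swaps the G0 clutter model for a plain mean-plus-`t`-standard-deviations threshold, omits the block pre-detector, and uses hypothetical names and parameters (`box_sum`, `iterative_censoring_cfar`, `half`, `t`) and a synthetic scene.

```python
import numpy as np


def box_sum(img, half):
    """Sum over a (2*half+1)^2 window around every pixel via an integral image.

    Edges are zero-padded, so windows near the border simply see fewer pixels.
    """
    k = 2 * half + 1
    padded = np.pad(img, half, mode="constant")
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]


def iterative_censoring_cfar(img, half=7, t=3.0, max_iter=10):
    """Re-estimate local clutter statistics with detected pixels censored."""
    detected = np.zeros(img.shape, dtype=bool)
    for _ in range(max_iter):
        keep = (~detected).astype(float)            # 1 for clutter, 0 for censored
        n = np.maximum(box_sum(keep, half), 1.0)    # clutter pixels per window
        mean = box_sum(img * keep, half) / n
        var = np.maximum(box_sum(img**2 * keep, half) / n - mean**2, 0.0)
        # Assumed threshold rule (not the G0-based rule of the paper).
        updated = detected | (img > mean + t * np.sqrt(var))
        if np.array_equal(updated, detected):       # no new detections: stop
            break
        detected = updated
    return detected


# Synthetic scene: exponential 'sea clutter' with one bright 3x3 'ship'.
rng = np.random.default_rng(1)
scene = rng.exponential(1.0, size=(64, 64))
scene[30:33, 30:33] = 50.0
mask = iterative_censoring_cfar(scene)
print(mask[30:33, 30:33].all(), int(mask.sum()))
```

    Each pass removes already-detected pixels from the clutter estimate, so bright targets no longer inflate the local mean and variance. The three box sums per pass cost the same regardless of window size, which is the speed benefit of the integral image.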

  15. New algorithms for parallel MRI

    International Nuclear Information System (INIS)

    Anzengruber, S; Ramlau, R; Bauer, F; Leitao, A

    2008-01-01

    Magnetic Resonance Imaging with parallel data acquisition requires algorithms for reconstructing the patient's image from a small number of measured lines of the Fourier domain (k-space). In contrast to well-known algorithms like SENSE and GRAPPA and their variants, we consider the problem as a non-linear inverse problem. However, in order to avoid cost-intensive derivatives we will use Landweber-Kaczmarz iteration and, in order to improve the overall results, some additional sparsity constraints.
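
    For readers unfamiliar with the Landweber-Kaczmarz iteration mentioned here, a minimal sketch on a linear toy problem follows. The paper treats a non-linear MRI problem with sparsity constraints; the code below is only the basic cyclic update x ← x + ω_i A_iᵀ(b_i − A_i x) on randomly generated blocks, and all names, sizes and the step-size choice ω_i = 1/‖A_i‖² are assumptions made for illustration.

```python
import numpy as np


def landweber_kaczmarz(blocks, rhs, n, sweeps=2000):
    """Cycle through the block equations A_i x = b_i, one Landweber step each."""
    x = np.zeros(n)
    # Step size 1/||A_i||_2^2 keeps every individual step non-expansive.
    steps = [1.0 / np.linalg.norm(A, 2) ** 2 for A in blocks]
    for _ in range(sweeps):
        for A, b, w in zip(blocks, rhs, steps):
            x = x + w * A.T @ (b - A @ x)
    return x


# Toy data: three 4x5 blocks (each underdetermined alone, jointly determined).
rng = np.random.default_rng(0)
x_true = rng.standard_normal(5)
blocks = [rng.standard_normal((4, 5)) for _ in range(3)]
rhs = [A @ x_true for A in blocks]

x_rec = landweber_kaczmarz(blocks, rhs, n=5)
print(np.linalg.norm(x_rec - x_true))
```

    Only products with A_i and its transpose are needed, which is why the method avoids the expensive derivative computations of Newton-type schemes; in a parallel-coil setting each block would loosely correspond to one coil's measurements.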

  16. Non-iterative Voltage Stability

    Energy Technology Data Exchange (ETDEWEB)

    Makarov, Yuri V. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Vyakaranam, Bharat [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Hou, Zhangshuan [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Wu, Di [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Meng, Da [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Wang, Shaobu [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Elbert, Stephen T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Miller, Laurie E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Huang, Zhenyu [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2014-09-30

    This report demonstrates promising capabilities and performance characteristics of the proposed method using several power systems models. The new method will help to develop a new generation of highly efficient tools suitable for real-time parallel implementation. The ultimate benefit obtained will be early detection of system instability and prevention of system blackouts in real time.

  17. Design iteration in construction projects – Review and directions

    Directory of Open Access Journals (Sweden)

    Purva Mujumdar

    2018-03-01

    The design phase of any construction project involves several designers who exchange information with each other, most often in an unstructured manner, throughout the design phase. When these information exchanges occur in cycles/loops, this is termed design iteration. Iteration is an inherent and unavoidable aspect of any design phase and requires proper planning. To date, very few researchers have explored design iteration ("complexity") in the construction sector. Hence, the objective of this paper was to document and review the complexities of iteration during the design phase of construction projects for efficient design planning. To achieve this objective, an exhaustive literature review on design iteration was done for four sectors: construction, manufacturing, aerospace, and software development. In addition, semi-structured interviews and discussions were held with a few design experts to verify the different dimensions of iteration. Finally, a design iteration framework that facilitates successful planning was presented in this study. Keywords: Design iteration, Types of iteration, Causes and impact of iteration, Models of iteration, Execution strategies of iteration

  18. Iterative solution of high order compact systems

    Energy Technology Data Exchange (ETDEWEB)

    Spotz, W.F.; Carey, G.F. [Univ. of Texas, Austin, TX (United States)]

    1996-12-31

    We have recently developed a class of finite difference methods which provide higher accuracy and greater stability than standard central or upwind difference methods, but still reside on a compact patch of grid cells. In the present study we investigate the performance of several gradient-type iterative methods for solving the associated sparse systems. Both serial and parallel performance studies have been made. Representative examples are taken from elliptic PDEs for diffusion, convection-diffusion, and viscous flow applications.

  19. ITER towards the construction

    International Nuclear Information System (INIS)

    Shimomura, Y.

    2005-01-01

    The ITER Project has been significantly developed in the last few years in preparation for its construction. The ITER Participants' Negotiators have developed the Joint Implementation Agreement (JIA), ready for finalisation following selection of the construction site and nomination of the project's Director General. The ITER International Team and Participant Teams have continued technical and organisational preparations. Construction will be able to start immediately after the international ITER organisation is established, following signature of the JIA. The Project is strongly supported by the governments of the Participants as well as by the scientific community. The real negotiations, including siting and the final details of cost sharing, started in December 2003. The EU, with Cadarache, and Japan, with Rokkasho, have both promised large contributions to the project to strongly support their construction site proposals. Their wish to host ITER construction is too strong to allow convergence to a single site considering the ITER device in isolation. A broader collaboration among the Parties is therefore being contemplated, covering complementary activities to help accelerate fusion development towards a viable power source, and allow the Participants to reach a conclusion on ITER siting. This report reviews these preparations, and the status of negotiations

  20. Parallel particle swarm optimization algorithm in nuclear problems

    International Nuclear Information System (INIS)

    Waintraub, Marcel; Pereira, Claudio M.N.A.; Schirru, Roberto

    2009-01-01

    Particle Swarm Optimization (PSO) is a population-based metaheuristic (PBM), in which solution candidates evolve through simulation of a simplified social adaptation model. Combining robustness, efficiency and simplicity, PSO has gained great popularity. Many successful applications of PSO are reported, in which PSO has been shown to have advantages over other well-established PBMs. However, computational costs are still a great constraint for PSO, as well as for all other PBMs, especially in optimization problems with time-consuming objective functions. To overcome this difficulty, parallel computation has been used. The default advantage of parallel PSO (PPSO) is the reduction of computational time. Master-slave approaches, which exploit this characteristic, are the most investigated. However, much more should be expected. It is known that PSO may be improved by more elaborate neighborhood topologies. Hence, in this work, we develop several different PPSO algorithms exploring the advantages of enhanced neighborhood topologies implemented by communication strategies in multiprocessor architectures. The proposed PPSOs have been applied to two complex and time-consuming nuclear engineering problems: reactor core design and fuel reload optimization. After exhaustive experiments, it has been concluded that: PPSO still improves solutions after many thousands of iterations, making the efficient use of serial (non-parallel) PSO prohibitive in this kind of real-world problem; and PPSO with more elaborate communication strategies proved to be more efficient and robust than the master-slave model. Advantages and peculiarities of each model are carefully discussed in this work. (author)
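
    As a rough illustration of what a neighborhood topology means in PSO, the sketch below implements a serial PSO in which each particle follows the best position found within a ring neighborhood (itself and its two neighbours) rather than a single global best. It is not one of the PPSO variants from the paper: the ring topology, the coefficients `w`, `c1`, `c2`, the sphere test function and all names are assumptions chosen for illustration. In a parallel setting, the objective evaluations marked in the comments would be the part distributed across processors.

```python
import numpy as np


def pso_ring(objective, dim, n_particles=20, iters=200, seed=0):
    """Serial PSO with a ring (local-best) neighbourhood topology."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    w, c1, c2 = 0.7, 1.5, 1.5          # assumed inertia and acceleration weights
    idx = np.arange(n_particles)
    # Each row lists a particle's left neighbour, itself, and right neighbour.
    neigh = np.stack([np.roll(idx, 1), idx, np.roll(idx, -1)], axis=1)
    for _ in range(iters):
        # Best personal-best position inside each particle's neighbourhood.
        winner = neigh[idx, np.argmin(pbest_val[neigh], axis=1)]
        lbest = pbest[winner]
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (lbest - pos)
        pos = pos + vel
        # Costly step in real problems; the natural candidate for parallel evaluation.
        val = np.array([objective(p) for p in pos])
        improved = val < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = val[improved]
    best = np.argmin(pbest_val)
    return pbest[best], pbest_val[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return float(np.sum(x**2))


best_x, best_val = pso_ring(sphere, dim=3)
print(best_x, best_val)
```

    Restricting information flow to neighbours slows the spread of a good solution through the swarm, which tends to preserve diversity; the communication strategies studied in the paper play a comparable role between processors.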

  1. Practical parallel programming

    CERN Document Server

    Bauer, Barr E

    2014-01-01

    This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.

  2. Iterative Decoding of Concatenated Codes: A Tutorial

    Directory of Open Access Journals (Sweden)

    Phillip A. Regalia

    2005-05-01

    The turbo decoding algorithm of a decade ago constituted a milestone in error-correction coding for digital communications, and has inspired extensions to generalized receiver topologies, including turbo equalization, turbo synchronization, and turbo CDMA, among others. Despite an accrued understanding of iterative decoding over the years, the “turbo principle” remains elusive to master analytically, thereby inciting interest from researchers outside the communications domain. In this spirit, we develop a tutorial presentation of iterative decoding for parallel and serial concatenated codes, in terms hopefully accessible to a broader audience. We motivate iterative decoding as a computationally tractable attempt to approach maximum-likelihood decoding, and characterize fixed points in terms of a “consensus” property between constituent decoders. We review how the decoding algorithm for both parallel and serial concatenated codes coincides with an alternating projection algorithm, which allows one to identify conditions under which the algorithm indeed converges to a maximum-likelihood solution, in terms of particular likelihood functions factoring into the product of their marginals. The presentation emphasizes a common framework applicable to both parallel and serial concatenated codes.

  3. An efficient route to selective bio-oxidation catalysts: an iterative approach comprising modeling, diversification, and screening, based on CYP102A1.

    Science.gov (United States)

    Seifert, Alexander; Antonovici, Mihaela; Hauer, Bernhard; Pleiss, Jürgen

    2011-06-14

    Perillyl alcohol is the terminal hydroxylation product of the cheap and readily available terpene, limonene. It has high potential as an anti-tumor substance, but is of limited availability. In principle, cytochrome P450 monooxygenases, such as the self-sufficient CYP102A1, are promising catalysts for the oxidation of limonene or other inert hydrocarbons. The wild-type enzyme converts (4R)-limonene to four different oxidation products; however, terminal hydroxylation at the allylic C7 is not observed. Here we describe a generic strategy to engineer this widely used enzyme to hydroxylate exclusively the exposed, but chemically less reactive, primary C7 in the presence of other reactive positions. The approach presented here turns CYP102A1 into a highly selective catalyst with a shifted product spectrum by successive rounds of modeling, the design of small focused libraries, and screening. In the first round a minimal CYP102A1 mutant library was rationally designed. It contained variants with improved or strongly shifted regio-, stereo- and chemoselectivity, compared to wild-type. From this library the variant with the highest perillyl alcohol ratio was fine-tuned by two additional rounds of molecular modeling, diversification, and screening. In total only 29 variants needed to be screened to identify the triple mutant A264V/A238V/L437F that converts (4R)-limonene to perillyl alcohol with a selectivity of 97 %. Focusing mutagenesis on a small number of relevant positions identified by computational approaches is the key for efficient screening for enzyme selectivity. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. News from ITER controls - a status report

    International Nuclear Information System (INIS)

    Wallander, A.; Abadie, L.; Di Maio, F.; Evrard, B.; Fourneron, J.M.; Gulati, H.; Hansalia, C.; Journeaux, J.Y.; Kim, C.; Klotz, W.D.; Mahajan, K.; Makijarvi, P; Matsumoto, Y.; Pande, S.; Simrock, S.; Stepanov, D.; Utzel, N.; Vergara, A.; Winter, A.; Yonekawa, I.

    2012-01-01

    Construction of ITER has started at the Cadarache site in southern France. The first buildings are taking shape and more than 60 % of the in-kind procurement has been committed by the seven ITER member states (China, Europe, India, Japan, Korea, Russia and United States). The design and manufacturing of the main components of the machine are now underway all over the world. Each of these components comes with a local control system, which must be integrated in the central control system. The control group at ITER has developed two products to facilitate this: the plant control design handbook (PCDH) and the control, data access and communication (CODAC) core system. PCDH is a document which prescribes the technologies and methods to be used in developing local control systems and sets the rules applicable to the in-kind procurements. CODAC core system is a software package, distributed to all in-kind procurement developers, which implements the PCDH and facilitates the compliance of the local control system. In parallel, the ITER control group is proceeding with the design of the central control system to allow fully integrated and automated operation of ITER. In this paper we report on the progress of the design and technology choices and we discuss justifications of those choices. We also report on the results of some pilot projects aimed at validating the design and technologies. (authors)

  5. Comparison of Non-overlapping and Overlapping Local/Global Iteration Schemes for Whole-Core Deterministic Transport Calculation

    International Nuclear Information System (INIS)

    Yuk, Seung Su; Cho, Bumhee; Cho, Nam Zin

    2013-01-01

    In the case of a deterministic transport model, a fixed-k problem formulation is necessary and the overlapping local domain is chosen. However, as mentioned in, the partial current-based Coarse Mesh Finite Difference (p-CMFD) procedure also enables non-overlapping local/global (NLG) iteration. In this paper, NLG iteration is combined with p-CMFD and with CMFD (augmented with a concept of p-CMFD), respectively, and compared to overlapping local/global (OLG) iteration on a 2-D test problem. Non-overlapping local/global iteration with p-CMFD and CMFD global calculation is introduced and tested on a 2-D deterministic transport problem. The modified C5G7 problem is analyzed with both NLG and OLG methods and the solutions converge to the reference solution except for some cases of NLG with CMFD. NLG with CMFD gives the best performance if the solution converges, but if the fission-source iteration in the local calculation is not sufficient, it is prone to diverge. The p-CMFD global solver gives unconditional convergence (for both OLG and NLG). A study of a switching scheme is in progress, in which NLG/p-CMFD is used as a 'starter' and then switched to NLG/CMFD to render the whole-core transport calculation more efficient and robust. Parallel computation is another obvious direction for future work

  6. Perl Modules for Constructing Iterators

    Science.gov (United States)

    Tilmes, Curt

    2009-01-01

    The Iterator Perl Module provides a general-purpose framework for constructing iterator objects within Perl, and a standard API for interacting with those objects. Iterators are an object-oriented design pattern where a description of a series of values is used in a constructor. Subsequent queries can request values in that series. These Perl modules build on the standard Iterator framework and provide iterators for some other types of values. Iterator::DateTime constructs iterators from DateTime objects or Date::Parse descriptions and iCal/RFC 2445 style recurrence descriptions. It supports a variety of input parameters, including a start to the sequence, an end to the sequence, an iCal/RFC 2445 recurrence describing the frequency of the values in the series, and a format description that can refine the presentation manner of the DateTime. Iterator::String constructs iterators from string representations. This module is useful in contexts where the API consists of supplying a string and getting back an iterator where the specific iteration desired is opaque to the caller. It is of particular value to the Iterator::Hash module, which provides nested iterations. Iterator::Hash constructs iterators from Perl hashes that can include multiple iterators. The constructed iterators will return all the permutations of the iterations of the hash by nested iteration of embedded iterators. A hash simply includes a set of keys mapped to values. It is a very common data structure used throughout Perl programming. The Iterator::Hash module allows a hash to include strings defining iterators (parsed and dispatched with Iterator::String) that are used to construct an overall series of hash values.
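
    The nested iteration described for Iterator::Hash can be pictured with a short Python analogue. This is not the Perl module's API: the function name `iterate_hash`, the rule for deciding which values count as embedded series, and the example keys are all hypothetical. The sketch takes a dictionary whose values are either plain scalars or series of values, and yields one dictionary for every combination of the series.

```python
from itertools import product


def iterate_hash(spec):
    """Yield one dict per combination of the embedded series in `spec`.

    Values that are lists, tuples or ranges are treated as series to iterate;
    anything else is treated as a constant (an assumed convention).
    """
    keys = list(spec)
    choices = [
        value if isinstance(value, (list, tuple, range)) else [value]
        for value in (spec[key] for key in keys)
    ]
    # product() performs the nested iteration; the last key varies fastest.
    for combo in product(*choices):
        yield dict(zip(keys, combo))


# Hypothetical description: two series and one constant.
spec = {"satellite": ["terra", "aqua"], "year": range(2007, 2009), "product": "L2"}

for record in iterate_hash(spec):
    print(record)
```

    With two satellites and two years the generator yields four records, each a complete hash, which mirrors how the abstract describes an overall series of hash values being built from embedded iterators.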

  7. Infrared laser diagnostics for ITER

    International Nuclear Information System (INIS)

    Hutchinson, D.P.; Richards, R.K.; Ma, C.H.

    1995-01-01

    Two infrared laser-based diagnostics are under development at ORNL for measurements on burning plasmas such as ITER. The primary effort is the development of a CO₂ laser Thomson scattering diagnostic for the measurement of the velocity distribution of confined fusion-product alpha particles. Key components of the system include a high-power, single-mode CO₂ pulsed laser, an efficient optics system for beam transport and a multichannel low-noise infrared heterodyne receiver. A successful proof-of-principle experiment has been performed on the Advanced Toroidal Facility (ATF) stellarator at ORNL utilizing scattering from electron plasma frequency satellites. The diagnostic system is currently being installed on Alcator C-Mod at MIT for measurements of the fast ion tail produced by ICRH heating. A second diagnostic under development at ORNL is an infrared polarimeter for Faraday rotation measurements in future fusion experiments. A preliminary feasibility study of a CO₂ laser tangential viewing polarimeter for measuring electron density profiles in ITER has been completed. For ITER plasma parameters and a polarimeter wavelength of 10.6 μm, a Faraday rotation of up to 26 degrees is predicted. An electro-optic polarization modulation technique has been developed at ORNL. Laboratory tests of this polarimeter demonstrated a sensitivity of ≤ 0.01 degrees. Because of the similarity in the expected Faraday rotation in ITER and Alcator C-Mod, a collaboration between ORNL and the MIT Plasma Fusion Center has been undertaken to test this polarimeter system on Alcator C-Mod. A 10.6 μm polarimeter for this measurement has been constructed and integrated into the existing C-Mod multichannel two-color interferometer. With present experimental parameters for C-Mod, the predicted Faraday rotation was on the order of 0.1 degrees. Significant output signals were observed during preliminary tests. Further experiments and detailed analyses are under way

  8. ITER definition phase

    International Nuclear Information System (INIS)

    1989-01-01

    The International Thermonuclear Experimental Reactor (ITER) is envisioned as a fusion device which would demonstrate the scientific and technological feasibility of fusion power. As a first step towards achieving this goal, the European Community, Japan, the Soviet Union, and the United States of America have entered into joint conceptual design activities under the auspices of the International Atomic Energy Agency. A brief summary of the Definition Phase of ITER activities is contained in this report. Included in this report are the background, objectives, organization, definition phase activities, and research and development plan of this endeavor in international scientific collaboration. A more extended technical summary is contained in the two-volume report, "ITER Concept Definition," IAEA/ITER/DS/3. 2 figs, 2 tabs

  9. Power converters for ITER

    CERN Document Server

    Benfatto, I

    2006-01-01

    The International Thermonuclear Experimental Reactor (ITER) is a thermonuclear fusion experiment designed to provide long deuterium-tritium burning plasma operation. After a short description of ITER objectives, the main design parameters and the construction schedule, the paper describes the electrical characteristics of the French 400 kV grid at Cadarache, the European site proposed for ITER. Moreover, the paper describes the main requirements and features of the power converters designed for the ITER coil and additional heating power supplies, characterized by a total installed power of about 1.8 GVA, modular design with basic units up to 90 MVA continuous duty, dc currents up to 68 kA, and voltages from 1 kV to 1 MV dc.

  10. ITER convertible blanket evaluation

    International Nuclear Information System (INIS)

    Wong, C.P.C.; Cheng, E.

    1995-01-01

    Proposed International Thermonuclear Experimental Reactor (ITER) convertible blankets were reviewed. Key design difficulties were identified. A new particle filter concept is introduced and key performance parameters estimated. Results show that this particle filter concept can satisfy all of the convertible blanket design requirements except the generic issue of Be blanket lifetime. If the convertible blanket is an acceptable approach for ITER operation, this particle filter option should be a strong candidate

  11. ITER EDA and technology

    International Nuclear Information System (INIS)

    Baker, C.C.

    2001-01-01

    The year 1998 was the culmination of the six-year Engineering Design Activities (EDA) of the International Thermonuclear Experimental Reactor (ITER) Project. The EDA results in design and validating technology R&D, plus the associated effort in voluntary physics research, are a significant achievement and major milestone in the history of magnetic fusion energy development. Consequently, the ITER EDA was a major theme at this Conference, contributing almost 40 papers.

  12. Toward construction of ITER

    International Nuclear Information System (INIS)

    Shimomura, Yasuo

    2005-01-01

    The ITER Project has developed significantly in recent years in preparation for its construction. The ITER Negotiators have developed a draft Joint Implementation Agreement (JIA), ready for completion following the nomination of the Project's Director General (DG). The ITER International Team and Participant Teams have continued technical and organizational preparations. Actual construction can start immediately after the international ITER organization is established, following signature of the JIA. The Project is now strongly supported by all the participants as well as by the scientific community, with the final high-level negotiations, focused on siting and the concluding details of cost sharing, having started in December 2003. The EU, with Cadarache, and Japan, with Rokkasho, have both promised large contributions to the project to strongly support their construction site proposals. Both wish to host the ITER facility to such an extent that they also propose large contributions to a broader collaboration among the Parties. This covers complementary activities to help accelerate fusion development towards a viable power source, and may also allow the Participants to reach a conclusion on ITER siting. (author)

  13. ITER Status and Plans

    Science.gov (United States)

    Greenfield, Charles M.

    2017-10-01

    The US Burning Plasma Organization is pleased to welcome Dr. Bernard Bigot, who will give an update on progress in the ITER Project. Dr. Bigot took over as Director General of the ITER Organization in early 2015 following a distinguished career that included serving as Chairman and CEO of the French Alternative Energies and Atomic Energy Commission and as High Commissioner for ITER in France. During his tenure at ITER the project has moved into high gear, with rapid progress evident on the construction site and preparation of a staged schedule and a research plan leading from where we are today all the way through to full DT operation. In an unprecedented international effort, seven partners (China, the European Union, India, Japan, Korea, Russia and the United States) have pooled their financial and scientific resources to build the biggest fusion reactor in history. ITER will open the way to the next step: a demonstration fusion power plant. All DPP attendees are welcome to attend this ITER town meeting.

  14. High-performance blob-based iterative three-dimensional reconstruction in electron tomography using multi-GPUs

    Directory of Open Access Journals (Sweden)

    Wan Xiaohua

    2012-06-01

    Background: Three-dimensional (3D) reconstruction in electron tomography (ET) has emerged as a leading technique to elucidate the molecular structures of complex biological specimens. Blob-based iterative methods are advantageous reconstruction methods for 3D reconstruction in ET, but demand huge computational costs. Multiple graphic processing units (multi-GPUs) offer an affordable platform to meet these demands. However, a synchronous communication scheme between multi-GPUs leads to idle GPU time, and a weighted matrix involved in iterative methods cannot be loaded into GPUs, especially for large images, due to the limited available memory of GPUs. Results: In this paper we propose a multilevel parallel strategy combined with an asynchronous communication scheme and a blob-ELLR data structure to efficiently perform blob-based iterative reconstructions on multi-GPUs. The asynchronous communication scheme is used to minimize the idle GPU time so as to asynchronously overlap communications with computations. The blob-ELLR data structure needs only about 1/16 of the storage space of the ELLPACK-R (ELLR) data structure and yields significant acceleration. Conclusions: Experimental results indicate that the multilevel parallel scheme combined with the asynchronous communication scheme and the blob-ELLR data structure allows efficient implementations of 3D reconstruction in ET on multi-GPUs.
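
    For orientation, the baseline ELLPACK-R (ELLR) layout that the abstract compares against can be sketched as follows. This is a minimal illustration of plain ELLPACK-R only (padded per-row value and column arrays plus a row-length array); the blob-ELLR variant is not described in the abstract and is not reproduced here. The function names `to_ellpack_r` and `spmv_ellpack_r` and the small matrix are hypothetical, chosen for the example.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) into ELLPACK-R arrays.

    Returns (values, cols, row_len): values and cols are padded to the
    width of the longest row; row_len holds each row's true nonzero count.
    """
    row_len = [sum(1 for v in row if v != 0) for row in dense]
    width = max(row_len, default=0)
    values, cols = [], []
    for row in dense:
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        pad = width - len(nonzeros)
        cols.append([j for j, _ in nonzeros] + [0] * pad)
        values.append([v for _, v in nonzeros] + [0.0] * pad)
    return values, cols, row_len


def spmv_ellpack_r(values, cols, row_len, x):
    """Sparse matrix-vector product; each row stops at its own length,
    so padding entries are never read."""
    y = []
    for vals_i, cols_i, n_i in zip(values, cols, row_len):
        y.append(sum(vals_i[k] * x[cols_i[k]] for k in range(n_i)))
    return y


# Hypothetical 3x3 example matrix.
A = [[4.0, 0.0, 1.0],
     [0.0, 2.0, 0.0],
     [3.0, 0.0, 5.0]]
values, cols, row_len = to_ellpack_r(A)
y = spmv_ellpack_r(values, cols, row_len, [1.0, 2.0, 3.0])
print(row_len, y)
```

    On a GPU, the row-length array is what lets each thread stop early instead of multiplying padded zeros; the pure-Python loop above only mirrors that access pattern.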

  15. SPARSE ELECTROMAGNETIC IMAGING USING NONLINEAR LANDWEBER ITERATIONS

    KAUST Repository

    Desmal, Abdulla

    2015-07-29

    A scheme for efficiently solving the nonlinear electromagnetic inverse scattering problem on sparse investigation domains is described. The proposed scheme reconstructs the (complex) dielectric permittivity of an investigation domain from fields measured away from the domain itself. Least-squares data misfit between the computed scattered fields, which are expressed as a nonlinear function of the permittivity, and the measured fields is constrained by the L0/L1-norm of the solution. The resulting minimization problem is solved using nonlinear Landweber iterations, where at each iteration a thresholding function is applied to enforce the sparseness-promoting L0/L1-norm constraint. The thresholded nonlinear Landweber iterations are applied to several two-dimensional problems, where the "measured" fields are synthetically generated or obtained from actual experiments. These numerical experiments demonstrate the accuracy, efficiency, and applicability of the proposed scheme in reconstructing sparse profiles with high permittivity values.
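
    The idea of applying a thresholding function after each Landweber step can be illustrated on a much simpler problem. The sketch below is an assumption-laden stand-in, not the scheme of the record above: it uses a real-valued linear operator instead of the nonlinear scattering operator, and soft thresholding (the L1 case) only. The names `soft_threshold` and `thresholded_landweber`, the matrix, the step size and the threshold are all hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v towards zero by t (the L1 thresholding function)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def thresholded_landweber(A, b, step, thresh, iters):
    """Linear Landweber iteration x <- S(x + step * A^T (b - A x)),
    where S is soft thresholding applied componentwise."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual of the data misfit, r = b - A x.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Landweber direction, g = A^T r.
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] + step * g[j], thresh) for j in range(n)]
    return x


# Hypothetical underdetermined system whose data come from the sparse
# vector [1, 0, 0]; the step is kept below 1 / ||A||^2.
A = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5]]
b = [1.0, 0.0]
x = thresholded_landweber(A, b, step=0.5, thresh=0.01, iters=500)
print(x)
```

    The recovered first component is slightly smaller than 1 because soft thresholding biases nonzero entries towards zero; the other two components are driven to zero, which is the sparsity-promoting effect the abstract refers to.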

  16. Advances in iterative methods for nonlinear equations

    CERN Document Server

    Busquier, Sonia

    2016-01-01

    This book focuses on the approximation of nonlinear equations using iterative methods. Nine contributions are presented on the construction and analysis of these methods, the coverage encompassing convergence, efficiency, robustness, dynamics, and applications. Many problems are stated in the form of nonlinear equations, using mathematical modeling. In particular, a wide range of problems in Applied Mathematics and in Engineering can be solved by finding the solutions to these equations. The book reveals the importance of studying convergence aspects in iterative methods and shows that selection of the most efficient and robust iterative method for a given problem is crucial to guaranteeing a good approximation. A number of sample criteria for selecting the optimal method are presented, including those regarding the order of convergence, the computational cost, and the stability, including the dynamics. This book will appeal to researchers whose field of interest is related to nonlinear problems and equations...

  17. ITER Remote Maintenance System (IRMS) lifecycle management

    Energy Technology Data Exchange (ETDEWEB)

    Tesini, Alessandro, E-mail: alessandro.tesini@iter.org [ITER Organization, CS 90 046, 13067 St. Paul Lez Durance Cedex (France); Otto' , Bede [Oxford Technologies Ltd, 7, Nuffield Way, Abingdon, Oxon OX14 1RJ (United Kingdom); Blight, John [FAAST 31c Allee de la Granette, 13600 Ceyreste (France); Choi, Chang-Hwan; Friconneau, Jean-Pierre; Gotewal, Krishan Kumar; Hamilton, David [ITER Organization, CS 90 046, 13067 St. Paul Lez Durance Cedex (France); Heckendorn, Frank [FD Technologies, PO Box 6686, Aiken, SC (United States); Martins, Jean-Pierre [ITER Organization, CS 90 046, 13067 St. Paul Lez Durance Cedex (France); Marty, Thomas [Westinghouse, 122, avenue de Hambourg, 13008 Marseille (France); Nakahira, Masataka; Palmer, Jim; Subramanian, Rajendran [ITER Organization, CS 90 046, 13067 St. Paul Lez Durance Cedex (France)

    2011-10-15

    The availability of the ITER machine to perform its scientific program is strongly dependent on the performance of the different Remote Handling (RH) systems constituting the ITER Remote Maintenance System (IRMS). The lifecycle of the IRMS will largely exceed 40 years from initial concept design and proof testing through to machine decommissioning. Such a long lifecycle requires that a rigorous approach is put in place to guarantee the technical capabilities of the highly innovative IRMS, its efficiency and its availability. For this purpose, an IRMS System Engineering and IRMS lifecycle management approach has been adopted by ITER. The approach aims at ensuring the IRMS full operability and availability at an acceptable cost of ownership over the full ITER machine assembly and operations period. The IRMS lifecycle management method described in this paper covers such subjects as specific requirements for IRMS design reviews, monitoring during manufacture, factory and site acceptance testing, integrated commissioning, decontamination, maintenance and re-qualification strategies, requirements for Integrated Logistical Support during operations. The updating and implementation of the IRMS lifecycle strategy and this procedure will be managed and monitored by the Remote Handling Integrated Product Team (RH-IPT). Although developed for the IRMS, the basic principles and procedures of lifecycle management could be applied to other ITER plant systems whose reliability and availability will be essential for the continued operation of the ITER machine.

  18. ITER Remote Maintenance System (IRMS) lifecycle management

    International Nuclear Information System (INIS)

    Tesini, Alessandro; Otto', Bede; Blight, John; Choi, Chang-Hwan; Friconneau, Jean-Pierre; Gotewal, Krishan Kumar; Hamilton, David; Heckendorn, Frank; Martins, Jean-Pierre; Marty, Thomas; Nakahira, Masataka; Palmer, Jim; Subramanian, Rajendran

    2011-01-01

    The availability of the ITER machine to perform its scientific program is strongly dependent on the performance of the different Remote Handling (RH) systems constituting the ITER Remote Maintenance System (IRMS). The lifecycle of the IRMS will largely exceed 40 years from initial concept design and proof testing through to machine decommissioning. Such a long lifecycle requires that a rigorous approach is put in place to guarantee the technical capabilities of the highly innovative IRMS, its efficiency and its availability. For this purpose, an IRMS System Engineering and IRMS lifecycle management approach has been adopted by ITER. The approach aims at ensuring the IRMS full operability and availability at an acceptable cost of ownership over the full ITER machine assembly and operations period. The IRMS lifecycle management method described in this paper covers such subjects as specific requirements for IRMS design reviews, monitoring during manufacture, factory and site acceptance testing, integrated commissioning, decontamination, maintenance and re-qualification strategies, requirements for Integrated Logistical Support during operations. The updating and implementation of the IRMS lifecycle strategy and this procedure will be managed and monitored by the Remote Handling Integrated Product Team (RH-IPT). Although developed for the IRMS, the basic principles and procedures of lifecycle management could be applied to other ITER plant systems whose reliability and availability will be essential for the continued operation of the ITER machine.

  19. ITER CTA newsletter. No. 6

    International Nuclear Information System (INIS)

    2002-01-01

    This ITER CTA Newsletter issue comprises information about the following ITER meetings: the second negotiation meeting on the joint implementation of ITER, held in Tokyo (Japan) on 22-23 January 2002, and an international ITER symposium on burning plasma science and technology, held at the same place the day after the second negotiation meeting.

  20. ITER CTA newsletter. No. 2

    International Nuclear Information System (INIS)

    2001-10-01

    This ITER CTA newsletter contains results of the ITER toroidal field model coil project presented by ITER EU Home Team (Garching) and an article in commemoration of the late Dr. Charles Maisonnier, one of the former leaders of ITER who made significant contributions to its development

  1. Parallel computation

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code, development or improvement work was done on parts simulating low energy physical phenomena like radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program of neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits in a single pass the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refer to simulations for nuclear medicine applications like, for instance, development of biological probes, evaluation and characterization of the gamma cameras (collimators, crystal thickness) as well as the method for dosimetric calculations. Particularly, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to simulation of the electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of the environment.

  2. Parallelization of the preconditioned IDR solver for modern multicore computer systems

    Science.gov (United States)

    Bessonov, O. A.; Fedoseyev, A. I.

    2012-10-01

    This paper presents the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs the iterative IDR algorithm with ILU preconditioning (user-chosen ILU preconditioning order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there has been a dramatic change in processor architectures and computer system organization in recent years. Because of this, performance criteria and methods have been revisited, together with the parallelization of the solver and preconditioner using the OpenMP environment. Results of the successful implementation of efficient parallelization are presented for the most advanced computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).

  3. Cell verification of parallel burnup calculation program MCBMPI based on MPI

    International Nuclear Information System (INIS)

    Yang Wankui; Liu Yaoguang; Ma Jimin; Wang Guanbo; Yang Xin; She Ding

    2014-01-01

    The parallel burnup calculation program MCBMPI was developed. The program was modularized. The parallel MCNP5 program MCNP5MPI was employed as the neutron transport calculation module, and a composite of three solution methods was used to solve the burnup equation, i.e. the matrix exponential technique, the TTA analytical solution, and Gauss-Seidel iteration. An MPI parallel zone decomposition strategy was included in the program. The program system consists only of MCNP5MPI and a burnup subroutine. The latter achieves three main functions, i.e. zone decomposition, nuclide transferring and decaying, and data exchanging with MCNP5MPI. The program was also verified with the pressurized water reactor (PWR) cell burnup benchmark. The results show that the program can be applied to burnup calculation of multiple zones, and that the computation efficiency could be significantly improved with the development of computer hardware. (authors)

  4. Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation

    Science.gov (United States)

    Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.

    1996-01-01

    We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.

  5. Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization

    Energy Technology Data Exchange (ETDEWEB)

    Clark, M. A. [NVIDIA Corp., Santa Clara; Joó, Bálint [Jefferson Lab; Strelchenko, Alexei [Fermilab; Cheng, Michael [Boston U., Ctr. Comp. Sci.; Gambhir, Arjun [William-Mary Coll.; Brower, Richard [Boston U.

    2016-12-22

    The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due to both significant progress in accelerating the iterative linear solvers using multi-grid algorithms, and due to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the lack of parallelism on the coarse grids, and as such, these advances have not proved multiplicative. Using the QUDA library, we demonstrate that by exposing all sources of parallelism that the underlying stencil problem possesses, and through appropriate mapping of this parallelism to the GPU architecture, we can achieve high efficiency even for the coarsest of grids. Results are presented for the Wilson-Clover discretization, where we demonstrate up to 10x speedup over present state-of-the-art GPU-accelerated methods on Titan. Finally, we look to the future, and consider the software implications of our findings.

  6. Study of wall conditioning in tokamaks with application to ITER

    International Nuclear Information System (INIS)

    Kogut, Dmitri

    2014-01-01

    This thesis is devoted to studies of the performance and efficiency of wall conditioning techniques in fusion reactors, such as ITER. Conditioning is necessary to control the state of the surface of plasma facing components to ensure plasma initiation and performance. Conditioning and operation of the JET tokamak with an ITER-relevant material mix is extensively studied. A 2D model of glow conditioning discharges is developed and validated; it predicts reasonably uniform discharges in ITER. In the nuclear phase of ITER operation, conditioning will be needed to control tritium inventory. It is shown here that isotopic exchange is an efficient means to eliminate tritium from the walls by replacing it with deuterium. Extrapolations for tritium removal are comparable with expected retention per nominal plasma pulse in ITER. A 1D model of hydrogen isotopic exchange in beryllium is developed and validated. It shows that fluence and temperature of the surface influence the efficiency of the isotopic exchange. (author)

  7. Improving the iterative Linear Interaction Energy approach using automated recognition of configurational transitions.

    Science.gov (United States)

    Vosmeer, C Ruben; Kooi, Derk P; Capoferri, Luigi; Terpstra, Margreet M; Vermeulen, Nico P E; Geerke, Daan P

    2016-01-01

    Recently an iterative method was proposed to enhance the accuracy and efficiency of ligand-protein binding affinity prediction through linear interaction energy (LIE) theory. For ligand binding to flexible Cytochrome P450s (CYPs), this method was shown to decrease the root-mean-square error and standard deviation of error prediction by combining interaction energies of simulations starting from different conformations. Thereby, different parts of protein-ligand conformational space are sampled in parallel simulations. The iterative LIE framework relies on the assumption that separate simulations explore different local parts of phase space, and do not show transitions to other parts of configurational space that are already covered in parallel simulations. In this work, a method is proposed to (automatically) detect such transitions during the simulations that are performed to construct LIE models and to predict binding affinities. Using noise-canceling techniques and splines to fit time series of the raw data for the interaction energies, transitions during simulation between different parts of phase space are identified. Boolean selection criteria are then applied to determine which parts of the interaction energy trajectories are to be used as input for the LIE calculations. Here we show that this filtering approach benefits the predictive quality of our previous CYP 2D6-aryloxypropanolamine LIE model. In addition, an analysis is performed of the gain in computational efficiency that can be obtained from monitoring simulations using the proposed filtering method and by prematurely terminating simulations accordingly.

  8. Locality-Driven Parallel Static Analysis for Power Delivery Networks

    KAUST Repository

    Zeng, Zhiyu

    2011-06-01

    Large VLSI on-chip Power Delivery Networks (PDNs) are challenging to analyze due to the sheer network complexity. In this article, a novel parallel partitioning-based PDN analysis approach is presented. We use the boundary circuit responses of each partition to divide the full grid simulation problem into a set of independent subgrid simulation problems. Instead of solving exact boundary circuit responses, a more efficient scheme is proposed to provide near-exact approximation to the boundary circuit responses by exploiting the spatial locality of the flip-chip-type power grids. This scheme is also used in a block-based iterative error reduction process to achieve fast convergence. Detailed computational cost analysis and performance modeling is carried out to determine the optimal (or near-optimal) number of partitions for parallel implementation. Through the analysis of several large power grids, the proposed approach is shown to have excellent parallel efficiency, fast convergence, and favorable scalability. Our approach can solve a 16-million-node power grid in 18 seconds on an IBM p5-575 processing node with 16 Power5+ processors, which is 18.8X faster than a state-of-the-art direct solver. © 2011 ACM.

  9. ITER tokamak device

    International Nuclear Information System (INIS)

    Doggett, J.; Salpietro, E.; Shatalov, G.

    1991-01-01

    The results of the Conceptual Design Activities for the International Thermonuclear Experimental Reactor (ITER) are summarized. These activities, carried out between April 1988 and December 1990, produced a consistent set of technical characteristics and preliminary plans for co-ordinated research and development support of ITER; and a conceptual design, a description of design requirements and a preliminary construction schedule and cost estimate. After a description of the design basis, an overview is given of the tokamak device, its auxiliary systems, facility and maintenance. The interrelation and integration of the various subsystems that form the ITER tokamak concept are discussed. The 16 ITER equatorial port allocations, used for nuclear testing, diagnostics, fuelling, maintenance, and heating and current drive, are given, as well as a layout of the reactor building. Finally, brief descriptions are given of the major ITER sub-systems, i.e., (i) magnet systems (toroidal and poloidal field coils and cryogenic systems), (ii) containment structures (vacuum and cryostat vessels, machine gravity supports, attaching locks, passive loops and active coils), (iii) first wall, (iv) divertor plate (design and materials, performance and lifetime, a.o.), (v) blanket/shield system, (vi) maintenance equipment, (vii) current drive and heating, (viii) fuel cycle system, and (ix) diagnostics. 11 refs, figs and tabs

  10. ITER-FEAT operation

    International Nuclear Information System (INIS)

    Shimomura, Y.; Huguet, M.; Mizoguchi, T.; Murakami, Y.; Polevoi, A.R.; Shimada, M.; Aymar, R.; Chuyanov, V.A.; Matsumoto, H.

    2001-01-01

    ITER is planned to be the first fusion experimental reactor in the world operating for research in physics and engineering. The first ten years of operation will be devoted primarily to physics issues at low neutron fluence and the following ten years of operation to engineering testing at higher fluence. ITER can accommodate various plasma configurations and plasma operation modes, such as inductive high Q modes, long pulse hybrid modes and non-inductive steady state modes, with large ranges of plasma current, density, beta and fusion power, and with various heating and current drive methods. This flexibility will provide an advantage for coping with uncertainties in the physics database, in studying burning plasmas, in introducing advanced features and in optimizing the plasma performance for the different programme objectives. Remote sites will be able to participate in the ITER experiment. This concept will provide an advantage not only in operating ITER for 24 hours a day but also in involving the worldwide fusion community and in promoting scientific competition among the ITER Parties. (author)

  11. STICS: surface-tethered iterative carbohydrate synthesis.

    Science.gov (United States)

    Pornsuriyasak, Papapida; Ranade, Sneha C; Li, Aixiao; Parlato, M Cristina; Sims, Charles R; Shulga, Olga V; Stine, Keith J; Demchenko, Alexei V

    2009-04-14

    A new surface-tethered iterative carbohydrate synthesis (STICS) technology is presented in which a surface functionalized 'stick' made of chemically stable high surface area porous gold allows one to perform cost efficient and simple synthesis of oligosaccharide chains; at the end of the synthesis, the oligosaccharide can be cleaved off and the stick reused for subsequent syntheses.

  12. Parallelizing More Loops with Compiler Guided Refactoring

    DEFF Research Database (Denmark)

    Larsen, Per; Ladelsky, Razya; Lidman, Jacob

    2012-01-01

    an interactive compilation feedback system that guides programmers in iteratively modifying their application source code. This helps leverage the compiler’s ability to generate loop-parallel code. We employ our system to modify two sequential benchmarks dealing with image processing and edge detection...

  13. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)

  14. Recent Progress on ECH Technology for ITER

    Science.gov (United States)

    Sirigiri, Jagadishwar

    2005-10-01

    The Electron Cyclotron Heating and Current Drive (ECH&CD) system for ITER is a critical ITER system that must be available for use on Day 1 of the ITER experimental program. The applications of the system include plasma start-up, plasma heating and suppression of Neoclassical Tearing Modes (NTMs). These applications are accomplished using 27 one megawatt continuous wave gyrotrons: 24 at a frequency of 170 GHz and 3 at a frequency of 120 GHz. There are DC power supplies for the gyrotrons, a transmission line system, one launcher at the equatorial plane and three upper port launchers. The US will play a major role in delivering parts of the ECH&CD system to ITER. The present state-of-the-art includes major advances in all areas of ECH technology. In the US, a major effort is underway to supply gyrotrons of up to 1.5 MW power level at 110 GHz to General Atomics for use in heating the DIII-D tokamak. This presentation will include a brief review of the state-of-the-art, worldwide, in ECH technology. The requirements for the ITER ECH&CD system will then be reviewed. ITER calls for gyrotrons capable of operating from a 50 kV power supply, after potential depression, with a minimum of 50% overall efficiency. This is a very significant challenge and some approaches to meeting this goal will be presented. Recent experimental results at MIT showing improved efficiency of high frequency, 1.5 MW gyrotrons will be described. These results will be incorporated into the planned development of gyrotrons for ITER. The ITER ECH&CD system will also be a challenge to the transmission lines, which must operate at high average power at up to 1000 seconds and with high efficiency. The technology challenges and efforts in the US and other ITER parties to solve these problems will be reviewed. *In collaboration with E. Choi, C. Marchewka, I. Mastovosky, M. A. Shapiro and R. J. Temkin. This work is supported by the Office of Fusion Energy Sciences of the U. S. Department of Energy.

  15. A novel iterative scheme and its application to differential equations.

    Science.gov (United States)

    Khan, Yasir; Naeem, F; Šmarda, Zdeněk

    2014-01-01

    The purpose of this paper is to employ an alternative approach to reconstruct the standard variational iteration algorithm II proposed by He, including Lagrange multiplier, and to give a simpler formulation of Adomian decomposition and modified Adomian decomposition method in terms of newly proposed variational iteration method-II (VIM). Through careful investigation of the earlier variational iteration algorithm and Adomian decomposition method, we find unnecessary calculations for Lagrange multiplier and also repeated calculations involved in each iteration, respectively. Several examples are given to verify the reliability and efficiency of the method.

  16. Various Newton-type iterative methods for solving nonlinear equations

    Directory of Open Access Journals (Sweden)

    Manoj Kumar

    2013-10-01

    The aim of the present paper is to introduce and investigate new ninth- and seventh-order convergent Newton-type iterative methods for solving nonlinear equations. The ninth-order convergent Newton-type iterative method is made derivative free to obtain a seventh-order convergent Newton-type iterative method. The new methods with and without derivatives have efficiency indices 1.5518 and 1.6266, respectively. The error equations are used to establish the order of convergence of the proposed iterative methods. Finally, various numerical comparisons are implemented in MATLAB to demonstrate the performance of the developed methods.
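
    The two quoted efficiency indices are consistent with the usual definition p^(1/d), where p is the order of convergence and d is the number of function (and derivative) evaluations per iteration. The sketch below is only a consistency check: the evaluation counts 5 and 4 are assumptions (the abstract does not state them), chosen because they reproduce the quoted values, and the name `efficiency_index` is illustrative.

```python
def efficiency_index(order: int, evaluations: int) -> float:
    """Efficiency index p**(1/d) of an iterative root-finding method.

    order       -- convergence order p
    evaluations -- function/derivative evaluations per iteration d
    """
    return order ** (1.0 / evaluations)


# Assumed evaluation counts (not given in the abstract).
ninth_order_index = efficiency_index(9, 5)    # method using derivatives
seventh_order_index = efficiency_index(7, 4)  # derivative-free method

print(f"{ninth_order_index:.4f} {seventh_order_index:.4f}")
```

    Under these assumptions the derivative-free method has the lower order but the higher index, because it saves one evaluation per step.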

  17. The ITER activity

    International Nuclear Information System (INIS)

    Glass, A.J.

    1991-01-01

    The International Thermonuclear Experimental Reactor (ITER) project is a collaboration among four parties, the United States, the Soviet Union, Japan, and the European Communities, to demonstrate the scientific and technological feasibility of fusion power for peaceful purposes. ITER will demonstrate this through the construction of a tokamak fusion reactor capable of generating 1000 megawatts of fusion power. The ITER project has three missions, as follows: (1) Physics mission -- to demonstrate ignition and controlled burn, with pulse durations from 200 to 1000 s; (2) Technology mission -- to demonstrate the technologies essential to a reactor in an integrated system, operating with high reliability and availability in pulsed operation, with steady-state operation as the ultimate goal; and (3) Testing mission -- to test nuclear and high-heat-flux components at flux levels of 1 MW/m², and fluences of order 1 MW-yr/m².

  18. Parallel R

    CERN Document Server

    McCallum, Ethan

    2011-01-01

    It's tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  19. Parallelization of the FLAPW method

    International Nuclear Information System (INIS)

    Canning, A.; Mannstadt, W.; Freeman, A.J.

    1999-01-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about one hundred atoms due to a lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer

  20. Parallelization of the FLAPW method

    Science.gov (United States)

    Canning, A.; Mannstadt, W.; Freeman, A. J.

    2000-08-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

  1. Earthly sun called ITER

    International Nuclear Information System (INIS)

    Pozdeyev, Mikhail

    2002-01-01

    Full text: Participating in the film are Academicians Velikhov and Glukhikh, Mr. Filatof, ITER Director from Russia, and Mr. Sannikov from Kurchatov Institute. The film tells about the starting point of the project (Mr. Lavrentyev), the pioneers of the project (Academicians Tamme, Sakharov, Artsimovich) and the current state of the project. Participating in ITER now are the US, Russia, Japan and the European Union. There are two associated members as well: Kazakhstan and Canada. By now the engineering design phase has been finished. Computer animation used in the video gives an idea of how the first thermonuclear reactor, based on the famous Russian tokamak, works. (author)

  2. ITER plant systems

    International Nuclear Information System (INIS)

    Kolbasov, B.; Barnes, C.; Blevins, J.

    1991-01-01

    As part of a series of documents published by the IAEA that summarize the results of the Conceptual Design Activities for the ITER project, this publication describes the conceptual design of the ITER plant systems, in particular (i) the heat transport system, (ii) the electrical distribution system, (iii) the requirements for radioactive equipment handling, the hot cell, and waste management, (iv) the supply system for fluids and operational chemicals, (v) the qualitative analyses of failure scenarios and methods of burn stability control and emergency shutdown control, (vi) analyses of tokamak building functions and design requirements, (vii) a plant layout, and (viii) site requirements. Refs, figs and tabs

  3. Iterated multidimensional wave conversion

    International Nuclear Information System (INIS)

    Brizard, A. J.; Tracy, E. R.; Johnston, D.; Kaufman, A. N.; Richardson, A. S.; Zobin, N.

    2011-01-01

    Mode conversion can occur repeatedly in a two-dimensional cavity (e.g., the poloidal cross section of an axisymmetric tokamak). We report on two novel concepts that allow for a complete and global visualization of the ray evolution under iterated conversions. First, iterated conversion is discussed in terms of ray-induced maps from the two-dimensional conversion surface to itself (which can be visualized in terms of three-dimensional rooms). Second, the two-dimensional conversion surface is shown to possess a symplectic structure derived from Dirac constraints associated with the two dispersion surfaces of the interacting waves.

  4. Physics fundamentals for ITER

    International Nuclear Information System (INIS)

    Rosenbluth, M.N.

    1999-01-01

    The design of an experimental thermonuclear reactor requires both cutting-edge technology and physics predictions precise enough to carry forward the design. The past few years of worldwide physics studies have seen great progress in understanding, innovation and integration. We will discuss this progress and the remaining issues in several key physics areas. (1) Transport and plasma confinement. A worldwide database has led to an 'empirical scaling law' for tokamaks which predicts adequate confinement for the ITER fusion mission, albeit with considerable but acceptable uncertainty. The ongoing revolution in computer capabilities has given rise to new gyrofluid and gyrokinetic simulations of microphysics which may be expected in the near future to attain predictive accuracy. Important databases on H-mode characteristics and helium retention have also been assembled. (2) Divertors, heat removal and fuelling. A novel concept for heat removal - the radiative, baffled, partially detached divertor - has been designed for ITER. Extensive two-dimensional (2D) calculations have been performed and agree qualitatively with recent experiments. Preliminary studies of the interaction of this configuration with core confinement are encouraging and the success of inside pellet launch provides an attractive alternative fuelling method. (3) Macrostability. The ITER mission can be accomplished well within ideal magnetohydrodynamic (MHD) stability limits, except for internal kink modes. Comparisons with JET, as well as a theoretical model including kinetic effects, predict such sawteeth will be benign in ITER. Alternative scenarios involving delayed current penetration or off-axis current drive may be employed if required. The recent discovery of neoclassical beta limits well below ideal MHD limits poses a threat to performance. Extrapolation to reactor scale is as yet unclear. 
In theory such modes are controllable by current drive profile control or feedback and experiments should

  5. ITER and the fusion reactor: status and challenge to technology

    International Nuclear Information System (INIS)

    Lackner, K.

    2001-01-01

    Fusion has a high potential, but requires an integrated physics and technology effort without precedent in non-military R and D. The basic physics feasibility demonstration will be concluded with ITER, although R and D for efficiency improvement will continue. The essential technological issues remaining at the start of ITER operation concern materials questions: first wall components and radiation tolerant (low activation) materials. This paper comprises just a copy of the presentation slides, covering the following subjects: magnetic confinement fusion, the Tokamak, progress in Tokamak performance, ITER: its genealogy, physics basis-critical issues, cutaway of ITER-FEAT, R and D - divertor cassette (L-5), differences power plant-ITER, challenges for ITER and fusion plants, main technological problems (plasma facing materials), structural and functional materials for fusion power plants, ferritic steels, EUROFER development, improvements beyond ferritic steels, costing among others. (nevyjel)

  6. Migration of vectorized iterative solvers to distributed memory architectures

    Energy Technology Data Exchange (ETDEWEB)

    Pommerell, C. [AT&T Bell Labs., Murray Hill, NJ (United States)]; Ruehl, R. [CSCS-ETH, Manno (Switzerland)]

    1994-12-31

    Both necessity and opportunity motivate the use of high-performance computers for iterative linear solvers. Necessity results from the size of the problems being solved; smaller problems are often better handled by direct methods. Opportunity arises from the formulation of the iterative methods in terms of simple linear algebra operations, even if this 'natural' parallelism is not easy to exploit in irregularly structured sparse matrices and with good preconditioners. As a result, high-performance implementations of iterative solvers have attracted a lot of interest in recent years. Most efforts are geared to vectorize or parallelize the dominating operation (structured or unstructured sparse matrix-vector multiplication), or to increase locality and parallelism by reformulating the algorithm (reducing global synchronization in inner products or local data exchange in preconditioners). Target architectures for iterative solvers currently include mostly vector supercomputers and architectures with one or few optimized (e.g., super-scalar and/or super-pipelined RISC) processors and hierarchical memory systems. More recently, parallel computers with physically distributed memory and a better price/performance ratio have been offered by vendors as a very interesting alternative to vector supercomputers. However, programming comfort on such distributed memory parallel processors (DMPPs) still lags behind. Here the authors are concerned with iterative solvers and their changing computing environment. In particular, they are considering migration from traditional vector supercomputers to DMPPs. Application requirements force one to use flexible and portable libraries. They want to extend the portability of iterative solvers rather than reimplementing everything for each new machine, or even for each new architecture.
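
    The dominating operation named in this abstract, unstructured sparse matrix-vector multiplication, can be sketched as follows. This is a minimal serial illustration assuming the common compressed sparse row (CSR) layout; the abstract does not say which storage format the authors use, and the function and variable names are hypothetical.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  -- nonzero entries, row by row
    col_idx -- column index of each entry in `values`
    row_ptr -- row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Indirect access x[col_idx[k]] is what makes this kernel
        # irregular and hard to vectorize or distribute.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

    Each row is computed independently, so rows can be split among processors; on a distributed memory machine the entries of `x` owned by other processors would have to be exchanged first.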

  7. Physics research needs for ITER

    International Nuclear Information System (INIS)

    Sauthoff, N.R.

    1995-01-01

    Design of ITER entails the application of physics design tools that have been validated against the world-wide data base of fusion research. In many cases, these tools do not yet exist and must be developed as part of the ITER physics program. ITER's considerable increases in power and size demand significant extrapolations from the current data base; in several cases, new physical effects are projected to dominate the behavior of the ITER plasma. This paper focuses on those design tools and data that have been identified by the ITER team and are not yet available; these needs serve as the basis for the ITER Physics Research Needs, which have been developed jointly by the ITER Physics Expert Groups and the ITER design team. Development of the tools and the supporting data base is an on-going activity that constitutes a significant opportunity for contributions to the ITER program by fusion research programs world-wide

  8. Parallel Implementation of the Recursive Approximation of an Unsupervised Hierarchical Segmentation Algorithm. Chapter 5

    Science.gov (United States)

    Tilton, James C.; Plaza, Antonio J. (Editor); Chang, Chein-I. (Editor)

    2008-01-01

    The hierarchical image segmentation algorithm (referred to as HSEG) is a hybrid of hierarchical step-wise optimization (HSWO) and constrained spectral clustering that produces a hierarchical set of image segmentations. HSWO is an iterative approach to region growing segmentation in which the optimal image segmentation is found at N_R regions, given a segmentation at N_{R+1} regions. HSEG's addition of constrained spectral clustering makes it a computationally intensive algorithm for all but the smallest of images. To counteract this, a computationally efficient recursive approximation of HSEG (called RHSEG) has been devised. Further improvements in processing speed are obtained through a parallel implementation of RHSEG. This chapter describes this parallel implementation and demonstrates its computational efficiency on a Landsat Thematic Mapper test scene.

  9. Concurrent computation of attribute filters on shared memory parallel machines

    NARCIS (Netherlands)

    Wilkinson, Michael H.F.; Gao, Hui; Hesselink, Wim H.; Jonker, Jan-Eppo; Meijster, Arnold

    2008-01-01

    Morphological attribute filters have not previously been parallelized mainly because they are both global and nonseparable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings, and thickenings,

  10. iHadoop: Asynchronous Iterations Support for MapReduce

    KAUST Repository

    Elnikety, Eslam

    2011-08-01

    MapReduce is a distributed programming framework designed to ease the development of scalable data-intensive applications for large clusters of commodity machines. Most machine learning and data mining applications involve iterative computations over large datasets, such as the Web hyperlink structures and social network graphs. Yet, the MapReduce model does not efficiently support this important class of applications. The architecture of MapReduce, most critically its dataflow techniques and task scheduling, is completely unaware of the nature of iterative applications; tasks are scheduled according to a policy that optimizes the execution for a single iteration, which wastes bandwidth, I/O, and CPU cycles when compared with an optimal execution for a consecutive set of iterations. This work presents iHadoop, a modified MapReduce model, and an associated implementation, optimized for iterative computations. The iHadoop model schedules iterations asynchronously. It connects the output of one iteration to the next, allowing both to process their data concurrently. iHadoop's task scheduler exploits inter-iteration data locality by scheduling tasks that exhibit a producer/consumer relation on the same physical machine, allowing a fast local data transfer. For those iterative applications that require satisfying certain criteria before termination, iHadoop runs the check concurrently during the execution of the subsequent iteration to further reduce the application's latency. This thesis also describes our implementation of the iHadoop model, and evaluates its performance against Hadoop, the widely used open source implementation of MapReduce. Experiments using different data analysis applications over real-world and synthetic datasets show that iHadoop performs better than Hadoop for iterative algorithms, reducing execution time of iterative applications by 25% on average. Furthermore, integrating iHadoop with HaLoop, a variant Hadoop implementation that caches

  11. Iterative List Decoding

    DEFF Research Database (Denmark)

    Justesen, Jørn; Høholdt, Tom; Hjaltason, Johan

    2005-01-01

    We analyze the relation between iterative decoding and the extended parity check matrix. By considering a modified version of bit flipping, which produces a list of decoded words, we derive several relations between decodable error patterns and the parameters of the code. By developing a tree...... of codewords at minimal distance from the received vector, we also obtain new information about the code....
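
    For orientation, a plain hard-decision bit-flipping decoder is sketched below. It is the textbook scheme, not the modified list-producing version analysed in the paper, and the small parity check matrix (the cycle code of the complete graph K4, with edges as bits and vertices as checks) is a hypothetical example chosen so that every single error is corrected.

```python
def bit_flip_decode(H, received, max_iters=10):
    """Flip every bit for which more than half of its checks fail."""
    word = list(received)
    for _ in range(max_iters):
        syndrome = [sum(h * w for h, w in zip(row, word)) % 2 for row in H]
        if not any(syndrome):
            return word  # all parity checks satisfied
        updated = list(word)
        for j in range(len(word)):
            checks = [i for i, row in enumerate(H) if row[j]]
            failed = sum(syndrome[i] for i in checks)
            if 2 * failed > len(checks):
                updated[j] ^= 1
        if updated == word:
            break  # decoder is stuck; give up
        word = updated
    return word  # may not be a codeword


# Rows: vertices 0..3 of K4. Columns: edges (0,1),(0,2),(0,3),(1,2),(1,3),(2,3).
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
codeword = [1, 1, 0, 1, 0, 0]   # the triangle on vertices 0, 1, 2
received = [1, 1, 0, 1, 1, 0]   # same word with bit 4 in error
decoded = bit_flip_decode(H, received)
```

    In this example each bit takes part in exactly two checks and two bits share at most one check, so a single error is the only bit with both of its checks failing.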

  12. ITER power electrical networks

    International Nuclear Information System (INIS)

    Sejas Portela, S.

    2011-01-01

    The ITER project (International Thermonuclear Experimental Reactor) is an international research and development effort to design, build and operate an experimental facility to demonstrate the scientific and technological possibility of obtaining useful energy from the physical phenomenon known as nuclear fusion.

  13. ITER conceptual design report

    International Nuclear Information System (INIS)

    1991-01-01

    Results of the International Thermonuclear Experimental Reactor (ITER) Conceptual Design Activity (CDA) are reported. This report covers the Terms of Reference for the project: defining the technical specifications, defining future research needs, defining site requirements, and carrying out a coordinated research effort coincident with the CDA. Refs, figs and tabs

  14. Nuclear analysis for ITER

    International Nuclear Information System (INIS)

    Santoro, R.T.; Iida, H.; Khripunov, V.; Petrizzi, L.; Sato, S.; Sawan, M.; Shatalov, G.; Schipakin, O.

    2001-01-01

    This paper summarizes the main results of nuclear analysis calculations performed during the International Thermonuclear Experimental Reactor (ITER) Engineering Design Activity (EDA). Major efforts were devoted to fulfilling the General Design Requirements to minimize the nuclear heating rate in the superconducting magnets and ensuring that radiation conditions at the cryostat are suitable for hands-on-maintenance after reactor shut-down. (author)

  15. ITER at Cadarache

    International Nuclear Information System (INIS)

    2005-06-01

    This public information document presents the ITER project (International Thermonuclear Experimental Reactor), the definition of fusion, the international cooperation and the advantages of the project. It also presents the Cadarache site, an appropriate scientific and economic environment. The last part of the document recalls the history of the project and the current mobilization of all partners. (A.L.B.)

  16. ITER conceptual design

    International Nuclear Information System (INIS)

    Tomabechi, K.; Gilleland, J.R.; Sokolov, Yu.A.; Toschi, R.

    1991-01-01

    The Conceptual Design Activities of the International Thermonuclear Experimental Reactor (ITER) were carried out jointly by the European Community, Japan, the Soviet Union and the United States of America, under the auspices of the International Atomic Energy Agency. The European Community provided the site for joint work sessions at the Max-Planck-Institut fuer Plasmaphysik in Garching, Germany. The Conceptual Design Activities began in the spring of 1988 and ended in December 1990. The objectives of the activities were to develop the design of ITER, to perform a safety and environmental analysis, to define the site requirements as well as the future research and development needs, to estimate the cost and manpower, and to prepare a schedule for detailed engineering design, construction and operation. On the basis of the investigation and analysis performed, a concept of ITER was developed which incorporated maximum flexibility of the performance of the device and allowed a variety of operating scenarios to be adopted. The heart of the machine is a tokamak having a plasma major radius of 6 m, a plasma minor radius of 2.15 m, a nominal plasma current of 22 MA and a nominal fusion power of 1 GW. The conceptual design can meet the technical objectives of the ITER programme. Because of the success of the Conceptual Design Activities, the Parties are now considering the implementation of the next phase, called the Engineering Design Activities. (author). Refs, figs and tabs

  17. ITER-FEAT operation

    International Nuclear Information System (INIS)

    Shimomura, Y.; Huget, M.; Mizoguchi, T.; Murakami, Y.; Polevoi, A.; Shimada, M.; Aymar, R.; Chuyanov, V.; Matsumoto, H.

    2001-01-01

    ITER is planned to be the first fusion experimental reactor in the world operating for research in physics and engineering. The first 10 years' operation will be devoted primarily to physics issues at low neutron fluence and the following 10 years' operation to engineering testing at higher fluence. ITER can accommodate various plasma configurations and plasma operation modes such as inductive high Q modes, long pulse hybrid modes, non-inductive steady-state modes, with large ranges of plasma current, density, beta and fusion power, and with various heating and current drive methods. This flexibility will provide an advantage for coping with uncertainties in the physics database, in studying burning plasmas, in introducing advanced features and in optimizing the plasma performance for the different programme objectives. Remote sites will be able to participate in the ITER experiment. This concept will provide an advantage not only in operating ITER for 24 hours per day but also in involving the world-wide fusion communities and in promoting scientific competition among the Parties. (author)

  18. US ITER Management Plan

    International Nuclear Information System (INIS)

    1991-12-01

    This US ITER Management Plan is the plan for conducting the Engineering Design Activities within the US. The plan applies to all design, analyses, and associated physics and technology research and development (R&D) required to support the program. The plan defines the management considerations associated with these activities. The plan also defines the management controls that the project participants will follow to establish, implement, monitor, and report these activities. The activities are to be conducted by the project in accordance with this plan. The plan will be updated to reflect the then-current management approach required to meet the project objectives. The plan will be reviewed at least annually for possible revision. Section 2 presents the ITER objectives, a brief description of the ITER concept as developed during the Conceptual Design Activities, and comments on the Engineering Design Activities. Section 3 discusses the planned international organization for the Engineering Design Activities, from which the tasks will flow to the US Home Team. Section 4 describes the US ITER management organization and responsibilities during the Engineering Design Activities. Section 5 describes the project management and control to be used to perform the assigned tasks during the Engineering Design Activities. Section 6 presents the references. Several appendices are provided that contain detailed information related to the front material

  19. ITER fuel cycle

    International Nuclear Information System (INIS)

    Leger, D.; Dinner, P.; Yoshida, H.

    1991-01-01

    Resulting from the Conceptual Design Activities (1988-1990) by the parties involved in the International Thermonuclear Experimental Reactor (ITER) project, this document summarizes the design requirements and the Conceptual Design Descriptions for each of the principal subsystems and design options of the ITER Fuel Cycle conceptual design. The ITER Fuel Cycle system provides for the handling of all tritiated water and gas mixtures on ITER. The system is subdivided into subsystems for fuelling, primary (torus) vacuum pumping, fuel processing, blanket tritium recovery, and common processes (including isotopic separation, fuel management and storage, and processes for detritiation of solid, liquid, and gaseous wastes). After an introduction describing system function and conceptual design procedure, a summary of the design is presented including a discussion of scope and main parameters, and the fuel design options for fuelling, plasma chamber vacuum pumping, fuel cleanup, blanket tritium recovery, and auxiliary and common processes. Design requirements are defined and design descriptions are given for the various subsystems (fuelling, plasma vacuum pumping, fuel cleanup, blanket tritium recovery, and auxiliary/common processes). The document ends with sections on fuel cycle design integration, fuel cycle building layout, safety considerations, a summary of the research and development programme, costing, and conclusions. Refs, figs and tabs

  20. ITER blanket designs

    International Nuclear Information System (INIS)

    Gohar, Y.; Parker, R.; Rebut, P.H.

    1995-01-01

    The ITER first wall, blanket, and shield system is being designed to handle 1.5±0.3 GW of fusion power and 3 MWa m⁻² average neutron fluence. In the basic performance phase of ITER operation, the shielding blanket uses austenitic steel structural material and water coolant. The first wall is made of a bimetallic structure, austenitic steel and copper alloy, coated with beryllium, and it is protected by beryllium bumper limiters. The choice of copper first wall is dictated by the surface heat flux values anticipated during ITER operation. The water coolant is used at low pressure and low temperature. A breeding blanket has been designed to satisfy the technical objectives of the Enhanced Performance Phase of ITER operation for the Test Program. The breeding blanket design is geometrically similar to the shielding blanket design except it is a self-cooled liquid lithium system with vanadium structural material. Self-healing electrical insulator (aluminum nitride) is used to reduce the MHD pressure drop in the system. Reactor relevancy, low tritium inventory, low activation material, low decay heat, and a tritium self-sufficiency goal are the main features of the breeding blanket design. (orig.)

  1. Advances in iterative methods

    International Nuclear Information System (INIS)

    Beauwens, B.; Arkuszewski, J.; Boryszewicz, M.

    1981-01-01

    Results obtained in the field of linear iterative methods within the Coordinated Research Program on Transport Theory and Advanced Reactor Calculations are summarized. The general convergence theory of linear iterative methods is essentially based on the properties of nonnegative operators on ordered normed spaces. The following aspects of this theory have been improved: new comparison theorems for regular splittings, generalization of the notions of M- and H-matrices, new interpretations of classical convergence theorems for positive-definite operators. The estimation of asymptotic convergence rates was developed with two purposes: the analysis of model problems and the optimization of relaxation parameters. In the framework of factorization iterative methods, model problem analysis is needed to investigate whether the increased computational complexity of higher-order methods does not offset their increased asymptotic convergence rates, as well as to appreciate the effect of standard relaxation techniques (polynomial relaxation). On the other hand, the optimal use of factorization iterative methods requires the development of adequate relaxation techniques and their optimization. The relative performances of a few possibilities have been explored for model problems. Presently, the best results have been obtained with optimal diagonal-Chebyshev relaxation
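
    As a concrete instance of a linear iterative method built on a regular splitting, the sketch below uses the Jacobi splitting A = M - N with M the diagonal of A, so that each step computes x_{k+1} = M^{-1}(N x_k + b). The test matrix and the function name are assumptions made for illustration; the methods studied in the report (factorization methods with Chebyshev relaxation) are more elaborate.

```python
def splitting_iteration(A, b, iters=50):
    """Jacobi iteration x <- M^{-1}(N x + b), with M = diag(A), N = M - A."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


# A is an M-matrix (positive diagonal, nonpositive off-diagonal entries,
# diagonally dominant), so M^{-1} >= 0 and N >= 0: a regular splitting.
A = [
    [4.0, -1.0, 0.0],
    [-1.0, 4.0, -1.0],
    [0.0, -1.0, 4.0],
]
b = [3.0, 2.0, 3.0]  # chosen so that the exact solution is [1, 1, 1]
x = splitting_iteration(A, b)
```

    The asymptotic convergence rate is governed by the spectral radius of M^{-1}N, which is the quantity that comparison theorems for regular splittings bound.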

  2. ITER neutral beam system

    International Nuclear Information System (INIS)

    Mondino, P.L.; Di Pietro, E.; Bayetti, P.

    1999-01-01

    The Neutral Beam (NB) system for the International Thermonuclear Experimental Reactor (ITER) has reached a high degree of integration with the tokamak and with the rest of the plant. Operational requirements and maintainability have been considered in the design. The paper considers the integration with the tokamak, discusses design improvements which appear necessary and finally notes R and D progress in key areas. (author)

  3. Iterative software kernels

    Energy Technology Data Exchange (ETDEWEB)

    Duff, I.

    1994-12-31

    This workshop focuses on kernels for iterative software packages. Specifically, the three speakers discuss various aspects of sparse BLAS kernels. Their topics are: 'Current status of user level sparse BLAS'; 'Current status of the sparse BLAS toolkit'; and 'Adding matrix-matrix and matrix-matrix-matrix multiply to the sparse BLAS toolkit'.

  4. Low-memory iterative density fitting.

    Science.gov (United States)

    Grajciar, Lukáš

    2015-07-30

    A new low-memory modification of the density fitting approximation based on a combination of a continuous fast multipole method (CFMM) and a preconditioned conjugate gradient solver is presented. The iterative conjugate gradient solver uses preconditioners formed from blocks of the Coulomb metric matrix that decrease the number of iterations needed for convergence by up to one order of magnitude. The matrix-vector products needed within the iterative algorithm are calculated using CFMM, which evaluates them with only linearly scaling memory requirements. Compared with the standard density fitting implementation, up to a 15-fold reduction of the memory requirements is achieved for the most efficient preconditioner at a cost of only a 25% increase in computational time. The potential of the method is demonstrated by performing density functional theory calculations for a zeolite fragment with 2592 atoms and 121,248 auxiliary basis functions on a single 12-core CPU workstation. © 2015 Wiley Periodicals, Inc.
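
    The solver structure described here can be sketched as a matrix-free preconditioned conjugate gradient loop. In this illustration a dense product stands in for the CFMM matrix-vector product, and a diagonal (1 × 1 block) preconditioner stands in for the block preconditioners of the paper; both substitutions, the small test system and all names are assumptions.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, b, precond, tol=1e-10, max_iters=100):
    """Solve A x = b for symmetric positive definite A.

    matvec(v)  -- returns A @ v; the matrix itself is never stored here
    precond(r) -- returns an approximation of A^{-1} @ r
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for x = 0
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for iteration in range(max_iters):
        if sqrt(dot(r, r)) <= tol:
            return x, iteration
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iters


# Small symmetric positive definite stand-in for the metric matrix.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]  # chosen so that the exact solution is [1, 2, 3]


def dense_matvec(v):
    return [dot(row, v) for row in A]


def diagonal_precond(r):
    return [ri / A[i][i] for i, ri in enumerate(r)]


solution, iterations = pcg(dense_matvec, b, diagonal_precond)
```

    Because the loop only ever calls `matvec`, the memory cost is set by how that product is evaluated, which is where a linear-scaling scheme such as CFMM would enter.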

  5. Re-starting an Arnoldi iteration

    Energy Technology Data Exchange (ETDEWEB)

    Lehoucq, R.B. [Argonne National Lab., IL (United States)]

    1996-12-31

    The Arnoldi iteration is an efficient procedure for approximating a subset of the eigensystem of a large sparse n x n matrix A. The iteration produces a partial orthogonal reduction of A into an upper Hessenberg matrix H_m of order m. The eigenvalues of this small matrix H_m are used to approximate a subset of the eigenvalues of the large matrix A. The eigenvalues of H_m improve as estimates to those of A as m increases. Unfortunately, so do the cost and storage of the reduction. The idea of re-starting the Arnoldi iteration is motivated by the prohibitive cost associated with building a large factorization.
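
    The reduction described in this abstract can be illustrated with a minimal, unrestarted Arnoldi sketch. This is not the author's code: the function names, the dense list-of-lists matrix and the breakdown tolerance are assumptions made for illustration, and a practical implementation would use a sparse matrix-vector product and a restarting strategy.

```python
import math

def matvec(A, v):
    """Dense matrix-vector product (stand-in for a sparse product)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def arnoldi(A, v0, m):
    """Run up to m Arnoldi steps starting from v0.

    Returns (V, H): V holds orthonormal basis vectors and H is an
    (m+1) x m upper Hessenberg array with A v_j = sum_i H[i][j] v_i.
    The leading m x m block of H is the small matrix whose eigenvalues
    approximate some eigenvalues of A.
    """
    norm = math.sqrt(sum(x * x for x in v0))
    V = [[x / norm for x in v0]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):                 # modified Gram-Schmidt
            H[i][j] = sum(a * b for a, b in zip(V[i], w))
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(sum(x * x for x in w))
        if H[j + 1][j] < 1e-12:                # assumed breakdown tolerance
            break
        V.append([x / H[j + 1][j] for x in w])
    return V, H

# Small illustrative matrix (hypothetical values).
A = [[2.0, 1.0, 0.0, 0.0],
     [0.0, 3.0, 1.0, 0.0],
     [0.0, 0.0, 4.0, 1.0],
     [1.0, 0.0, 0.0, 5.0]]
V, H = arnoldi(A, [1.0, 1.0, 1.0, 1.0], 3)
```

    Both the number of stored basis vectors in V and the orthogonalization work per step grow with m, which is the cost that motivates restarting.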

  6. Parallel Lines

    Directory of Open Access Journals (Sweden)

    James G. Worner

    2017-05-01

    James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship. ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.

  7. ITER ITA newsletter. No. 24, July 2005

    International Nuclear Information System (INIS)

    2005-08-01

    stimulant for international co-operation on science and technology in the twenty first century, and taking a broader view of the situation, Japan has decided that they will let the EU host the ITER site. Dr. J. Potocnik, European Commissioner for Science and Research, thanked Minister Nakayama for the highly constructive spirit with which he and his colleagues had conducted the bilateral discussions. He expressed his respect for the honourable manner in which the most sensitive stages were handled. He pointed out that the EU was well aware of the important task it had in front of it as the Host of ITER. The action taken had implications beyond that of establishing fusion energy. It was also an expression of mutual confidence to face the scientific, technical and political challenges that will occur in the course of this first-of-a-kind true international science cooperation among the leading nations of the world. ITER was establishing a model of global co-operation to address the increasingly global nature of the challenges confronting today's society. The Chinese Minister of Science and Technology, Mr. Xu Guanhua, expressed his pleasure that agreement on the site had been found within the six-Party framework. China considered that a sustainable solution to the world's energy source problem required multilateral international collaboration on fusion, so that participants could complement each other's skills and pool resources in the shared challenge. Mr. S. Choi, Vice-Minister of Science and Technology, Republic of Korea, reminded the delegates that the eyes of the world were on ITER as one of the most significant projects of the century, with a view to it being a peaceful and affluent one. Having just crossed the barrier of the site decision, there was still more to be done ahead, particularly by concluding the ITER Joint Implementation Agreement as soon as possible. 
He quoted a Korean proverb, literally translated as 'After rain ground hardens', which parallels with the

  8. Iterative group splitting algorithm for opportunistic scheduling systems

    KAUST Repository

    Nam, Haewoon; Alouini, Mohamed-Slim

    2014-01-01

    An efficient feedback algorithm for opportunistic scheduling systems based on iterative group splitting is proposed in this paper. Similar to the opportunistic splitting algorithm, the proposed algorithm adjusts (or lowers) the feedback threshold

  9. Status of ITER

    International Nuclear Information System (INIS)

    Aymar, R.

    2002-01-01

    At the end of engineering design activities (EDA) in July 2001, all the essential elements became available to make a decision on construction of ITER. A sufficiently detailed and integrated engineering design now exists for a generic site, has been assessed for feasibility, and costed, and essential physics and technology R and D has been carried out to underpin the design choices. Formal negotiations have now begun between the current participants--Canada, Euratom, Japan, and the Russian Federation--on a Joint Implementation Agreement for ITER which also establishes the legal entity to run ITER. These negotiations are supported on technical aspects by Coordinated Technical Activities (CTA), which maintain the integrity of the project, for the good of all participants, and concentrate on preparing for procurement by industry of the longest lead items, and for formal application for a construction license with the host country. This paper highlights the main features of the ITER design. With cryogenically-cooled magnets close to neutron-generating plasma, the design of shielding with adequate access via port plugs for auxiliaries such as heating and diagnostics, and of remote replacement and refurbishing systems for in-vessel components, are particularly interesting nuclear technology challenges. Making a safety case for ITER to satisfy potential regulators and to demonstrate, as far as possible at this stage, the environmental attractiveness of fusion as an energy source, is also important. The paper gives illustrative details on this work, and an update on the progress of technical preparations for construction, as well as the status of the above negotiations

  10. The numerical parallel computing of photon transport

    International Nuclear Information System (INIS)

    Huang Qingnan; Liang Xiaoguang; Zhang Lifa

    1998-12-01

    The parallel computing of photon transport is investigated. The parallel algorithm and the parallelization of programs on parallel computers with both shared memory and distributed memory are discussed. By analyzing the inherent laws of the mathematical and physical model of photon transport in light of the structural features of parallel computers, using a divide-and-conquer strategy, adjusting the algorithm structure of the program, resolving data dependencies, finding parallelizable components and creating coarse-grained parallel subtasks, the sequential computation of photon transport is efficiently transformed into parallel and vector computation. The program was run on various HP parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP), and very good parallel speedup was obtained

  11. Eliminating graphs by means of parallel knock-out schemes

    NARCIS (Netherlands)

    Broersma, H.J.; Fomin, F.V.; Královic, R.; Woeginger, G.J.

    2007-01-01

    In 1997 Lampert and Slater introduced parallel knock-out schemes, an iterative process on graphs that goes through several rounds. In each round of this process, every vertex eliminates exactly one of its neighbors. The parallel knock-out number of a graph is the minimum number of rounds after which

  12. Eliminating graphs by means of parallel knock-out schemes

    NARCIS (Netherlands)

    Broersma, Haitze J.; Fomin, F.V.; Královič, R.; Woeginger, Gerhard

    In 1997 Lampert and Slater introduced parallel knock-out schemes, an iterative process on graphs that goes through several rounds. In each round of this process, every vertex eliminates exactly one of its neighbors. The parallel knock-out number of a graph is the minimum number of rounds after which

  13. A dimension decomposition approach based on iterative observer design for an elliptic Cauchy problem

    KAUST Repository

    Majeed, Muhammad Usman; Laleg-Kirati, Taous-Meriem

    2015-01-01

    A state observer inspired iterative algorithm is presented to solve boundary estimation problem for Laplace equation using one of the space variables as a time-like variable. Three dimensional domain with two congruent parallel surfaces

  14. Parallel computation of rotating flows

    DEFF Research Database (Denmark)

    Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær

    1999-01-01

    This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process...... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. Darbe...

  15. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    Science.gov (United States)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
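
    To make the role of the sparse matrix-vector multiply concrete, the following serial sketch runs CG on a matrix stored in compressed sparse row (CSR) form. It is an illustration only, not the authors' implementation: the CSR layout, the function names and the tolerance are assumptions, and none of the ordering or partitioning strategies studied in the paper are represented.

```python
def spmv(indptr, indices, data, x):
    """y = A x for A stored in CSR form (row pointers, column indices, values)."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)                  # residual b - A x for the zero initial guess
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = spmv(indptr, indices, data, p)   # the single SPMV per iteration
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# 3 x 3 tridiagonal test matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR form.
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
x = conjugate_gradient(indptr, indices, data, [1.0, 0.0, 1.0])
```

    Apart from the SPMV, each iteration needs only two dot products and three vector updates. The indirect access x[indices[k]] inside spmv is where the ordering of unknowns affects cache reuse.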

  16. HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization

    Energy Technology Data Exchange (ETDEWEB)

    2016-08-22

    NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for $W$ and $H$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). In contrast to previous implementations, our algorithm is also flexible: it performs well for both dense and sparse matrices, and allows the user to choose any one of multiple algorithms for solving the updates to the low-rank factors $W$ and $H$ within the alternating iterations.
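
    The alternating structure can be sketched serially as follows. Note the substitution: instead of an exact NLS solve for each factor, this sketch uses the well-known multiplicative update rule as a simple stand-in, so it shows only the alternation between the two factors, not the method or the MPI data distribution of HPC-NMF. All function names, the random initialization and the small constant eps are assumptions.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(A, W, H):
    """Frobenius norm of A - W H."""
    WH = matmul(W, H)
    return sum((a - p) ** 2 for ra, rp in zip(A, WH) for a, p in zip(ra, rp)) ** 0.5

def nmf(A, k, iters=200, seed=0, eps=1e-12):
    """Approximate a nonnegative m x n matrix A by W (m x k) times H (k x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # Update H with W fixed: H <- H * (W^T A) / (W^T W H), elementwise.
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(rh, ra, rd)]
             for rh, ra, rd in zip(H, num, den)]
        # Update W with H fixed: W <- W * (A H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(rw, ra, rd)]
             for rw, ra, rd in zip(W, num, den)]
    return W, H

# Small nonnegative example matrix (hypothetical values).
A = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
W0, H0 = nmf(A, 2, iters=0)
W, H = nmf(A, 2, iters=50)
```

    Swapping the two update lines for a different per-factor solver leaves the outer loop unchanged, which is the kind of flexibility the abstract refers to.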

  17. A kind of iteration algorithm for fast wave heating

    International Nuclear Information System (INIS)

    Zhu Xueguang; Kuang Guangli; Zhao Yanping; Li Youyi; Xie Jikang

    1998-03-01

    A standard normal distribution of particles in tokamak geometry is usually assumed in fast wave heating. In fact, due to the quasi-linear diffusion effect, the parallel and vertical temperatures of resonant particles are not equal, which introduces some error. To handle this case, the Fokker-Planck equation is introduced, and an iteration algorithm is adopted to solve the problem

  18. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    Science.gov (United States)

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    Solving fractional differential equations is very time consuming. The computational complexity of the two-dimensional time fractional diffusion equation (2D-TFDE) with an iterative implicit finite difference method is O(M_x M_y N^2). In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and a data layout with virtual boundaries are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We think that parallel computing technology will become a very basic method for computationally intensive fractional applications in the near future.

  19. Expressing Parallelism with ROOT

    Energy Technology Data Exchange (ETDEWEB)

    Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab

    2017-11-22

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  20. Expressing Parallelism with ROOT

    Science.gov (United States)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  1. Parallel hierarchical radiosity rendering

    Energy Technology Data Exchange (ETDEWEB)

    Carter, Michael [Iowa State Univ., Ames, IA (United States)

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  2. FAST ITERATIVE KILOVOLTAGE CONE BEAM TOMOGRAPHY

    Directory of Open Access Journals (Sweden)

    S. A. Zolotarev

    2015-01-01

    Creating fast parallel iterative tomographic algorithms based on graphics accelerators, which simultaneously minimize the residual and the total variation of the reconstructed image, is an important and urgent task of great scientific and practical importance. Such algorithms can be used, for example, in radiation therapy, because computed tomography of the patient is always performed beforehand in order to better identify the areas that will then be subjected to radiation exposure.

  3. Distributed Video Coding: Iterative Improvements

    DEFF Research Database (Denmark)

    Luong, Huynh Van

    Nowadays, emerging applications such as wireless visual sensor networks and wireless video surveillance are requiring lightweight video encoding with high coding efficiency and error-resilience. Distributed Video Coding (DVC) is a new coding paradigm which exploits the source statistics...... and noise modeling and also learn from the previous decoded Wyner-Ziv (WZ) frames, side information and noise learning (SING) is proposed. The SING scheme introduces an optical flow technique to compensate the weaknesses of the block based SI generation and also utilizes clustering of DCT blocks to capture...... cross band correlation and increase local adaptivity in noise modeling. During decoding, the updated information is used to iteratively reestimate the motion and reconstruction in the proposed motion and reconstruction reestimation (MORE) scheme. The MORE scheme not only reestimates the motion vectors...

  4. ITER CTA newsletter. No. 4

    International Nuclear Information System (INIS)

    2001-12-01

    This ITER CTA Newsletter contains information about the organization of the ITER Co-ordinated Technical Activities (CTA) International Team as the follow-up of the ITER CTA project board meeting in Toronto on 7 November 2001. It also includes a summary on the start of the international tokamak physics activity by Dr. D. Campbell, Chair of the ITPA Co-ordinating Committee

  5. ITER CTA newsletter. No. 9

    International Nuclear Information System (INIS)

    2002-06-01

    This ITER CTA newsletter contains information about the Fourth Negotiations Meeting on the Joint Implementation of ITER held in Cadarache, France on 4-6 June 2002 and about the meeting of the ITER CTA Project Board which took place on the occasion of the N4 Meeting at Cadarache on 3-4 June 2002

  6. ITER management advisory committee meeting

    International Nuclear Information System (INIS)

    Yoshikawa, M.

    2001-01-01

    The ITER Management Advisory Committee (MAC) Meeting was held on 23 February in Garching, Germany. The main topics were: the consideration of the report by the Director on the ITER EDA Status, the review of the Work Programme, the review of the Joint Fund, the review of a schedule of ITER meetings, and the arrangements for termination and wind-up of the EDA

  7. ITER CTA newsletter. No. 1

    International Nuclear Information System (INIS)

    2001-01-01

    This ITER CTA newsletter comprises reports on ITER co-ordinated technical activities, information about the Meeting of the ITER CTA project board which took place in Vienna on 16 July 2001, and the Meeting of the expert group on MHD, disruptions and plasma control which was held on 25-26 June 2001 in Funchal, Madeira

  8. Status of the ITER EDA

    International Nuclear Information System (INIS)

    Aymar, R.

    2000-01-01

    This article summarizes progress made in the ITER Engineering Design Activities in the period between the ITER Meeting in Tokyo (January 2000) and June 2000. Topics: Termination of EDA, Joint Central Team and Support, Task Assignments, ITER Physics, Urgent and High Priority Physics Research Areas

  9. On the Convergence of Iterative Receiver Algorithms Utilizing Hard Decisions

    Directory of Open Access Journals (Sweden)

    Jürgen F. Rößler

    2009-01-01

    The convergence of receivers performing iterative hard decision interference cancellation (IHDIC) is analyzed in a general framework for ASK, PSK, and QAM constellations. We first give an overview of IHDIC algorithms known from the literature applied to linear modulation and DS-CDMA-based transmission systems and show the relation to Hopfield neural network theory. It is proven analytically that IHDIC with a serial update scheme always converges to a stable state in the estimated values in the course of the iterations and that IHDIC with a parallel update scheme converges to cycles of length 2. Additionally, we visualize the convergence behavior with the aid of convergence charts. Doing so, we give insight into possible errors occurring in IHDIC, which turn out to be caused by locked error situations. The derived results can directly be applied to those iterative soft decision interference cancellation (ISDIC) receivers whose soft decision functions approach hard decision functions in the course of the iterations.
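
    The difference between the two update schemes can be seen in a toy example borrowed from the Hopfield network setting the abstract mentions. This is not an IHDIC receiver: the 2 x 2 coupling matrix, the sign convention at zero and the function names are assumptions chosen only to show a stable state under serial updates and a cycle of length 2 under parallel updates.

```python
def sign(v):
    """Hard decision; ties at zero are mapped to +1 (an assumed convention)."""
    return 1 if v >= 0 else -1

def parallel_step(W, s):
    """Update every decision at once from the previous state."""
    return [sign(sum(w * x for w, x in zip(row, s))) for row in W]

def serial_sweep(W, s):
    """Update decisions one at a time, each seeing the latest values."""
    s = list(s)
    for i, row in enumerate(W):
        s[i] = sign(sum(w * x for w, x in zip(row, s)))
    return s

W = [[0, -1], [-1, 0]]   # symmetric coupling with zero diagonal (toy values)
start = [1, 1]

p1 = parallel_step(W, start)   # [-1, -1]
p2 = parallel_step(W, p1)      # [1, 1], back at the start: a cycle of length 2

q1 = serial_sweep(W, start)    # [-1, 1]
q2 = serial_sweep(W, q1)       # [-1, 1] again: a stable state
```

    In the parallel scheme both decisions flip together on every step and never settle, whereas in the serial scheme the second decision already sees the first one's new value.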

  10. Fabrication progress of the ITER vacuum vessel sector in Korea

    Energy Technology Data Exchange (ETDEWEB)

    Kim, B.C., E-mail: bckim@nfri.re.kr [National Fusion Research Institute, Gwahangno 113, Yuseong-gu, Daejeon (Korea, Republic of); Lee, Y.J.; Hong, K.H.; Sa, J.W.; Kim, H.S.; Park, C.K.; Ahn, H.J.; Bak, J.S.; Jung, K.J. [National Fusion Research Institute, Gwahangno 113, Yuseong-gu, Daejeon (Korea, Republic of); Park, K.H.; Roh, B.R.; Kim, T.S.; Lee, J.S.; Jung, Y.H.; Sung, H.J.; Choi, S.Y.; Kim, H.G.; Kwon, I.K.; Kwon, T.H. [Hyundai Heavy Industries Co. Ltd., Dong-gu, Ulsan (Korea, Republic of)

    2013-10-15

    Highlights: ► Fabrication of ITER vacuum vessel sector full scale mock-up to develop fabrication procedures. ► The welding and nondestructive examination techniques conform to RCC-MR. ► The preparation of real manufacturing of ITER vacuum vessel sector. -- Abstract: As a participant in the ITER project, ITER Korea has to supply two (Sector no. 6 and no. 1) of the nine ITER vacuum vessel (VV) sectors. After the procurement arrangement with the ITER Organization, ITER Korea contracted Hyundai Heavy Industries (HHI) for fabrication of the two sectors. The manufacturing design started in January 2010. HHI made three real-scale R and D mock-ups to verify the critical fabrication feasibility issues of electron beam welding, 3D forming, welding distortion and achievable tolerances. The documentation required by the IO and the French nuclear safety regulations, and the qualification of welding and nondestructive examination procedures conforming to RCC-MR 2007, proceeded in parallel. The mass production of raw material was done after receiving ANB (agreed notified body) verification of product/parts and shop qualification. The manufacturing drawings and the manufacturing and inspection plan of the VV sector, with supporting fabrication procedures, were also verified by the ANB; accordingly, the first cutting and forming of plates for VV sector fabrication started in February 2012. This paper reports the latest fabrication progress of ITER vacuum vessel Sector no. 6, which will be assembled as the first sector in the ITER pit. The overall fabrication route, R and D mock-up fabrication results with forming and welding distortion analysis, and the qualification status of welding and nondestructive examination (NDE) are also presented.

  11. Parallel SN algorithms in shared- and distributed-memory environments

    International Nuclear Information System (INIS)

    Haghighat, Alireza; Hunter, Melissa A.; Mattis, Ronald E.

    1995-01-01

    Different 2-D spatial domain partitioning Sn transport theory algorithms have been developed on the basis of the Block-Jacobi iterative scheme. These algorithms have been incorporated into TWOTRAN-II, and tested on a shared-memory CRAY Y-MP C90 and a distributed-memory IBM SP1. For a series of fixed source r-z geometry homogeneous problems, parallel efficiencies in a range of 50-90% are achieved on the C90 with 6 processors, and lower values (20-60%) are obtained on the SP1. It is demonstrated that better performance is attainable if one addresses issues such as convergence rate, load-balancing, and granularity for both architectures, as well as message passing (network bandwidth and latency) for SP1. (author). 17 refs, 4 figs

  12. Iterative solution of the Helmholtz equation

    Energy Technology Data Exchange (ETDEWEB)

    Larsson, E.; Otto, K. [Uppsala Univ. (Sweden)

    1996-12-31

    We have shown that the numerical solution of the two-dimensional Helmholtz equation can be obtained in a very efficient way by using a preconditioned iterative method. We discretize the equation with second-order accurate finite difference operators and take special care to obtain non-reflecting boundary conditions. We solve the large, sparse system of equations that arises with the preconditioned restarted GMRES iteration. The preconditioner is of "fast Poisson type", and is derived as a direct solver for a modified PDE problem. The arithmetic complexity for the preconditioner is O(n log_2 n), where n is the number of grid points. As a test problem we use the propagation of sound waves in water in a duct with curved bottom. Numerical experiments show that the preconditioned iterative method is very efficient for this type of problem. The convergence rate does not decrease dramatically when the frequency increases. Compared to banded Gaussian elimination, which is a standard solution method for this type of problems, the iterative method shows significant gain in both storage requirement and arithmetic complexity. Furthermore, the relative gain increases when the frequency increases.

  13. Iterative supervirtual refraction interferometry

    KAUST Repository

    Al-Hagan, Ola

    2014-05-02

    In refraction tomography, the low signal-to-noise ratio (S/N) can be a major obstacle in picking the first-break arrivals at the far-offset receivers. To increase the S/N, we evaluated iterative supervirtual refraction interferometry (ISVI), which is an extension of the supervirtual refraction interferometry method. In this method, supervirtual traces are computed and then iteratively reused to generate supervirtual traces with a higher S/N. Our empirical results with both synthetic and field data revealed that ISVI can significantly boost up the S/N of far-offset traces. The drawback is that using refraction events from more than one refractor can introduce unacceptable artifacts into the final traveltime versus offset curve. This problem can be avoided by careful windowing of refraction events.

  14. Iterative supervirtual refraction interferometry

    KAUST Repository

    Al-Hagan, Ola; Hanafy, Sherif M.; Schuster, Gerard T.

    2014-01-01

    In refraction tomography, the low signal-to-noise ratio (S/N) can be a major obstacle in picking the first-break arrivals at the far-offset receivers. To increase the S/N, we evaluated iterative supervirtual refraction interferometry (ISVI), which is an extension of the supervirtual refraction interferometry method. In this method, supervirtual traces are computed and then iteratively reused to generate supervirtual traces with a higher S/N. Our empirical results with both synthetic and field data revealed that ISVI can significantly boost up the S/N of far-offset traces. The drawback is that using refraction events from more than one refractor can introduce unacceptable artifacts into the final traveltime versus offset curve. This problem can be avoided by careful windowing of refraction events.

  15. ITER technical basis

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2002-01-01

    Following on from the Final Report of the EDA (DS/21), and the summary of the ITER Final Design report (DS/22), the technical basis gives further details of the design of ITER. It is in two parts. The first, the Plant Design specification, summarises the main constraints on the plant design and operation from the viewpoint of engineering and physics assumptions, compliance with safety regulations, and siting requirements and assumptions. The second, the Plant Description Document, describes the physics performance and engineering characteristics of the plant design, illustrates the potential operational consequences for the locality of a generic site, gives the construction, commissioning, exploitation and decommissioning schedule, and reports the estimated lifetime costing based on data from the industry of the EDA parties.

  16. ITER technical basis

    International Nuclear Information System (INIS)

    2002-01-01

    Following on from the Final Report of the EDA (DS/21), and the summary of the ITER Final Design report (DS/22), the technical basis gives further details of the design of ITER. It is in two parts. The first, the Plant Design specification, summarises the main constraints on the plant design and operation from the viewpoint of engineering and physics assumptions, compliance with safety regulations, and siting requirements and assumptions. The second, the Plant Description Document, describes the physics performance and engineering characteristics of the plant design, illustrates the potential operational consequences for the locality of a generic site, gives the construction, commissioning, exploitation and decommissioning schedule, and reports the estimated lifetime costing based on data from the industry of the EDA parties

  17. DGDFT: A massively parallel method for large scale density functional theory calculations.

    Science.gov (United States)

    Hu, Wei; Lin, Lin; Yang, Chao

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^-4 Hartree/atom in terms of the error of energy and 6.2 × 10^-4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  18. DGDFT: A massively parallel method for large scale density functional theory calculations

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Wei, E-mail: whu@lbl.gov; Yang, Chao, E-mail: cyang@lbl.gov [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Lin, Lin, E-mail: linlin@math.berkeley.edu [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Department of Mathematics, University of California, Berkeley, California 94720 (United States)

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^-4 Hartree/atom in terms of the error of energy and 6.2 × 10^-4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  19. DGDFT: A massively parallel method for large scale density functional theory calculations

    International Nuclear Information System (INIS)

    Hu, Wei; Yang, Chao; Lin, Lin

    2015-01-01

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^-4 Hartree/atom in terms of the error of energy and 6.2 × 10^-4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  20. Iterated Leavitt Path Algebras

    International Nuclear Information System (INIS)

    Hazrat, R.

    2009-11-01

    Leavitt path algebras associate to directed graphs a Z-graded algebra and in their simplest form recover the Leavitt algebras L(1,k). In this note, we introduce iterated Leavitt path algebras associated to directed weighted graphs which have natural ± Z grading and in their simplest form recover the Leavitt algebras L(n,k). We also characterize Leavitt path algebras which are strongly graded. (author)

  1. ICP (ITER Collaborative Platform)

    Energy Technology Data Exchange (ETDEWEB)

    Capuano, C.; Carayon, F.; Patel, V. [ITER, 13 - St. Paul-Lez Durance (France)

    2009-07-01

    The ITER organization must manage a massive amount of data and processes. Each team requires different processes and databases, often interconnected with those of other teams. ICP is the current central ITER repository of structured and unstructured data. All data in ICP is served and managed via a web interface that provides global accessibility with a common user-friendly interface. This paper will explain the model used by ICP and how it serves the ITER project by providing a robust and agile platform. ICP is developed in ASP.NET using MSSQL Server for data storage. It currently houses 15 data-driven applications, 150 different types of record, 500 k objects and 2.5 M references. During European working hours the system averages 150 concurrent users and 20 requests per second. ICP connects to external database applications to provide a single entry point to ITER data and a safe shared storage place to maintain this data long-term. The Core model provides an easy-to-extend framework to meet the future needs of the Organization. ICP follows a multi-tier architecture, providing logical separation of process. The standard three-tier architecture is expanded, with the data layer separated into data storage, data structure, and data access components. The business or application logic layer is broken up into a common business functionality layer, a type-specific logic layer, and a detached work-flow layer. Finally, the presentation tier comprises a presentation adapter layer and an interface layer. Each layer is built up from small blocks which can be combined to create a wide range of more complex functionality. Each new object type developed gains access to a wealth of existing code functionality, while remaining free to adapt and extend it. The hardware structure is designed to provide complete redundancy and high availability and to handle high load. This document is composed of an abstract followed by the presentation transparencies. (authors)

  2. Metrology for ITER Assembly

    International Nuclear Information System (INIS)

    Bogusch, E.

    2006-01-01

    The overall dimensions of the ITER Tokamak and the particular assembly sequence preclude the use of conventional optical metrology, mechanical jigs and traditional dimensional control equipment, as used for the assembly of smaller, previous generation, fusion devices. This paper describes the state of the art of the capabilities of available metrology systems, with reference to the previous experience in Fusion engineering and in other industries. Two complementary procedures for transferring datums from the primary datum network on the bioshield to the secondary datums inside the VV with the desired accuracy of about 0.1 mm are described, one method using the access directly through the ports and the other using transfer techniques, developed during the co-operation with ITER/EFDA. Another important task described is the development of a method for the rapid and easy measurement of the gaps between sectors, required for the production of the customised splice plates between them. The scope of the paper includes the evaluation of the composition and cost of the systems and team of technical staff required to meet the requirements of the assembly procedure. The results from a practical, full-scale demonstration of the methodologies used, using the proposed equipment, are described. This work has demonstrated the feasibility of achieving the necessary accuracies for the successful building of ITER. (author)

  3. The ITER tritium systems

    International Nuclear Information System (INIS)

    Glugla, M.; Antipenkov, A.; Beloglazov, S.; Caldwell-Nichols, C.; Cristescu, I.R.; Cristescu, I.; Day, C.; Doerr, L.; Girard, J.-P.; Tada, E.

    2007-01-01

    ITER is the first fusion machine fully designed for operation with equimolar deuterium-tritium mixtures. The tokamak vessel will be fuelled through gas puffing and pellet injection, and the Neutral Beam heating system will introduce deuterium into the machine. Employing deuterium and tritium as fusion fuel will cause alpha heating of the plasma and will eventually provide energy. Due to the small burn-up fraction in the vacuum vessel, a closed deuterium-tritium loop is required, along with all the auxiliary systems necessary for the safe handling of tritium. The ITER inner fuel cycle systems are designed to process considerable and unprecedented deuterium-tritium flow rates with high flexibility and reliability. High decontamination factors for effluent and release streams and low tritium inventories in all systems are needed to minimize chronic and accidental emissions. A multiple barrier concept assures the confinement of tritium within its respective processing components; atmosphere and vent detritiation systems are essential elements in this concept. Not only the interfaces between the primary fuel cycle systems (being procured through different Participant Teams) but also those to confinement systems such as Atmosphere Detritiation, those to fuelling and pumping (again procured through different Participant Teams), and the interfaces to buildings call for definition and detailed analysis to assure proper design integration. Considering the complexity of the ITER Tritium Plant, configuration management and interface control will be a challenging task.

  4. Neutron cameras for ITER

    International Nuclear Information System (INIS)

    Johnson, L.C.; Barnes, C.W.; Batistoni, P.

    1998-01-01

    Neutron cameras with horizontal and vertical views have been designed for ITER, based on systems used on JET and TFTR. The cameras consist of fan-shaped arrays of collimated flight tubes, with suitably chosen detectors situated outside the biological shield. The sight lines view the ITER plasma through slots in the shield blanket and penetrate the vacuum vessel, cryostat, and biological shield through stainless steel windows. This paper analyzes the expected performance of several neutron camera arrangements for ITER. In addition to the reference designs, the authors examine proposed compact cameras, in which neutron fluxes are inferred from 16N decay gammas in dedicated flowing water loops, and conventional cameras with fewer sight lines and more limited fields of view than in the reference designs. It is shown that the spatial sampling provided by the reference designs is sufficient to satisfy target measurement requirements and that some reduction in field of view may be permissible. The accuracy of measurements with 16N-based compact cameras is not yet established, and they fail to satisfy requirements for parameter range and time resolution by large margins.

  5. The Galley Parallel File System

    Science.gov (United States)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  6. Fast parallel algorithm for CT image reconstruction.

    Science.gov (United States)

    Flores, Liubov A; Vidal, Vicent; Mayo, Patricia; Rodenas, Francisco; Verdú, Gumersindo

    2012-01-01

    In X-ray computed tomography (CT) the X rays are used to obtain the projection data needed to generate an image of the inside of an object. The image can be generated with different techniques. Iterative methods are more suitable for the reconstruction of images with high contrast and precision in noisy conditions and from a small number of projections. Their use may be important in portable scanners for their functionality in emergency situations. However, in practice, these methods are not widely used due to the high computational cost of their implementation. In this work we analyze iterative parallel image reconstruction with the Portable Extensive Toolkit for Scientific computation (PETSc).

  7. ITER concept definition. V.2

    International Nuclear Information System (INIS)

    1989-01-01

    Volume II of the two volumes describing the concept definition of the International Thermonuclear Experimental Reactor deals with the ITER concept in technical depth, and covers all areas of design of the ITER tokamak. Included are an assessment of the current database for design, scoping studies, rationale for concepts selection, performance flexibility, the ITER concept, the operations and experimental/testing program, ITER parameters and design phase schedule, and research and development specific to ITER. This latter includes a definition of specific research and development tasks, a division of tasks among members, specific milestones, required results, and schedules. Figs and tabs

  8. ITER CTA newsletter. No. 10

    International Nuclear Information System (INIS)

    2002-07-01

    This ITER CTA newsletter issue comprises the ITER backgrounder, which was approved as an official document by the participants in the Negotiations on the ITER Implementation agreement at their fourth meeting, held in Cadarache from 4-6 June 2002, and information about two ITER meetings: one is the third meeting of the ITER parties' designated Safety Representatives, which took place in Cadarache, France from 6-7 June 2002, and the other is the second meeting of the International Tokamak Physics Activity (ITPA) topical group on diagnostics, which was held at General Atomics, San Diego, USA, from 4-8 March 2002

  9. ITER EDA newsletter. V. 7, no. 7

    International Nuclear Information System (INIS)

    1998-07-01

    This newsletter contains the articles: 'Extraordinary ITER council meeting', 'ITER EDA final safety meeting' and 'Summary report of the 3rd combined workshop of the ITER confinement and transport and ITER confinement database and modeling expert groups'

  10. R and D on support to ITER safety assessment

    International Nuclear Information System (INIS)

    Van Dorsselaere, J.P.; Perrault, D.; Barrachin, M.; Bentaib, A.; Bez, J.; Cortes, P.; Seropian, C.; Tregoures, N.; Vendel, J.

    2009-01-01

    After performing its first ITER safety assessment in 2002 on behalf of the French 'Autorite de Surete Nucleaire (ASN)', the French 'Institut de Radioprotection et de Surete Nucleaire (IRSN)' is now analysing the new ITER Fusion facility safety file. The operator delivered this file to the ASN as part of its request for a creation decree, legally necessary before building works can begin on the site. The IRSN's first task in following ITER throughout its lifetime is to study the safety approach adopted by the operator and the associated issues. Such a challenging new technology calls for further in-house expertise, and so in parallel an R and D program has been set up to support this safety assessment process, now and in the coming years. Its main objectives are to identify the key parameters for mastering some risks (that would have been insufficiently justified by the operator) and to perform some verifications with methods and codes independent of the operator's. Priority has been given to four technical issues (others could be investigated in the future, like the behaviour of activated corrosion products). The first issue concerns the simulation of accident sequences with the help of the ASTEC European system code, developed by IRSN (jointly with its German counterpart, the GRS) for severe accidents in Pressurised Water Reactors. A preliminary analysis showed that most of its physical models are already applicable, e.g., for thermal-hydraulics in accidents caused by water or air ingress into the vacuum vessel (VV) or dust transport. Work started in 2008 on some model adaptations, for instance oxidation of VV first wall materials by steam or air, and on validation on the ITER-specific ICE and LOVA experiments. Other model improvements are planned in the coming years, as feedback from the work done for the other technical issues and from the code validation. The second issue concerns the risk of gas explosion due to concentrations of hydrogen and carbon

  11. Spirit and prospects of ITER

    Energy Technology Data Exchange (ETDEWEB)

    Velikhov, E.P. [Kurchatov Institute of Atomic Energy, Moscow (Russian Federation)

    2002-10-01

    ITER is the unique and most straightforward way to study burning plasma science in the near future. ITER has a firm physics basis, built on results from the world's tokamaks in terms of confinement, stability, heating, current drive, divertor, and energetic particle confinement to the extent required in ITER. The flexibility of ITER will allow the exploration of a broad operation space of fusion power, beta, pulse length and Q values in various operational scenarios. The success of the engineering R and D programs has demonstrated that all parties have sufficient capability to produce all the necessary equipment in agreement with the specifications of ITER. The knowledge and technologies acquired in the ITER project allow us to demonstrate the scientific and technical feasibility of a fusion reactor. It can be concluded that ITER must be constructed in the near future. (author)

  12. Spirit and prospects of ITER

    International Nuclear Information System (INIS)

    Velikhov, E.P.

    2002-01-01

    ITER is the unique and most straightforward way to study burning plasma science in the near future. ITER has a firm physics basis, built on results from the world's tokamaks in terms of confinement, stability, heating, current drive, divertor, and energetic particle confinement to the extent required in ITER. The flexibility of ITER will allow the exploration of a broad operation space of fusion power, beta, pulse length and Q values in various operational scenarios. The success of the engineering R and D programs has demonstrated that all parties have sufficient capability to produce all the necessary equipment in agreement with the specifications of ITER. The knowledge and technologies acquired in the ITER project allow us to demonstrate the scientific and technical feasibility of a fusion reactor. It can be concluded that ITER must be constructed in the near future. (author)

  13. Is Monte Carlo embarrassingly parallel?

    Energy Technology Data Exchange (ETDEWEB)

    Hoogenboom, J. E. [Delft Univ. of Technology, Mekelweg 15, 2629 JB Delft (Netherlands); Delft Nuclear Consultancy, IJsselzoom 2, 2902 LB Capelle aan den IJssel (Netherlands)

    2012-07-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
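
    The effect described above, where execution time grows again beyond a certain number of processors, can be illustrated with a toy cost model. This is only a sketch under stated assumptions and not the analysis from the paper: it assumes the particle-tracking work per cycle divides evenly over the processors, and that the end-of-cycle rendez-vous costs a fixed amount per participating processor. The function names and the constants `work` and `sync_cost` are hypothetical.

```python
def cycle_time(processors, work=1.0, sync_cost=1e-4):
    """Toy model of the wall time of one fission-source cycle.

    Assumptions (hypothetical, for illustration only):
    - `work` is the serial tracking time, split evenly over processors;
    - the end-of-cycle rendez-vous costs `sync_cost` per processor.
    """
    return work / processors + sync_cost * processors


def speedup(processors, work=1.0, sync_cost=1e-4):
    """Speedup relative to a single processor under the toy model."""
    return cycle_time(1, work, sync_cost) / cycle_time(processors, work, sync_cost)


def best_processor_count(max_processors, work=1.0, sync_cost=1e-4):
    """Processor count with the smallest modelled cycle time."""
    return min(range(1, max_processors + 1),
               key=lambda p: cycle_time(p, work, sync_cost))


if __name__ == "__main__":
    for p in (1, 10, 50, 100, 200, 400):
        print(p, round(speedup(p), 2))
    print("best processor count:", best_processor_count(1000))
```

    In this model the cycle time is smallest near sqrt(work / sync_cost) processors; adding more processors past that point makes each cycle slower, because the synchronization term dominates.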

  14. Is Monte Carlo embarrassingly parallel?

    International Nuclear Information System (INIS)

    Hoogenboom, J. E.

    2012-01-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)

  15. A fast and efficient adaptive parallel ray tracing based model for thermally coupled surface radiation in casting and heat treatment processes

    International Nuclear Information System (INIS)

    Fainberg, J; Schaefer, W

    2015-01-01

    A new algorithm for heat exchange between thermally coupled diffusely radiating interfaces is presented, which can be applied for closed and half open transparent radiating cavities. Interfaces between opaque and transparent materials are automatically detected and subdivided into elementary radiation surfaces named tiles. Contrary to the classical view factor method, the fixed unit sphere area subdivision oriented along the normal tile direction is projected onto the surrounding radiation mesh and not vice versa. Then, the total incident radiating flux of the receiver is approximated as a direct sum of radiation intensities of representative “senders” with the same weight factor. A hierarchical scheme for the space angle subdivision is selected in order to minimize the total memory and the computational demands during thermal calculations. Direct visibility is tested by means of a voxel-based ray tracing method accelerated by means of the anisotropic Chebyshev distance method, which reuses the computational grid as a Chebyshev one. The ray tracing algorithm is fully parallelized using MPI and takes advantage of the balanced distribution of all available tiles among all CPUs. This approach allows tracing of each particular ray without any communication. The algorithm has been implemented in a commercial casting process simulation software. The accuracy and computational performance of the new radiation model for heat treatment, investment and ingot casting applications are illustrated using industrial examples. (paper)

  16. ITER EDA newsletter. V. 10, special issue

    International Nuclear Information System (INIS)

    2001-07-01

    This ITER EDA Newsletter includes summaries of the reports of ITER EDA JCT Physics unit about ITER physics R and D during the Engineering Design Activities (EDA), ITER EDA JCT Naka JWC ITER technology R and D during the EDA, and Safety, Environment and Health group of ITER EDA JCT, Garching JWS on EDA activities related to safety

  17. System engineering and configuration management in ITER

    International Nuclear Information System (INIS)

    Chiocchio, S.; Martin, E.; Barabaschi, P.; Bartels, Hans Werner; How, J.; Spears, W.

    2007-01-01

    The construction of ITER will represent a major challenge for the fusion community at large, because of the intrinsic complexity of the tokamak design, the large number of different systems which are all essential for its operation, the worldwide distribution of the design activities and the unusual procurement scheme based on a combination of in-kind and directly funded deliverables. A key requirement for the success of such a large project is that a systematic approach to ensure the consistency of the design with the required performance is adopted. Also, effective project management methods, tools and working practices must be deployed to facilitate the communication and collaboration among the institutions and industries involved in the project. The authors have been involved in the definition and practical implementation of the design integration and configuration control structure inside ITER and in the system engineering process during the selection and optimization of the machine configuration. In parallel, they have assessed design, drawing and documentation management software to be used for the construction phase. Here, they describe the experience gained in recent years, explain the drivers behind the selection of the documents and drawings management systems, and illustrate the scope and issues of the configuration management activities to ensure the congruence of the design, to control and track the design changes and to manage the interfaces among the ITER systems

  18. Gauss-Seidel Iterative Method as a Real-Time Pile-Up Solver of Scintillation Pulses

    Science.gov (United States)

    Novak, Roman; Vencelj, Matjaž

    2009-12-01

    The pile-up rejection in nuclear spectroscopy has been confronted recently by several pile-up correction schemes that compensate for distortions of the signal and subsequent energy spectra artifacts as the counting rate increases. We study here a real-time capability of the event-by-event correction method, which at the core translates to solving many sets of linear equations. Tight time limits and constrained front-end electronics resources make well-known direct solvers inappropriate. We propose a novel approach based on the Gauss-Seidel iterative method, which turns out to be a stable and cost-efficient solution to improve spectroscopic resolution in the front-end electronics. We show the method convergence properties for a class of matrices that emerge in calorimetric processing of scintillation detector signals and demonstrate the ability of the method to support the relevant resolutions. The sole iteration-based error component can be brought below the sliding window induced errors in a reasonable number of iteration steps, thus allowing real-time operation. An area-efficient hardware implementation is proposed that fully utilizes the method's inherent parallelism.
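
    For readers unfamiliar with the Gauss-Seidel iteration mentioned above, the following is a minimal textbook sketch in Python. It is not the authors' hardware implementation; the function name, the tolerance, and the small test system are assumptions chosen for illustration. The method solves A x = b by sweeping through the unknowns and reusing each updated value immediately, and it is known to converge for strictly diagonally dominant matrices such as the one used here.

```python
def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=100):
    """Solve A x = b with the Gauss-Seidel iteration.

    A is a list of rows, b a list of right-hand-side values.
    Returns (x, iterations_used). Convergence is only guaranteed
    for suitable matrices, e.g. strictly diagonally dominant ones.
    """
    n = len(b)
    x = [0.0] * n if x0 is None else list(x0)
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Off-diagonal contribution, using already-updated entries of x.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_value = (b[i] - s) / A[i][i]
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x, iteration
    return x, max_iter


if __name__ == "__main__":
    # Hypothetical 3x3 diagonally dominant system with exact solution [1, 1, 1].
    A = [[4.0, 1.0, 0.0],
         [1.0, 4.0, 1.0],
         [0.0, 1.0, 4.0]]
    b = [5.0, 6.0, 5.0]
    solution, iterations = gauss_seidel(A, b)
    print(solution, iterations)
```

    Capping the sweep count with `max_iter` mirrors the real-time constraint discussed in the abstract: a fixed number of iterations gives a bounded execution time at the price of a residual iteration error.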

  19. Iteration of adjoint equations

    International Nuclear Information System (INIS)

    Lewins, J.D.

    1994-01-01

    Adjoint functions are the basis of variational methods and now widely used for perturbation theory and its extension to higher order theory as used, for example, in modelling fuel burnup and optimization. In such models, the adjoint equation is to be solved in a critical system with an adjoint source distribution that is not zero but has special properties related to ratios of interest in critical systems. Consequently the methods of solving equations by iteration and accumulation are reviewed to show how conventional methods may be utilized in these circumstances with adequate accuracy. (author). 3 refs., 6 figs., 3 tabs

  20. iterClust: a statistical framework for iterative clustering analysis.

    Science.gov (United States)

    Ding, Hongxu; Wang, Wanxin; Califano, Andrea

    2018-03-22

    In a scenario where populations A, B1 and B2 (subpopulations of B) exist, pronounced differences between A and B may mask subtle differences between B1 and B2. Here we present iterClust, an iterative clustering framework, which can separate more pronounced differences (e.g. A and B) in starting iterations, followed by relatively subtle differences (e.g. B1 and B2), providing a comprehensive clustering trajectory. iterClust is implemented as a Bioconductor R package. Contact: andrea.califano@columbia.edu, hd2326@columbia.edu. Supplementary information is available at Bioinformatics online.

  1. A non overlapping parallel domain decomposition method applied to the simplified transport equations

    International Nuclear Information System (INIS)

    Lathuiliere, B.; Barrault, M.; Ramet, P.; Roman, J.

    2009-01-01

    A reactivity computation requires computing the highest eigenvalue of a generalized eigenvalue problem. An inverse power algorithm is commonly used. Very fine models are difficult for our sequential solver, based on the simplified transport equations, to tackle in terms of memory consumption and computational time. We therefore propose a non-overlapping domain decomposition method for the approximate solution of the linear system to be solved at each inverse power iteration. Our method requires little development effort, as the inner multigroup solver can be reused without modification, and allows us to adapt the numerical resolution (mesh, finite element order) locally. Numerical results are obtained by a parallel implementation of the method on two different cases with a pin-by-pin discretization. These results are analyzed in terms of memory consumption and parallel efficiency. (authors)
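
    The inverse power algorithm referred to above can be sketched as follows. This is a generic illustration, not the solver from the paper: it assumes the usual reactor-physics form M phi = (1/k) F phi, uses a small dense Gaussian elimination where the paper would use its multigroup solver or the domain decomposition method, and all names and the 2x2 matrices are hypothetical.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def matvec(F, v):
    """Dense matrix-vector product."""
    return [sum(f * x for f, x in zip(row, v)) for row in F]


def inverse_power(M, F, tol=1e-12, max_iter=500):
    """Largest k such that M phi = (1/k) F phi, by inverse power iteration.

    Each step solves one linear system with M; this is the system that a
    domain decomposition method would solve approximately and in parallel.
    Returns (k, phi, iterations_used), with phi scaled to max |phi_i| = 1.
    """
    phi = [1.0] * len(M)
    k = 1.0
    for iteration in range(1, max_iter + 1):
        source = matvec(F, phi)
        phi_new = solve(M, [s / k for s in source])
        new_source = matvec(F, phi_new)
        # Update the eigenvalue from the ratio of successive total sources.
        k_new = k * sum(new_source) / sum(source)
        scale = max(abs(p) for p in phi_new)
        phi = [p / scale for p in phi_new]
        if abs(k_new - k) < tol:
            return k_new, phi, iteration
        k = k_new
    return k, phi, max_iter


if __name__ == "__main__":
    # Hypothetical 2x2 operators; the dominant eigenvalue here is 0.75.
    M = [[2.0, 0.0], [0.0, 4.0]]
    F = [[1.0, 1.0], [1.0, 1.0]]
    k, phi, iterations = inverse_power(M, F)
    print(k, phi, iterations)
```

    Because the eigenvalue update only needs the total source before and after the solve, the inner solver can be swapped for an approximate or parallel one without changing the outer loop, which matches the reuse argument made in the abstract.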

  2. Implementation of a parallel algorithm for spherical SN calculations on the IBM 3090

    International Nuclear Information System (INIS)

    Haghighat, A.; Lawrence, R.D.

    1989-01-01

    Parallel SN algorithms based on domain decomposition in angle are straightforward to develop in Cartesian geometry because the computation of the angular fluxes for a specific discrete ordinate can be performed independently of all other angles. This is not the case for curvilinear geometries, where the angular redistribution component of the discretized streaming operator results in coupling between angular fluxes along adjacent discrete ordinates. Previously, the authors developed a parallel algorithm for SN calculations in spherical geometry and examined its iterative convergence for criticality and detector problems with differing scattering/absorption ratios. In this paper, the authors describe the implementation of the algorithm on an IBM 3090 Model 400 (four processors) and present computational results illustrating the efficiency of the algorithm relative to serial execution.

  3. Analysis of the ITER cryoplant operational modes

    International Nuclear Information System (INIS)

    Henry, D.; Journeaux, J.Y.; Roussel, P.; Michel, F.; Poncet, J.M.; Girard, A.; Kalinin, V.; Chesny, P.

    2007-01-01

    In the framework of an EFDA task, CEA is carrying out an analysis of the various ITER cryoplant operational modes. According to the project integration document, ITER is designed to be operated 365 days per year in order to optimize the available time of the Tokamak. It is anticipated that operation will be performed in long periods separated by maintenance periods (e.g. 10 days continuous operation and 1 week break) with annual or bi-annual major shutdown periods of a few months for maintenance, further installation and commissioning. For this operation schedule, auxiliary subsystems like the cryoplant and the cryodistribution have to cope with different heat loads which depend on the different ITER operating states. The cryoplant consists of four identical 4.5 K refrigerators and two 80 K helium loops coupled with two LN2 modules. All of these cryogenic subsystems have to operate in parallel to remove the heat loads from the magnet, 80 K shields, cryopumps and other small users. After a brief recall of the main particularities of a cryogenic system operating in a Tokamak environment, the first part of this study is dedicated to the assessment of the main ITER operation states. A new design of refrigeration loop for the HTS current leads, the updated layout of the cryodistribution system and revised strategy for operations of the cryopumps have been taken into consideration. The relevant normal operating scenarios of the cryoplant are checked for the typical ITER operating states like plasma operation state, short term stand by, short term maintenance, or test and conditioning state. The second part of the paper is dedicated to the abnormal operating modes coming from the magnets and from those generated by the cryoplant itself. 
The occurrence of a fast discharge or a quench of the magnets generates large heat-load disturbances and produces exceptionally high mass flow rates, which have to be managed by the cryoplant, while a failure of a cryogenic component induces

  4. EU Developments of the ITER ECRH System

    International Nuclear Information System (INIS)

    Henderson, M.

    2006-01-01

    The electron cyclotron (EC) heating and current drive (H&CD) system of ITER will deliver 20 MW/CW in the plasma at 170 GHz for H&CD in addition to 2.5 MW/3 s at 120 GHz for plasma start-up. The EC system is composed of power supplies (PS), up to 24 H&CD gyrotrons (1 to 2 MW tubes), 3 start-up gyrotrons (1 MW tubes), 24 transmission lines and two sets of launching antennas: equatorial (EL) and upper (UL) launchers. Under the present ITER procurement package the EU is responsible for one third of the H&CD 170 GHz gyrotrons, all PSs associated with the H&CD system, and the whole set (4) of upper launchers. In all areas of participation, the EU EC partnership (coordinated by the European Fusion Development Association - EFDA) aims toward advancing the technology of each of these subsystems. For example, procurement of Pulse Step Modulator (PSM) HVPS is under consideration, which might have equivalent costs to the present ITER design (thyristor HVPS and HV series switch), but with an increased flexibility in operation and variation in the EC power waveform. The EU is at the forefront in gyrotron research and is developing a 2 MW CW 170 GHz coaxial cavity gyrotron offering an increase in output power while maintaining moderate power densities in the gyrotron cavity and collector. THALES R in collaboration with its EFDA partners (FZK, CRPP, TEKES) is manufacturing a series of prototype tubes in three phases of typically 1 s, 100 s and then CW pulse capacity (∼ 20 10 ). A 2 MW, CW gyrotron test facility is being built at CRPP that will be used to develop the 2 MW coaxial tube, in addition to testing various components required by the EC system. EFDA has undertaken a parallel development of two launcher options: front (FS) and remote (RS) steering, with the aim of providing an optimum launcher for ITER weighing EC physics aspects and operation reliability. The FS launcher (ITER reference design) offers a significant enhancement in physics

  5. Modeling of ELM Dynamics in ITER

    International Nuclear Information System (INIS)

    Pankin, A.Y.; Bateman, G.; Kritz, A.H.; Brennan, D.P.; Snyder, P.B.; Kruger, S.

    2007-01-01

    Edge localized modes (ELMs) are large scale instabilities that alter the H-mode pedestal, reduce the total plasma stored energy, and can result in heat pulses to the divertor plates. These modes can be triggered by pressure driven ballooning modes or by current driven peeling instabilities. In this study, stability analyses are carried out for a series of ITER equilibria that are generated with the TEQ and TOQ equilibrium codes. The H-mode pedestal pressure and parallel component of plasma current density are varied in a systematic way in order to include the relevant parameter space for a specific ITER discharge. Ideal MHD stability codes, DCON, ELITE, and BALOO code, are employed to determine whether or not each ITER equilibrium profile is unstable to peeling or ballooning modes in the pedestal region. Several equilibria that are close to the marginal stability boundary for peeling and ballooning modes are tested with the NIMROD non-ideal MHD code. The effects of finite resistivity are studied in a series of linear NIMROD computations. It is found that the peeling-ballooning stability threshold is very sensitive to the resistivity and viscosity profiles, which vary dramatically over a wide range near the separatrix. Due to the effects of finite resistivity and viscosity, the peeling-ballooning stability threshold is shifted compared to the ideal threshold. A fundamental question in the integrated modeling of ELMy H-mode discharges concerning how much plasma and current density is removed during each ELM crash can be addressed with nonlinear non-ideal MHD simulations. In this study, the NIMROD computer simulations are continued into the nonlinear stage for several ITER equilibria that are marginally unstable to peeling or ballooning modes. The role of two-fluid and finite Larmor radius effects on the ELM dynamics in ITER geometry is examined. 
The formation of ELM filament structures, which are observed in many existing tokamak experiments, is demonstrated for ITER

  6. ITER assembly and maintenance

    International Nuclear Information System (INIS)

    Honda, T.; Davis, F.; Lousteau, D.

    1991-01-01

    This document is intended to describe the work conducted by the ITER Assembly and Maintenance (A&M) Design Unit and the supporting home teams during the ITER Conceptual Design Activities, carried out from 1988 through 1990. Its content consists of two main sections, i.e., Chapter III, which describes the identified tasks to be performed by the A&M system and a general description of the required equipment; and Chapter IV, which provides a more detailed description of the equipment proposed to perform the assigned tasks. A two-stage R&D program is now planned, i.e., (1) prototype equipment functional tests using full scale mock-ups and (2) a full scale integration demonstration test facility with real components (vacuum vessel with ports, blanket modules, divertor modules, armor tiles, etc.). Crucial in-vessel and ex-vessel operations and the associated remote handling equipment, including handling of divertor plates and blanket modules, will be demonstrated in the first phase, whereby the database needed to proceed with the engineering phase will be acquired. The second phase will demonstrate the ability of the overall system to execute the required maintenance procedures and evaluate the performance of the prototype equipment.

  7. Template based parallel checkpointing in a massively parallel computer system

    Science.gov (United States)

    Archer, Charles Jens [Rochester, MN]; Inglett, Todd Alan [Rochester, MN]

    2009-01-13

    A method and apparatus are described for a template-based parallel checkpoint save for a massively parallel supercomputer system, using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a previously produced template checkpoint file that resides in storage. Embodiments herein greatly decrease the amount of data that must be transmitted and stored, giving faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
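
    The block-versus-template comparison described in this record can be illustrated with a minimal sketch. It is not the patented method: it assumes fixed-offset blocks rather than rsync-style rolling checksums, runs on a single machine with no broadcast, and the names BLOCK_SIZE, make_delta and restore are hypothetical.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to trace


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def make_delta(node_data: bytes, template: bytes) -> dict:
    """Keep only the blocks whose checksum differs from the template's block.

    Each kept block is stored with its checksum and losslessly compressed.
    """
    template_sums = [checksum(b) for b in split_blocks(template)]
    delta = {}
    for i, block in enumerate(split_blocks(node_data)):
        digest = checksum(block)
        if i >= len(template_sums) or template_sums[i] != digest:
            delta[i] = (digest, zlib.compress(block))
    return delta


def restore(template: bytes, delta: dict, length: int) -> bytes:
    """Rebuild a node's checkpoint from the template plus its delta."""
    blocks = split_blocks(template)
    for i, (digest, packed) in sorted(delta.items()):
        block = zlib.decompress(packed)
        if checksum(block) != digest:
            raise ValueError(f"block {i} failed its checksum")
        if i < len(blocks):
            blocks[i] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:length]


template = b"AAAABBBBCCCCDDDD"
node = b"AAAAXXXXCCCCDDDDEE"      # one changed block, one extra block
delta = make_delta(node, template)
rebuilt = restore(template, delta, len(node))
```

    Only two of the five blocks end up in the delta here, which is the saving the abstract refers to: unchanged blocks are never transmitted or stored again.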

  8. The simplified spherical harmonics (SPL) methodology with space and moment decomposition in parallel environments

    International Nuclear Information System (INIS)

    Gianluca, Longoni; Alireza, Haghighat

    2003-01-01

    In recent years, the SP_L (simplified spherical harmonics) equations have received renewed interest for the simulation of nuclear systems. We have derived the SP_L equations starting from the even-parity form of the S_N equations. The SP_L equations form a system of (L+1)/2 second order partial differential equations that can be solved with standard iterative techniques such as the Conjugate Gradient (CG). We discretized the SP_L equations with the finite-volume approach in a 3-D Cartesian space. We developed a new 3-D general code, Pensp_L (Parallel Environment Neutral-particle SP_L). Pensp_L solves both fixed source and criticality eigenvalue problems. In order to optimize the memory management, we implemented a Compressed Diagonal Storage (CDS) to store the SP_L matrices. Pensp_L includes parallel algorithms for space and moment domain decomposition. The computational load is distributed on different processors, using a mapping function, which maps the 3-D Cartesian space and moments onto processors. The code is written in Fortran 90 using the Message Passing Interface (MPI) libraries for the parallel implementation of the algorithm. The code has been tested on the Pcpen cluster and the parallel performance has been assessed in terms of speed-up and parallel efficiency. (author)
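
    To make the combination of Compressed Diagonal Storage and Conjugate Gradient concrete, here is a small serial sketch. It is an illustration under assumptions, not the code described in the record: the matrix is a 1-D tridiagonal stand-in for a finite-volume operator, and the function names cds_matvec and conjugate_gradient are hypothetical.

```python
def cds_matvec(offsets, diags, x):
    """Return y = A x for a matrix held in compressed diagonal storage.

    diags[k][i] holds the entry A[i][i + offsets[k]]; entries that would
    fall outside the matrix are simply never read.
    """
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, diags):
        for i in range(max(0, -off), min(n, n - off)):
            y[i] += diag[i] * x[i + off]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(offsets, diags, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A stored as diagonals."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = cds_matvec(offsets, diags, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Assumed test problem: tridiag(-1, 2, -1) x = 1 on five unknowns.
n = 5
offsets = [-1, 0, 1]
diags = [[-1.0] * n, [2.0] * n, [-1.0] * n]
b = [1.0] * n
x = conjugate_gradient(offsets, diags, b)
```

    Storing only three diagonals keeps memory proportional to the number of unknowns, and the matrix-vector product is the only place the storage format appears, so the same CG loop would work with a 3-D stencil that has more diagonals.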

  9. Reactor structure and superconducting magnet system of ITER

    International Nuclear Information System (INIS)

    Tada, Eisuke; Yoshida, Kiyoshi; Shibanuma, Kiyoshi; Okuno, Kiyoshi; Tsuji, Hiroshi; Shimamoto, Susumu

    1993-01-01

    Fusion experimental reactors are one of the major steps toward the realization of fusion energy, and their key objective is to demonstrate scientific and technological feasibility prior to the Demo Fusion Reactor. ITER (International Thermonuclear Experimental Reactor) is one such experimental reactor, and its conceptual design has been completed by the united efforts of the USA, USSR, EC and Japan. In parallel with the conceptual design, key technology development in various areas has been conducted. This paper describes the overall design concepts and the latest technological achievements of the ITER reactor structure and superconducting magnet system. (author)

  10. Copper Mountain conference on iterative methods: Proceedings: Volume 2

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1996-10-01

    This volume (the second of two) contains information presented during the last two days of the Copper Mountain Conference on Iterative Methods held April 9-13, 1996 at Copper Mountain, Colorado. Topics of the sessions held these two days include domain decomposition, Krylov methods, computational fluid dynamics, Markov chains, sparse and parallel basic linear algebra subprograms, multigrid methods, applications of iterative methods, equation systems with multiple right-hand sides, projection methods, and the Helmholtz equation. Selected papers indexed separately for the Energy Science and Technology Database.

  11. An Efficient Solid-phase Parallel Synthesis of 2-Amino and 2-Amidobenzo[d]oxazole Derivatives via Cyclization Reactions of 2-Hydroxyphenylthiourea Resin

    International Nuclear Information System (INIS)

    Jung, Selin; Kim, Seulgi; Lee, Geehyung; Gong, Youngdae

    2012-01-01

    An efficient solid-phase methodology has been developed for the synthesis of 2-amino and 2-amidobenzo[d]oxazole derivatives. The key step in this procedure involves the preparation of polymer-bound 2-aminobenzo[d]oxazole resins 4 by cyclization reaction of 2-hydroxyphenylthiourea resin 3. The resin-bound 2-hydroxyphenylthiourea 3 is produced by the addition of 2-aminophenol to the isothiocyanate-terminated resin 2 and serves as a key intermediate for the linker resin. This core skeleton 2-aminobenzo[d]oxazole resin 4 undergoes functionalization reaction with various electrophiles, such as alkyl halides and acid chlorides, to generate 2-amino and 2-amidobenzo[d]oxazole resins 5 and 6, respectively. Finally, 2-amino and 2-amidobenzo[d]oxazole derivatives 7 and 8 are generated in good yields and purities by cleavage of the respective resins 5 and 6 under trifluoroacetic acid (TFA) in dichloromethane (CH2Cl2).

  12. Development Of A Parallel Performance Model For The THOR Neutral Particle Transport Code

    Energy Technology Data Exchange (ETDEWEB)

    Yessayan, Raffi; Azmy, Yousry; Schunert, Sebastian

    2017-02-01

    The THOR neutral particle transport code enables simulation of complex geometries for various problems from reactor simulations to nuclear non-proliferation. It is undergoing a thorough V&V requiring computational efficiency. This has motivated various improvements including angular parallelization, outer iteration acceleration, and development of peripheral tools. For guiding future improvements to the code’s efficiency, better characterization of its parallel performance is useful. A parallel performance model (PPM) can be used to evaluate the benefits of modifications and to identify performance bottlenecks. Using INL’s Falcon HPC, the PPM development incorporates an evaluation of network communication behavior over heterogeneous links and a functional characterization of the per-cell/angle/group runtime of each major code component. After evaluating several possible sources of variability, this resulted in a communication model and a parallel portion model. The former’s accuracy is bounded by the variability of communication on Falcon while the latter has an error on the order of 1%.
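
    A parallel performance model of the kind outlined in this record can be sketched as a compute term plus a communication term. The form below is an assumption made for illustration (a per-cell/angle/group unit cost with angles split across processors, and a latency-plus-bandwidth message cost); it is not the published THOR model, and every parameter name and value is hypothetical.

```python
import math


def predicted_runtime(cells, angles, groups, procs,
                      t_unit, latency, sec_per_byte, msg_bytes, n_msgs):
    """Estimate wall time for one sweep under angular parallelization.

    t_unit        seconds per (cell, angle, group) unit of work (assumed)
    latency       seconds of fixed cost per message (assumed)
    sec_per_byte  inverse bandwidth in seconds per byte (assumed)
    """
    # Each processor handles a share of the angles, rounded up.
    angles_per_proc = math.ceil(angles / procs)
    compute = t_unit * cells * groups * angles_per_proc
    # A single processor exchanges no messages.
    comm = 0.0 if procs == 1 else n_msgs * (latency + sec_per_byte * msg_bytes)
    return compute + comm


# Made-up inputs, chosen only to exercise the formula.
params = dict(cells=10_000, angles=80, groups=4,
              t_unit=1e-7, latency=2e-6, sec_per_byte=1e-9,
              msg_bytes=320_000, n_msgs=10)
serial = predicted_runtime(procs=1, **params)
parallel = predicted_runtime(procs=8, **params)
speedup = serial / parallel
```

    Comparing the two terms for a given processor count shows which one bounds the speedup, which is the sort of bottleneck identification the abstract mentions.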

  13. Parallel algorithms for continuum dynamics

    International Nuclear Information System (INIS)

    Hicks, D.L.; Liebrock, L.M.

    1987-01-01

    Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors

  14. Accelerated fast iterative shrinkage thresholding algorithms for sparsity-regularized cone-beam CT image reconstruction

    International Nuclear Information System (INIS)

    Xu, Qiaofeng; Sawatzky, Alex; Anastasio, Mark A.; Yang, Deshan; Tan, Jun

    2016-01-01

    Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated
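
    For orientation, the standard FISTA iteration that this record builds on can be sketched for a small sparsity-regularized least-squares problem. This is the textbook algorithm with a plain gradient step, not the OS-SART-accelerated variants proposed in the paper, and the 2-by-2 system, regularization weight and step size are assumed values.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a, r):
    """Return A^T r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]


def fista(a, b, lam, step, n_iter=2000):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1.

    step must not exceed 1 / L, where L is the largest eigenvalue of A^T A.
    """
    x = [0.0] * len(a[0])
    y = list(x)
    t = 1.0
    for _ in range(n_iter):
        residual = [ri - bi for ri, bi in zip(matvec(a, y), b)]
        grad = rmatvec(a, residual)
        # Gradient step on the smooth term, then shrinkage for the l1 term.
        x_new = soft_threshold([yi - step * gi for yi, gi in zip(y, grad)],
                               step * lam)
        # Momentum update that gives the O(1/k^2) objective rate.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


a = [[1.0, 0.0], [0.0, 2.0]]   # assumed toy system; L = 4, so step = 0.25
b = [3.0, 1.0]
x = fista(a, b, lam=1.0, step=0.25)
```

    Because this toy problem is separable, its minimizer can be found by hand as x = (2, 0.25), which gives a simple check on the iteration.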

  15. Accelerated fast iterative shrinkage thresholding algorithms for sparsity-regularized cone-beam CT image reconstruction

    Science.gov (United States)

    Xu, Qiaofeng; Yang, Deshan; Tan, Jun; Sawatzky, Alex; Anastasio, Mark A.

    2016-01-01

    Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated

  16. Rokkasho: Japanese site for ITER

    International Nuclear Information System (INIS)

    Ohtake, S.; Yamaguchi, V.; Matsuda, S.; Kishimoto, H.

    2003-01-01

    The Atomic Energy Commission of Japan authorized ITER as the core machine of the Third Phase Basic Program of Fusion Energy Development. After a series of discussions in the Atomic Energy Commission and the Council of Science and Technology Policy, Japanese Government concluded formally with the Cabinet Agreement on 31 May 2002 that Japan should participate in the ITER Project and offer the Rokkasho-Mura site for construction of ITER to the Negotiations among Canada (CA), the European Union (EU), Japan (JA), and the Russian Federation (RF). The JA site proposal is now under the international assessment in the framework of the ITER Negotiations. (author)

  17. IAEA activities related to ITER

    International Nuclear Information System (INIS)

    Dolan, T.J.; Schneider, U.

    2001-01-01

    As agreed between the IAEA and the ITER Parties, special sessions are dedicated to ITER at the IAEA Fusion Energy Conferences. At the 18th IAEA Fusion Energy Conference, held on 4-10 October 2000 in Sorrento, Italy, in the Artsimovich-Kadomtsev Memorial opening session there were special lectures by Carlo Rubbia (President, ENEA, Italy), A. Arima (Japan), and E.P. Velikhov (Russia); an overview talk on ITER by R. Aymar (ITER Director); and a talk on the FTU experiment by F. Romanelli. In total, 573 participants from 34 countries presented 389 papers (including 11 post-deadline papers and the 4 summaries)

  18. Efficient parallel implementations of QM/MM-REMD (quantum mechanical/molecular mechanics-replica-exchange MD) and umbrella sampling: isomerization of H2O2 in aqueous solution.

    Science.gov (United States)

    Fedorov, Dmitri G; Sugita, Yuji; Choi, Cheol Ho

    2013-07-03

    An efficient parallel implementation of QM/MM-based replica-exchange molecular dynamics (REMD) as well as umbrella sampling techniques was proposed by adopting the generalized distributed data interface (GDDI). A parallelization speed-up of 40.5 on 48 cores was achieved, making our QM/MM-MD engine a robust tool for studying complex chemical dynamics in solution. The two techniques were used comparatively to study the torsional isomerization of hydrogen peroxide in aqueous solution. All results by QM/MM-REMD and QM/MM umbrella sampling techniques yielded nearly identical potentials of mean force (PMFs) regardless of the particular QM theories for solute, showing that the overall dynamics are mainly determined by solvation. Although the entropic penalty of solvent rearrangements exists in cisoid conformers, it was found that both strong intermolecular hydrogen bonding and dipole-dipole interactions preferentially stabilize them in solution, reducing the torsional free-energy barrier at 0° by about 3 kcal/mol as compared to that in gas phase.
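
    The reported speed-up of 40.5 on 48 cores can be turned into a parallel efficiency with one division. The short sketch below does that and also computes the Karp-Flatt experimentally determined serial fraction, a standard metric that the abstract itself does not report; the function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / cores


def karp_flatt(speedup, cores):
    """Serial fraction implied by a measured speed-up (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


efficiency = parallel_efficiency(40.5, 48)   # 0.84375, i.e. about 84 %
serial_fraction = karp_flatt(40.5, 48)
```

    An efficiency of about 84 % means roughly one sixth of the ideal 48-fold speed-up is lost to serial work and communication.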

  19. Variational iteration method for one dimensional nonlinear thermoelasticity

    International Nuclear Information System (INIS)

    Sweilam, N.H.; Khader, M.M.

    2007-01-01

    This paper applies the variational iteration method to solve the Cauchy problem arising in one dimensional nonlinear thermoelasticity. The advantage of this method is to overcome the difficulty of calculation of Adomian's polynomials in the Adomian's decomposition method. The numerical results of this method are compared with the exact solution of an artificial model to show the efficiency of the method. The approximate solutions show that the variational iteration method is a powerful mathematical tool for solving nonlinear problems
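
    The variational iteration method can be illustrated on a much simpler problem than the thermoelastic system treated in the paper. The sketch below assumes the toy equation u' + u = 0 with u(0) = 1, for which the Lagrange multiplier is -1 and the correction functional reads u_{n+1}(t) = u_n(t) - ∫_0^t (u_n'(s) + u_n(s)) ds. Polynomials are held as coefficient lists, and all names are hypothetical.

```python
from fractions import Fraction


def integrate(poly):
    """Integral from 0 to t of a polynomial given by ascending coefficients."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(poly)]


def differentiate(poly):
    return [k * c for k, c in enumerate(poly)][1:] or [Fraction(0)]


def add(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def vim_step(u):
    """One correction step for u' + u = 0 with Lagrange multiplier -1."""
    residual = add(differentiate(u), u)            # u_n' + u_n
    correction = [-c for c in integrate(residual)]
    return add(u, correction)


u = [Fraction(1)]            # initial guess u_0(t) = 1 satisfies u(0) = 1
for _ in range(4):
    u = vim_step(u)
# u now holds the coefficients 1, -1, 1/2, -1/6, 1/24
```

    Each step adds the next Taylor term of the exact solution exp(-t), which shows in miniature how the iterates approach the solution without computing Adomian polynomials.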

  20. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  1. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  2. ITER CTA newsletter. No. 13, October 2002

    International Nuclear Information System (INIS)

    2002-11-01

    This ITER CTA newsletter issue comprises concise information about an ITER related meeting concerning the joint implementation of ITER - the fifth ITER Negotiations Meeting - which was held in Toronto, Canada, 19-20 September, 2002, and information about assessment of the possible ITER site in Clarington, Ontario, Canada, which was the subject of the first official stage of the Joint Assessment of Specific Sites (JASS) for the ITER Project. This assessment was completed just before the Fifth ITER Negotiations Meeting

  3. Parallel plasma fluid turbulence calculations

    International Nuclear Information System (INIS)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-01-01

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated

  4. ITER waste management

    International Nuclear Information System (INIS)

    Rosanvallon, S.; Na, B.C.; Benchikhoune, M.; Uzan, J. Elbez; Gastaldi, O.; Taylor, N.; Rodriguez, L.

    2010-01-01

    ITER will produce solid radioactive waste during its operation (arising from the replacement of components and from process and housekeeping waste) and during decommissioning (de-activation phase and dismantling). The waste will be activated by neutrons of energies up to 14 MeV and potentially contaminated by activated corrosion products, activated dust and tritium. This paper describes the waste origin, the waste classification as a function of the French national agency for radioactive waste management (ANDRA), the optimization process put in place to reduce the waste radiotoxicity and volumes, the estimated waste amount based on the current design and maintenance procedure, and the overall strategy from component removal to final disposal anticipated at this stage of the project.

  5. Iterated crowdsourcing dilemma game

    Science.gov (United States)

    Oishi, Koji; Cebrian, Manuel; Abeliuk, Andres; Masuda, Naoki

    2014-02-01

    The Internet has enabled the emergence of collective problem solving, also known as crowdsourcing, as a viable option for solving complex tasks. However, the openness of crowdsourcing presents a challenge because solutions obtained by it can be sabotaged, stolen, and manipulated at a low cost for the attacker. We extend a previously proposed crowdsourcing dilemma game to an iterated game to address this question. We enumerate pure evolutionarily stable strategies within the class of so-called reactive strategies, i.e., those depending on the last action of the opponent. Among the 4096 possible reactive strategies, we find 16 strategies each of which is stable in some parameter regions. Repeated encounters of the players can improve social welfare when the damage inflicted by an attack and the cost of attack are both small. Under the current framework, repeated interactions do not really ameliorate the crowdsourcing dilemma in a majority of the parameter space.

  6. ITER cooling systems

    International Nuclear Information System (INIS)

    Natalizio, A.; Hollies, R.E.; Sochaski, R.O.; Stubley, P.H.

    1992-06-01

    The ITER reference system uses low-temperature water for heat removal and high-temperature helium for bake-out. As these systems share common equipment, bake-out cannot be performed until the cooling system is drained and dried, and the reactor cannot be started until the helium has been purged from the cooling system. This study examines the feasibility of using a single high-temperature fluid to perform both heat removal and bake-out. The high temperature required for bake-out would also be in the range for power production. The study examines cost, operational benefits, and impact on reactor safety of two options: a high-pressure water system, and a low-pressure organic system. It was concluded that the cost savings and operational benefits are significant; there are no significant adverse safety impacts from operating either the water system or the organic system; and the capital costs of both systems are comparable

  7. Divertor development for ITER

    International Nuclear Information System (INIS)

    Janeschitz, G.; Ando, T.; Antipenkov, A.; Barabash, V.; Chiocchio, S.; Federici, G.; Ibbott, C.; Jakeman, R.; Matera, R.; Martin, E.; Parker, R.; Tivey, R.; Pacher, H.D.

    1998-01-01

    The requirements for the ITER divertor design, i.e. power and He ash exhaust, neutral leakage control, lifetime, disruption load resistance and exchange by remote handling, are described in this paper. These requirements and the physics requirements for detached and semi-attached operation result in the vertical target configuration. This is realised by a concept incorporating 60 cassettes carrying the high heat flux components. The armour choice for these components is CFC monoblock in the strike zone near the lower part of the vertical target, and a W brush elsewhere. Cooling is by swirl tubes or hypervapotrons depending on the component. The status of the heat sink and joining technology R&D is given. Finally, the resulting design of the high heat flux components is presented. (orig.)

  8. Robust Cell Detection for Large-Scale 3D Microscopy Using GPU-Accelerated Iterative Voting

    Directory of Open Access Journals (Sweden)

    Leila Saadatifard

    2018-04-01

    High-throughput imaging techniques, such as Knife-Edge Scanning Microscopy (KESM), are capable of acquiring three-dimensional whole-organ images at sub-micrometer resolution. These images are challenging to segment since they can exceed several terabytes (TB) in size, requiring extremely fast and fully automated algorithms. Staining techniques are limited to contrast agents that can be applied to large samples and imaged in a single pass. This requires maximizing the number of structures labeled in a single channel, resulting in images that are densely packed with spatial features. In this paper, we propose a three-dimensional approach for locating cells based on iterative voting. Due to the computational complexity of this algorithm, a highly efficient GPU implementation is required to make it practical on large data sets. The proposed algorithm has a limited number of input parameters and is highly parallel.

  9. Towards the procurement of the ITER divertor

    International Nuclear Information System (INIS)

    Merola, M.; Tivey, R.; Martin, A.; Pick, M.

    2006-01-01

    The procurement of the ITER divertor is planned to start in 2009. On the basis of the present common understanding of the sharing of the ITER components, the Japanese Participating Team (JAPT) will supply the outer vertical target, the Russian Federation (RF) PT the dome liner and will perform the high heat flux testing, the EU PT will supply the inner vertical targets and the cassette bodies, including final assembly of the divertor plasma-facing components (PFCs). The manufacturing of the PFCs of the ITER divertor represents a challenging endeavor due to the high technologies which are involved, and due to the unprecedented series production. To mitigate the associated risks, special arrangements need to be put in place prior to and during procurement to ensure quality and to keep to the time schedule. Before procurement can start, an ITER review of the qualification and production capability of each candidate PT is planned. Well in advance of the assumed start of the procurement, each PT which would like to contribute to the divertor PFC procurement, should first demonstrate its technical qualification to carry out the procurement with the required quality, and in an efficient and timely manner. Appropriate precautions, like subdivision of the procurement into stages, are also to be adopted during the procurement phase to mitigate the consequences of possible unexpected manufacturing problems. In preparation for writing the procurement specification for the vertical targets, the topic of setting acceptance criteria is also being addressed. This activity has the objective of defining workable acceptance criteria for the PFC armour joints. A complete set of analyses is also in progress to assess the latest design modifications against the design requirements. This task includes neutronic, shielding, thermo-mechanical and electromagnetic analyses. More than half of the ITER plasma parameters that must be measured and the related diagnostics are located in the

  10. Second meeting of the ITER Preparatory Committee

    International Nuclear Information System (INIS)

    Drew, M.

    2003-01-01

    The committee charged to oversee the ITER ITA (ITER transitional arrangements), the ITER preparatory committee, held its second meeting on 24 September at the JET facilities at Culham, UK. Dr. Umberto Finzi of the European Commission was chairman. This meeting was also the first since the succession by Dr. Yasuo Shimomura to Dr. Robert Aymar as Interim Project Leader (IPL). Welcoming Dr. Shimomura in his new capacity, the Committee paid tribute to the outstanding contributions of his predecessor to the definition, design and promotion of ITER, and expressed the gratitude of all Participants to Dr. Aymar and its best wishes for future success in his new appointment. The technical activities of the ITA were the main focus of the Committee's discussions. The Committee took note of the IPL's Status Report on ITA Technical Activities and endorsed the IPL's proposals for the top level structure of the International Team, including the designation of Dr. Pietro Barabaschi as Deputy to the IPL. The Committee took note of the IPL's proposals on Participants' contributions to the ITA and of the Participants' stated intentions and expectations in this regard. Several Delegations pointed out that access to necessary resources would depend strongly on progress made towards the Agreement. All Participants were invited, in the shared interests of the project, to respond constructively to the specific technical areas where the IPL reported a lack of resources. Following a presentation from the IT on Project Management Tools, the Committee expressed support, in general, for the proposed strategy designed to provide the current team with the CAD and Data Management elements necessary to prepare for an efficient start of ITER construction, and asked the IT Leader to report on an estimate and time profile of expenditure during the period to mid-2004. The Committee supported the proposals to re-establish the ITER Test Blanket Working. The Committee agreed that the phasing of planned

  11. ITER-FEAT - outline design report. Report by the ITER Director. ITER meeting, Tokyo, January 2000

    International Nuclear Information System (INIS)

    2001-01-01

    It is now possible to define the key elements of ITER-FEAT. This report provides the results, to date, of the joint work of the Special Working Group in the form of an Outline Design Report on the ITER-FEAT design which, subject to the views of ITER Council and of the Parties, will be the focus of further detailed design work and analysis in order to provide to the Parties a complete and fully integrated engineering design within the framework of the ITER EDA extension

  12. Automatic Control of ITER-like Structures

    International Nuclear Information System (INIS)

    Bosia, G.; Bremond, S.

    2005-01-01

    The ITER Ion Cyclotron System requires a power transfer efficiency in excess of 90% from power source to plasma in quasi-continuous operation. This implies the availability of a control system capable of optimizing the array radiation spectrum, automatically acquiring impedance match between the power source and the plasma-loaded array at the beginning of the power pulse and maintaining it against load variations due to fluctuations in plasma position and plasma edge parameters, rapidly detecting voltage breakdowns in the array and/or in the transmission system, and reliably discriminating them from fast load variations. In this paper a proposal for a practical ITER control system, including power, phase, frequency and impedance matching, is described. (authors)

  13. A parallel algorithm for the non-symmetric eigenvalue problem

    International Nuclear Information System (INIS)

    Sidani, M.M.

    1991-01-01

    An algorithm is presented for the solution of the non-symmetric eigenvalue problem. The algorithm is based on a divide-and-conquer procedure that provides initial approximations to the eigenpairs, which are then refined using Newton iterations. Since the smaller subproblems can be solved independently, and since Newton iterations with different initial guesses can be started simultaneously, the algorithm - unlike the standard QR method - is ideal for parallel computers. The author also reports on his investigation of deflation methods designed to obtain further eigenpairs if needed. Numerical results from implementations on a host of parallel machines (distributed and shared-memory) are presented
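
    The refinement step described above can be illustrated with a minimal sketch. It is not the author's implementation: it assumes a real matrix, a real simple eigenvalue and a unit-norm scaling constraint, and the names solve and newton_eigenpair are hypothetical. Newton's method is applied to the system A x - lam x = 0, (x.x - 1)/2 = 0, so several starting guesses could be refined independently of one another.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_eigenpair(a, x, lam, steps=20, tol=1e-12):
    """Refine an approximate eigenpair (x, lam) of the square matrix a.

    Assumption for this sketch: everything is real-valued and the
    eigenvector is normalised by x.x = 1.
    """
    n = len(x)
    for _ in range(steps):
        # Residual F(x, lam) = [A x - lam x ; (x.x - 1) / 2]
        res = [sum(a[i][j] * x[j] for j in range(n)) - lam * x[i]
               for i in range(n)]
        res.append((sum(v * v for v in x) - 1.0) / 2.0)
        if max(abs(v) for v in res) < tol:
            break
        # Jacobian of F: [[A - lam I, -x], [x^T, 0]]
        jac = [[a[i][j] - (lam if i == j else 0.0) for j in range(n)] + [-x[i]]
               for i in range(n)]
        jac.append(x[:] + [0.0])
        delta = solve(jac, [-v for v in res])
        x = [x[i] + delta[i] for i in range(n)]
        lam += delta[n]
    return x, lam


# Non-symmetric 2x2 test matrix with eigenvalues 5 and 2.
vec, val = newton_eigenpair([[4.0, 1.0], [2.0, 3.0]], [0.8, 0.6], 4.5)
```

    Starting from the rough guess (0.8, 0.6) with lam = 4.5, the iteration settles on the eigenvalue 5 and an eigenvector proportional to (1, 1).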

  14. ITER CTA newsletter. No. 8

    International Nuclear Information System (INIS)

    2002-05-01

    This ITER CTA newsletter contains information about the Third Negotiations Meeting on the Joint Implementation of ITER held in Moscow on 23-24 April 2002 and about the visit of Canadian officials and members of the Canadian delegation to RF research center 'Kurchatov Institute'

  15. ITER physics design guidelines: 1989

    International Nuclear Information System (INIS)

    Uckan, N.A.

    1990-01-01

    The physics basis for ITER has been developed from an assessment of the results of the last twenty-five years of tokamak research and from detailed analysis of important physics issues specifically for the ITER design. This assessment has been carried out with direct participation of members of the experimental teams of each of the major tokamaks in the world fusion program through participation in ITER workshops, contributions to the ITER Physics R and D Program, and by direct contacts between the ITER team and the cognizant experimentalists. Extrapolations to the present data base, where needed, are made in the most cautious way consistent with engineering constraints and performance goals of the ITER. Where a working assumption that is insufficiently supported by the present data base had to be introduced, this is explicitly stated. While a strong emphasis has been placed on the physics credibility of the design, the guidelines also take into account that ITER should be designed to be able to take advantage of potential improvements in tokamak physics that may occur before and during the operation of ITER. (author). 33 refs

  16. ITER management advisory committee meeting

    International Nuclear Information System (INIS)

    Yoshikawa, M.

    2001-01-01

    The ITER Management Advisory Committee (MAC) Meeting was held in Vienna on 16 July 2001. It was the last MAC Meeting and the main topics were consideration of the report by the Director on the ITER EDA status, review of the Work Programme, review of the Joint Fund and arrangements for termination and wind-up of the EDA

  17. ITER CTA newsletter. No. 7

    International Nuclear Information System (INIS)

    2002-04-01

    This issue of ITER CTA newsletter contains information about the meeting of the ITER CTA project board, which took place in Moscow, Russian Federation on 22 April 2002 on the occasion of the Third Negotiators Meeting (N3), and about the meeting 'EU divertor celebration day' organized on 16 January 2002 at Plansee AG, Reutte, Austria

  18. ITER EDA Newsletter. V. 3, no. 8

    International Nuclear Information System (INIS)

    1994-08-01

    This ITER EDA (Engineering Design Activities) Newsletter issue reports on the sixth ITER council meeting; introduces the newly appointed ITER director and reports on his address to the ITER council. The vacuum tank for the ITER model coil testing, installed at JAERI, Naka, Japan is also briefly described

  19. ITER ITA newsletter. No. 6, July 2003

    International Nuclear Information System (INIS)

    2003-09-01

    This issue of ITER ITA (ITER transitional Arrangements) newsletter contains concise information about ITER related activities. One of them was the farewell party for Annick Lyraud and Robert Aymar, who will take up his position as Director-General of CERN in January 2004, another is information about Dr. Yasuo Shimomura, ITER interim project leader, and ITER technical work during the transitional arrangements

  20. ITER ITA newsletter. No. 8, September 2003

    International Nuclear Information System (INIS)

    2003-10-01

    This issue of ITER ITA (ITER transitional Arrangements) newsletter contains concise information about ITER related activities including Robert Aymar's leaving ITER for CERN, ITER related issues at the IAEA General Conference and status and prospects of thermonuclear power and activity during the ITA on materials for vessel and in-vessel components

  1. ITER interim design report package documents

    International Nuclear Information System (INIS)

    1996-01-01

    This publication contains the Excerpt from the ITER Council (IC-8), the ITER Interim Design Report, Cost Review and Safety Analysis, ITER Site Requirements and ITER Site Design Assumptions and the Excerpt from the ITER Council (IC-9). 8 figs, 2 tabs

  2. Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

    Science.gov (United States)

    Zerr, Robert Joseph

    2011-12-01

    thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much less, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times. Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations

  3. Plasma control concepts for ITER

    International Nuclear Information System (INIS)

    Lister, J.B.; Nieswand, C.

    1997-01-01

    This overview paper skims over a wide range of issues related to the control of ITER plasmas. Although operation of the ITER project will require extensive developmental work to achieve the degree of control required, there is no indication that any of the identified problems will present overwhelming difficulties compared with the operation of present tokamaks. However, the precision of control required and the degree of automation of the final ITER plasma control system will present a challenge which is somewhat greater than for present tokamaks. In order to operate ITER optimally, integrated use of a large amount of diagnostic information will be necessary, evaluated and interpreted automatically. This will challenge both the diagnostics themselves and their supporting interpretation codes. The intervening years will provide us with the opportunity to implement and evaluate most of the new features required for ITER on existing tokamaks, with the exception of the control of an ignited plasma. (author) 7 figs., 7 refs

  4. ITER technical advisory committee meeting

    International Nuclear Information System (INIS)

    Fujiwara, M.

    2001-01-01

    The 17th Meeting of the ITER Technical Advisory Committee (TAC-17) was held on February 19-22 at the ITER Garching Work Site in Germany. The objective of the meeting was to review the Draft Final Design Report of ITER-FEAT and assess the ability of the self-consistent overall design both to satisfy the technical objectives previously defined and to meet the cost limitations. TAC-17 was also organized to confirm that the design and critical elements, with emphasis on the key recommendations made at previous TAC meetings, are such as to extend the confidence in starting ITER construction. It was also intended to provide the ITER Council, scheduled to meet on 27 and 28 February in Toronto, with a technical assessment and key recommendations of the above mentioned report

  5. RF modeling of the ITER-relevant lower hybrid antenna

    International Nuclear Information System (INIS)

    Hillairet, J.; Ceccuzzi, S.; Belo, J.; Marfisi, L.; Artaud, J.F.; Bae, Y.S.; Berger-By, G.; Bernard, J.M.; Cara, Ph.; Cardinali, A.; Castaldo, C.; Cesario, R.; Decker, J.; Delpech, L.; Ekedahl, A.; Garcia, J.; Garibaldi, P.; Goniche, M.; Guilhem, D.; Hoang, G.T.

    2011-01-01

    In the frame of the EFDA task HCD-08-03-01, a 5 GHz Lower Hybrid system which should be able to deliver 20 MW CW on ITER and sustain the expected high heat fluxes has been reviewed. The design and overall dimensions of the key RF elements of the launcher and its subsystem have been updated from the 2001 design in collaboration with ITER organization. Modeling of the LH wave propagation and absorption into the plasma shows that the optimal parallel index must be chosen between 1.9 and 2.0 for the ITER steady-state scenario. The present study has been made with n∥ = 2.0 but can be adapted for n∥ = 1.9. Individual components have been studied separately, giving confidence in the global RF design of the whole antenna.

  6. Adaptive control in multi-threaded iterated integration

    International Nuclear Information System (INIS)

    Doncker, Elise de; Yuasa, Fukuko

    2013-01-01

    In recent years we have developed a technique for the direct computation of Feynman loop-integrals, which are notorious for the occurrence of integrand singularities. Especially for handling singularities in the interior of the domain, we approximate the iterated integral using an adaptive algorithm in the coordinate directions. We present a novel multi-core parallelization scheme for adaptive multivariate integration, by assigning threads to the rule evaluations in the outer dimensions of the iterated integral. The method ensures a large parallel granularity as each function evaluation by itself comprises an integral over the lower dimensions, while the application of the threads is governed by the adaptive control in the outer level. We give computational results for a test set of 3- to 6-dimensional integrals, where several problems exhibit a loop integral behavior.
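
    The structure of this scheme can be sketched as follows. This is only an illustration under simplifying assumptions and not the authors' code: the outer rule is a fixed composite Simpson rule rather than an adaptive one, the integral is two-dimensional, and the names adaptive_simpson and iterated_integral are hypothetical. Each thread evaluates the outer integrand at one node, and that evaluation is itself an adaptive integral over the inner dimension, which is what makes the parallel granularity large. In CPython the global interpreter lock prevents a real speed-up for pure-Python integrands, so the sketch shows the task layout only.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def adaptive_simpson(f, a, b, tol=1e-10, depth=40):
    """Recursive adaptive Simpson quadrature of f on [a, b]."""
    def simpson(fa, fm, fb, lo, hi):
        return (hi - lo) / 6.0 * (fa + 4.0 * fm + fb)

    def recurse(lo, hi, fa, fm, fb, whole, tol, depth):
        mid = 0.5 * (lo + hi)
        flm, frm = f(0.5 * (lo + mid)), f(0.5 * (mid + hi))
        left = simpson(fa, flm, fm, lo, mid)
        right = simpson(fm, frm, fb, mid, hi)
        err = left + right - whole
        if depth <= 0 or abs(err) <= 15.0 * tol:
            return left + right + err / 15.0  # Richardson correction
        return (recurse(lo, mid, fa, flm, fm, left, tol / 2.0, depth - 1)
                + recurse(mid, hi, fm, frm, fb, right, tol / 2.0, depth - 1))

    fa, fm, fb = f(a), f(0.5 * (a + b)), f(b)
    return recurse(a, b, fa, fm, fb, simpson(fa, fm, fb, a, b), tol, depth)


def iterated_integral(f, ax, bx, ay, by, panels=64, workers=4):
    """Integrate f(x, y) over [ax, bx] x [ay, by] as an iterated integral.

    The outer x-direction uses composite Simpson nodes; the inner
    y-integral at each node is one task handed to the thread pool.
    """
    n = 2 * panels
    h = (bx - ax) / n
    xs = [ax + i * h for i in range(n + 1)]

    def inner(x):
        return adaptive_simpson(lambda y: f(x, y), ay, by)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        g = list(pool.map(inner, xs))

    # Composite Simpson weights 1, 4, 2, 4, ..., 2, 4, 1.
    total = g[0] + g[-1] + 4.0 * sum(g[1:-1:2]) + 2.0 * sum(g[2:-1:2])
    return total * h / 3.0


value = iterated_integral(lambda x, y: math.exp(x + y), 0.0, 1.0, 0.0, 1.0)
```

    For the smooth test integrand exp(x + y) on the unit square the exact value is (e - 1)^2, which gives a simple check of the sketch.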

  7. Performance Analysis of Fission and Surface Source Iteration Method for Domain Decomposed Monte Carlo Whole-Core Calculation

    International Nuclear Information System (INIS)

    Jo, Yu Gwon; Oh, Yoo Min; Park, Hyang Kyu; Park, Kang Soon; Cho, Nam Zin

    2016-01-01

    In this paper, two issues in the fission and surface source (FSS) iteration method, i.e., the waiting time for surface source data and the variance biases in local tallies, are investigated for the domain-decomposed, 3-D continuous-energy whole-core calculation. The fission sources are provided as usual, while the surface sources are provided by banking MC particles crossing local domain boundaries. The surface sources serve as boundary conditions for nonoverlapping local problems, so that each local problem can be solved independently. The first issue is quantifying the waiting time of processors to receive surface source data. By using nonblocking communication, the 'time penalty' of waiting for the arrival of the surface source data is reduced. The other important issue is underestimation of the sample variance of the tally because of additional inter-iteration correlations in surface sources. From the numerical results on a 3-D whole-core test problem, it is observed that the time penalty is negligible in the FSS iteration method and that the real variances of both pin powers and assembly powers are estimated by the HB method. For those purposes, three cases are tested: Case 1 (1 local domain), Case 2 (4 local domains) and Case 3 (16 local domains). For both Cases 2 and 3, the time penalties for waiting are negligible compared to the source-tracking times. However, for finer divisions of local domains, the loss of parallel efficiency caused by the different number of sources for local domains in symmetric locations becomes larger due to the stochastic errors in source distributions. For all test cases, the HB method estimates the real variances of local tallies very well. However, it is also noted that the real variances of local tallies estimated by the HB method are slightly smaller than the real variances obtained from 30 independent batch runs, and the deviations become larger for finer divisions of local domains. The batch size used for the HB

  8. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell, Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  9. Iterative methods for 3D implicit finite-difference migration using the complex Padé approximation

    International Nuclear Information System (INIS)

    Costa, Carlos A N; Campos, Itamara S; Costa, Jessé C; Neto, Francisco A; Schleicher, Jörg; Novais, Amélia

    2013-01-01

    Conventional implementations of 3D finite-difference (FD) migration use splitting techniques to accelerate performance and save computational cost. However, such techniques are plagued with numerical anisotropy that jeopardises the correct positioning of dipping reflectors in the directions not used for the operator splitting. We implement 3D downward continuation FD migration without splitting using a complex Padé approximation. In this way, the numerical anisotropy is eliminated at the expense of a computationally more intensive solution of a large-band linear system. We compare the performance of the iterative stabilized biconjugate gradient (BICGSTAB) and that of the multifrontal massively parallel direct solver (MUMPS). It turns out that the use of the complex Padé approximation not only stabilizes the solution, but also acts as an effective preconditioner for the BICGSTAB algorithm, reducing the number of iterations as compared to the implementation using the real Padé expansion. As a consequence, the iterative BICGSTAB method is more efficient than the direct MUMPS method when solving a single term in the Padé expansion. The results of both algorithms, here evaluated by computing the migration impulse response in the SEG/EAGE salt model, are of comparable quality. (paper)
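
    For orientation, the iterative solver named above can be sketched in a few lines. This is a generic, unpreconditioned, real-valued BiCGSTAB written for illustration only; the paper applies the method to complex-valued systems arising from the Padé expansion, and the function name bicgstab and the small test matrix are assumptions of this sketch. The solver only needs a routine that applies the matrix to a vector.

```python
import math


def bicgstab(matvec, b, tol=1e-10, max_iter=200):
    """Solve A x = b with unpreconditioned BiCGSTAB, starting from x = 0.

    matvec(v) must return the product A v as a list.
    """
    n = len(b)

    def dot(u, w):
        return sum(u[i] * w[i] for i in range(n))

    x = [0.0] * n
    r = b[:]              # residual b - A x for the zero initial guess
    r_hat = r[:]          # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [r[i] + beta * (p[i] - omega * v[i]) for i in range(n)]
        v = matvec(p)
        alpha = rho_new / dot(r_hat, v)
        s = [r[i] - alpha * v[i] for i in range(n)]
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            # Converged after the BiCG half-step; skip the stabilising step.
            x = [x[i] + alpha * p[i] for i in range(n)]
            break
        t = matvec(s)
        omega = dot(t, s) / dot(t, t)
        x = [x[i] + alpha * p[i] + omega * s[i] for i in range(n)]
        r = [s[i] - omega * t[i] for i in range(n)]
        rho = rho_new
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            break
    return x


# Small non-symmetric, diagonally dominant test system (illustrative only).
A = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
rhs = [6.0, 15.0, 11.0]   # equals A applied to [1, 2, 3]
sol = bicgstab(lambda w: [sum(A[i][j] * w[j] for j in range(3)) for i in range(3)], rhs)
```

    Because only matrix-vector products are required, the banded system never has to be factorised, which is the trade-off against a direct solver such as MUMPS that the abstract discusses.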

  10. Experiences in Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Terry W. Clark

    1997-01-01

    To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.

  11. A survey of parallel multigrid algorithms

    Science.gov (United States)

    Chan, Tony F.; Tuminaro, Ray S.

    1987-01-01

    A typical multigrid algorithm applied to well-behaved linear-elliptic partial-differential equations (PDEs) is described. Criteria for designing and evaluating parallel algorithms are presented. Before evaluating the performance of some parallel multigrid algorithms, consideration is given to some theoretical complexity results for solving PDEs in parallel and for executing the multigrid algorithm. The effect of mapping and load imbalance on the partial efficiency of the algorithm is studied.

  12. ITER management advisory committee meeting in NAKA

    International Nuclear Information System (INIS)

    Yoshikawa, M.

    1999-01-01

    The ITER Management Advisory Committee (MAC) Meeting was held on 17 December 1999 in Naka, Japan. The main topics were the ITER EDA Status, Task Status Summary and Work Program and a schedule of ITER meetings

  13. ITER EDA newsletter. V. 7, no. 6

    International Nuclear Information System (INIS)

    1998-06-01

    This newsletter contains the articles: 'ITER representation at the 11th Pacific Basin Nuclear Conference', 'Summary of discussion points and further deliberations in the special committee on the ITER project in the Atomic Energy Commission', and 'ITER radio frequency systems'

  14. ITER EDA newsletter. V. 9, no. 2

    International Nuclear Information System (INIS)

    2000-02-01

    This ITER EDA Newsletter reports on the seventh ITER technical meeting on safety and environment and contains the executive summary of the eleventh ITER scrape-off layer and divertor physics expert group meeting. Individual abstracts have been prepared

  15. SPARSE ELECTROMAGNETIC IMAGING USING NONLINEAR LANDWEBER ITERATIONS

    KAUST Repository

    Desmal, Abdulla; Bagci, Hakan

    2015-01-01

    minimization problem is solved using nonlinear Landweber iterations, where at each iteration a thresholding function is applied to enforce the sparseness-promoting L0/L1-norm constraint. The thresholded nonlinear Landweber iterations are applied to several two
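
    The thresholding idea can be illustrated on a much simpler problem than the one in the record. The sketch below is an assumption-laden simplification: it uses a linear forward operator (a small dense matrix) instead of the nonlinear scattering operator, real numbers instead of complex fields, and soft thresholding for the L1-norm case only; the names soft_threshold and thresholded_landweber are hypothetical. Each iteration takes a Landweber (gradient) step on the data misfit and then applies the thresholding function.

```python
import math


def soft_threshold(vec, t):
    """Shrink every entry towards zero by t (proximal map of t * L1 norm)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in vec]


def thresholded_landweber(a, b, step, lam, iters=500):
    """Minimise 0.5 * ||a x - b||^2 + lam * ||x||_1 for a linear operator a.

    step should not exceed 1 / (largest eigenvalue of a^T a).
    """
    m, n = len(a), len(a[0])
    x = [0.0] * n
    for _ in range(iters):
        # Landweber step: x + step * a^T (b - a x)
        res = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(m)]
        grad = [sum(a[i][j] * res[i] for i in range(m)) for j in range(n)]
        x = [x[j] + step * grad[j] for j in range(n)]
        # Thresholding enforces the sparseness-promoting constraint.
        x = soft_threshold(x, step * lam)
    return x


# Two measurements, three unknowns; the data come from the sparse vector [2, 0, 0].
op = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
data = [2.0, 0.0]
estimate = thresholded_landweber(op, data, step=0.5, lam=0.1)
```

    In this toy case the iteration recovers the correct support: only the first entry stays nonzero, and it is biased from 2 to 2 - lam by the shrinkage, as expected for an L1 penalty.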

  16. A Parallel Butterfly Algorithm

    KAUST Repository

    Poulson, Jack; Demanet, Laurent; Maxwell, Nicholas; Ying, Lexing

    2014-01-01

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.

  17. A Parallel Butterfly Algorithm

    KAUST Repository

    Poulson, Jack

    2014-02-04

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.

  18. ITER cooling system

    International Nuclear Information System (INIS)

    Kveton, O.K.

    1990-11-01

    The present specification of the ITER cooling system does not permit its operation with water above 150 °C. However, the first wall needs to be heated to higher temperatures during conditioning at 250 °C and bake-out at 350 °C. In order to use the cooling water for these operations the cooling system would have to operate during conditioning at 37 bar and during bake-out at 164 bar. This is undesirable from the safety analysis point of view, and alternative heating methods are to be found. This review suggests that superheated steam or gas heating can be used for both baking and conditioning. The blanket design must consider the use of dual heat transfer media, allowing for change from one to another in both directions. Transfer from water to gas or steam is the most intricate and risky part of the entire heating process. Superheated steam conditioning appears unfavorable. The use of inert gas is recommended, although alternative heating fluids such as organic coolant should be investigated

  19. ITER plasma facing components

    International Nuclear Information System (INIS)

    Kuroda, T.; Vieider, G.; Akiba, M.

    1991-01-01

    This document summarizes results of the Conceptual Design Activities (1988-1990) for the International Thermonuclear Experimental Reactor (ITER) project, namely those that pertain to the plasma facing components of the reactor vessel, of which the main components are the first wall and the divertor plates. After an introduction and an executive summary, the principal functions of the plasma-facing components are delineated, i.e., (i) define the low-impurity region within which the plasma is produced, (ii) absorb the electromagnetic radiation and charged-particle flux from the plasma, and (iii) protect the blanket/shield components from the plasma. A list of critical design issues for the divertor plates and the first wall is given, followed by discussions of the divertor plate design (including the issues of material selection, erosion lifetime, design concepts, thermal and mechanical analysis, operating limits and overall lifetime, tritium inventory, baking and conditioning, safety analysis, manufacture and testing, and advanced divertor concepts) and the first wall design (armor material and design, erosion lifetime, overall design concepts, thermal and mechanical analysis, lifetime and operating limits, tritium inventory, baking and conditioning, safety analysis, manufacture and testing, an alternative first wall design, and the limiters used instead of the divertor plates during start-up). Refs, figs and tabs

  20. Multi-Level iterative methods in computational plasma physics

    International Nuclear Information System (INIS)

    Knoll, D.A.; Barnes, D.C.; Brackbill, J.U.; Chacon, L.; Lapenta, G.

    1999-01-01

    Plasma physics phenomena occur on a wide range of spatial scales and on a wide range of time scales. When attempting to model plasma physics problems numerically the authors are inevitably faced with the need for both fine spatial resolution (fine grids) and implicit time integration methods. Fine grids can tax the efficiency of iterative methods and large time steps can challenge the robustness of iterative methods. To meet these challenges they are developing a hybrid approach where multigrid methods are used as preconditioners to Krylov subspace based iterative methods such as conjugate gradients or GMRES. For nonlinear problems they apply multigrid preconditioning to a matrix-free Newton-GMRES method. Results are presented for application of these multilevel iterative methods to the field solves in implicit moment method PIC, multidimensional nonlinear Fokker-Planck problems, and their initial efforts in particle MHD
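
    The key ingredient of a matrix-free Newton-Krylov method is that the Krylov solver never needs the Jacobian J itself, only products J v, and these can be approximated from two residual evaluations: J v ≈ (F(u + eps v) - F(u)) / eps. The sketch below shows just this product on a hypothetical two-equation residual; it is an illustration under stated assumptions, not the authors' code, and the function names and the choice of eps are assumptions. The GMRES solver and the multigrid preconditioner are omitted.

```python
import math


def jacobian_vector_product(residual, u, v, rel_eps=1.0e-7):
    """Approximate J(u) v by a forward difference of the residual function.

    residual(u) returns F(u) as a list; the Jacobian is never formed.
    """
    norm_v = math.sqrt(sum(c * c for c in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    norm_u = math.sqrt(sum(c * c for c in u))
    # Perturbation scaled to the sizes of u and v (one common heuristic).
    eps = rel_eps * (1.0 + norm_u) / norm_v
    f0 = residual(u)
    f1 = residual([u[i] + eps * v[i] for i in range(len(u))])
    return [(f1[i] - f0[i]) / eps for i in range(len(f0))]


def example_residual(u):
    """Hypothetical nonlinear residual used only to exercise the sketch."""
    return [u[0] ** 2 + u[1] - 3.0, u[0] + u[1] ** 2 - 5.0]


# Analytic Jacobian at u = (1, 2) is [[2, 1], [1, 4]], so J v = [1, -3] for v = (1, -1).
jv = jacobian_vector_product(example_residual, [1.0, 2.0], [1.0, -1.0])
```

    Inside a Newton iteration, a function like this would be passed to the Krylov solver in place of an assembled matrix, at the cost of one extra residual evaluation per Krylov step.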