WorldWideScience

Sample records for efficient parallel iterative

  1. Parallel S_n iteration schemes

    International Nuclear Information System (INIS)

    Wienke, B.R.; Hiromoto, R.E.

    1986-01-01

    The iterative, multigroup, discrete ordinates (S_n) technique for solving the linear transport equation enjoys widespread usage and appeal. Serial iteration schemes and numerical algorithms developed over the years provide a timely framework for parallel extension. On the Denelcor HEP, the authors investigate three parallel iteration schemes for solving the one-dimensional S_n transport equation. The multigroup representation and serial iteration methods are also reviewed. This analysis represents a first attempt to extend serial S_n algorithms to parallel environments and provides good baseline estimates of ease of parallel implementation, relative algorithm efficiency, comparative speedup, and some future directions. The authors examine ordered and chaotic versions of these strategies, with and without concurrent rebalance and diffusion acceleration. Two strategies efficiently support high degrees of parallelization and appear to be robust parallel iteration techniques. The third strategy is a weaker parallel algorithm. Chaotic iteration, difficult to simulate on serial machines, holds promise and converges faster than the ordered versions of the schemes. Actual parallel speedup and efficiency are high, and the payoff appears substantial.
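
    The abstract contrasts ordered and chaotic iteration. As a rough illustration only, the sketch below simulates both serially on a small fixed-point problem x = Bx + c rather than on the S_n transport equation; B and c are made-up values chosen so that both variants are guaranteed to converge (every row sum of |B| is 0.5). The ordered sweep reads only the previous iterate, while the chaotic sweep updates components in place in a random order, standing in for unsynchronized parallel workers.

```python
import random

# Assumed toy problem x = B x + c. Every row sum of |B| is 0.5 < 1, so both the
# ordered and the chaotic iterations are guaranteed to converge.
B = [[0.0, 0.3, 0.2],
     [0.1, 0.0, 0.4],
     [0.25, 0.25, 0.0]]
c = [1.0, 2.0, 3.0]


def ordered_sweep(x):
    """Ordered (Jacobi-style) sweep: every component reads the previous iterate,
    as if all workers synchronized at the end of each sweep."""
    n = len(x)
    return [sum(B[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]


def make_chaotic_sweep(seed=0):
    """Chaotic sweep: components are updated in place, in a random order, each
    reading whatever values are currently stored (a serial stand-in for
    unsynchronized parallel workers)."""
    rng = random.Random(seed)

    def sweep(x):
        x = list(x)
        order = list(range(len(x)))
        rng.shuffle(order)
        for i in order:
            x[i] = sum(B[i][j] * x[j] for j in range(len(x))) + c[i]
        return x

    return sweep


def iterate(sweep, tol=1e-12, max_sweeps=10_000):
    """Apply sweeps until the largest componentwise change drops below tol."""
    x = [0.0] * len(c)
    for k in range(1, max_sweeps + 1):
        x_new = sweep(x)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("no convergence")


x_ordered, sweeps_ordered = iterate(ordered_sweep)
x_chaotic, sweeps_chaotic = iterate(make_chaotic_sweep())
print(sweeps_ordered, sweeps_chaotic)
```

    Both variants reach the same fixed point; how many sweeps each needs depends on B, so this toy does not by itself reproduce the abstract's speed comparison.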

  2. An efficient parallel algorithm: Poststack and prestack Kirchhoff 3D depth migration using flexi-depth iterations

    Science.gov (United States)

    Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh

    2015-07-01

    This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for the current class of multicore architectures. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism; however, for 3D data migration, the resource requirements of the algorithm grow as the data size increases. This challenges its practical implementation even on current-generation high-performance computing systems, so a smart parallelization approach is essential to handle 3D data for migration. The most compute-intensive part of the Kirchhoff depth migration algorithm is the calculation of traveltime tables, owing to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for poststack and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce the concept of flexi-depth iterations for depth-migrating data in a parallel imaging space, using optimized traveltime table computations. This concept makes the algorithm flexible by migrating the data in a number of depth iterations that is determined at runtime by the available node memory and the size of the data to be migrated. Furthermore, it minimizes the requirements for storage, I/O and inter-node communication, which gives it an advantage over conventional parallelization approaches. The parallel algorithm is demonstrated and analysed on Yuva II, a supercomputer of the PARAM series. Optimization, performance and scalability results, along with the migration outcome, show the effectiveness of the parallel algorithm.
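
    The abstract says the number of depth iterations is set at runtime from the available node memory and the data size. A minimal sketch of that planning step follows, assuming (hypothetically) that every depth slice of the image volume has the same memory footprint; `plan_depth_iterations` is an illustrative name, not part of the authors' code, and it ignores traveltime tables and the MPI/OpenMP distribution.

```python
import math


def plan_depth_iterations(nz, bytes_per_depth_slice, usable_memory_bytes):
    """Split nz depth levels into the fewest contiguous chunks whose image
    volume fits in usable_memory_bytes. Returns end-exclusive (z_start, z_end)
    pairs, one per depth iteration. Assumes a constant per-slice footprint."""
    slices_per_iteration = usable_memory_bytes // bytes_per_depth_slice
    if slices_per_iteration < 1:
        raise ValueError("a single depth slice does not fit in memory")
    n_iterations = math.ceil(nz / slices_per_iteration)
    return [
        (k * slices_per_iteration, min((k + 1) * slices_per_iteration, nz))
        for k in range(n_iterations)
    ]


# Hypothetical sizes: 1050 depth levels, 100 bytes per slice, 20000 bytes free.
print(plan_depth_iterations(1050, 100, 20_000))
```

    With more memory per node the same call returns fewer, larger chunks, which is the kind of runtime flexibility the abstract describes.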

  3. Parallel iterative decoding of transform domain Wyner-Ziv video using cross bitplane correlation

    DEFF Research Database (Denmark)

    Luong, Huynh Van; Huang, Xin; Forchhammer, Søren

    2011-01-01

    In recent years, Transform Domain Wyner-Ziv (TDWZ) video coding has been proposed as an efficient Distributed Video Coding (DVC) solution, which fully or partly exploits the source statistics at the decoder to reduce the computational burden at the encoder. In this paper, a parallel iterative LDPC decoding scheme is proposed to improve the coding efficiency of TDWZ video codecs. The proposed parallel iterative LDPC decoding scheme is able to utilize cross bitplane correlation during decoding, by iteratively refining the soft-input, updating a modeled noise distribution and thereafter enhancing...

  4. Parallel iterative solution of the Hermite Collocation equations on GPUs II

    International Nuclear Information System (INIS)

    Vilanakis, N; Mathioudakis, E

    2014-01-01

    Hermite Collocation is a high-order finite element method for boundary value problems (BVPs) modelling applications in several fields of science and engineering. Applying this integration-free numerical solver to linear BVPs results in a large, sparse, general system of algebraic equations, which calls for an efficient iterative solver, especially for realistic simulations. In part I of this work, an efficient parallel algorithm of the Schur complement method coupled with the Bi-Conjugate Gradient Stabilized (BiCGSTAB) iterative solver was designed for multicore computing architectures with a Graphics Processing Unit (GPU). In the present work, the proposed algorithm is extended to high-performance computing environments consisting of multiprocessor machines with multiple GPUs. Since this is a parallel architecture with distributed GPU memory and shared CPU memory, a hybrid memory treatment is needed for the development of the parallel algorithm. The algorithm was realized on an HP SL390 multiprocessor machine with Tesla M2070 GPUs using the OpenMP and OpenACC standards. Execution time measurements reveal the efficiency of the parallel implementation.
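
    For readers unfamiliar with BiCGSTAB, here is a minimal, unpreconditioned, serial version on dense Python lists. It leaves out the Schur complement splitting and the multi-GPU OpenMP/OpenACC parallelism described above; the test matrix is an arbitrary nonsymmetric, diagonally dominant assumption.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicgstab(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned BiCGSTAB for a general (nonsymmetric) square system."""
    n = len(b)
    x = [0.0] * n
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    r_hat = list(r)                      # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for k in range(1, max_iter + 1):
        rho = dot(r_hat, r)
        beta = (rho / rho_old) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]   # BiCG half-step residual
        if math.sqrt(dot(s, s)) / b_norm < tol:
            return [xi + alpha * pi for xi, pi in zip(x, p)], k
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)                   # stabilizing step length
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, k
        rho_old = rho
    raise RuntimeError("BiCGSTAB did not converge")


A = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
b = [1.0, 2.0, 3.0]
x, iterations = bicgstab(A, b)
print(x, iterations)
```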

  5. Parallel computation of multigroup reactivity coefficient using iterative method

    Science.gov (United States)

    Susmikanti, Mike; Dewayatna, Winter

    2013-09-01

    One of the research activities supporting the commercial radioisotope production program is safety research on the irradiation of FPM (Fission Product Molybdenum) targets. An FPM target is a stainless-steel tube in which high-enriched uranium is placed, and the tube is irradiated to produce fission. The resulting fission material is widely used in the form of kits in nuclear medicine. Irradiating an FPM tube in the reactor core can disturb reactor performance; one such disturbance comes from changes in flux or reactivity. A method is therefore needed for calculating the safety margins as the configuration changes during the life of the reactor, and making the code faster became an absolute necessity. The neutron safety margin of the research reactor can be reused without modification in the calculation of the reactor's reactivity, which is an advantage of using the perturbation method. The criticality and flux in a multigroup diffusion model were calculated at various irradiation positions for several uranium contents. This model is computationally complex, so several parallel iterative algorithms have been developed for solving the resulting large, sparse matrix systems. The Red-Black Gauss-Seidel iteration and the parallel power iteration method can be used to solve the multigroup diffusion equation system and to calculate the criticality and reactivity coefficient. In this research, a code was developed for the reactivity calculation, one part of the safety analysis, using parallel processing. The calculation can be done more quickly and efficiently by exploiting parallel processing on a multicore computer. The code was applied to the calculation of safety limits for irradiated FPM targets with increasing uranium content.
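
    To show how red-black Gauss-Seidel and power iteration fit together, the sketch below computes k_eff for a one-group, 1-D slab diffusion problem with zero flux at both ends. This is a deliberately simplified stand-in for the multigroup model in the abstract: the cross sections, slab width and mesh are assumed values, and the two colours are swept serially even though each half-sweep could be distributed across cores.

```python
import math


def red_black_gauss_seidel(diag, off, src, phi, tol=1e-13, max_sweeps=10_000):
    """Solve diag*phi[i] + off*(phi[i-1] + phi[i+1]) = src[i], with phi = 0 just
    outside both ends, by red-black Gauss-Seidel. All updates within one colour
    are independent, so each half-sweep could run in parallel."""
    n = len(phi)
    for _ in range(max_sweeps):
        change = 0.0
        for colour in (0, 1):  # 0: even ("red") nodes, 1: odd ("black") nodes
            for i in range(colour, n, 2):
                left = phi[i - 1] if i > 0 else 0.0
                right = phi[i + 1] if i < n - 1 else 0.0
                new = (src[i] - off * (left + right)) / diag
                change = max(change, abs(new - phi[i]))
                phi[i] = new
        if change < tol:
            return phi
    raise RuntimeError("inner iteration did not converge")


def power_iteration_keff(n=10, length=100.0, D=1.0, sigma_a=0.01,
                         nu_sigma_f=0.012, tol=1e-12, max_outer=1000):
    """One-group, 1-D slab k-eigenvalue by power iteration (assumed data)."""
    h = length / (n + 1)
    diag = 2.0 * D / h**2 + sigma_a
    off = -D / h**2
    phi = [1.0] * n
    k = 1.0
    for _ in range(max_outer):
        src = [nu_sigma_f * p / k for p in phi]          # fission source / k
        phi_new = red_black_gauss_seidel(diag, off, src, list(phi))
        # Fission-rate ratio; nu_sigma_f is uniform here, so it cancels.
        k_new = k * sum(phi_new) / sum(phi)
        peak = max(phi_new)
        phi = [p / peak for p in phi_new]                 # renormalize
        if abs(k_new - k) < tol:
            return k_new, phi
        k = k_new
    raise RuntimeError("outer iteration did not converge")


k_eff, flux = power_iteration_keff()
# Discrete fundamental mode for the same assumed data, as a check.
h = 100.0 / 11
k_exact = 0.012 / (0.01 + 1.0 * (2.0 - 2.0 * math.cos(math.pi / 11)) / h**2)
print(k_eff, k_exact)
```

    For this model problem the result can be checked against the discrete fundamental mode, k = νΣf / (Σa + D(2 - 2cos(π/(n+1)))/h²), which is what `k_exact` evaluates.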

  6. Sparse BLIP: BLind Iterative Parallel imaging reconstruction using compressed sensing.

    Science.gov (United States)

    She, Huajun; Chen, Rong-Rong; Liang, Dong; DiBella, Edward V R; Ying, Leslie

    2014-02-01

    To develop a sensitivity-based parallel imaging reconstruction method that iteratively reconstructs both the coil sensitivities and the MR image simultaneously, based on their prior information. The parallel magnetic resonance imaging reconstruction problem can be formulated as a multichannel sampling problem in which solutions are sought analytically. However, the channel functions given by the coil sensitivities in parallel imaging are not known exactly, and the estimation error usually leads to artifacts. In this study, we propose a new reconstruction algorithm, termed Sparse BLind Iterative Parallel (Sparse BLIP), for blind iterative parallel imaging reconstruction using compressed sensing. The proposed algorithm reconstructs both the sensitivity functions and the image simultaneously from undersampled data. It enforces the sparseness constraint in the image as done in compressed sensing, but differs from compressed sensing in that the sensing matrix is unknown and an additional constraint is enforced on the sensitivities as well. Both phantom and in vivo imaging experiments were carried out with retrospective undersampling to evaluate the performance of the proposed method. Experiments show that Sparse BLIP reconstruction improves on Sparse SENSE, JSENSE, IRGN-TV, and L1-SPIRiT reconstructions with the same number of measurements. The proposed Sparse BLIP algorithm reduces the reconstruction errors when compared to the state-of-the-art parallel imaging methods. Copyright © 2013 Wiley Periodicals, Inc.

  7. Iterative schemes for parallel Sn algorithms in a shared-memory computing environment

    International Nuclear Information System (INIS)

    Haghighat, A.; Hunter, M.A.; Mattis, R.E.

    1995-01-01

    Several two-dimensional spatial domain partitioning S_n transport theory algorithms are developed on the basis of different iterative schemes. These algorithms are incorporated into TWOTRAN-II and tested on the shared-memory CRAY Y-MP C90 computer. For a series of fixed-source r-z geometry homogeneous problems, it is demonstrated that the concurrent red-black algorithms may result in large parallel efficiencies (>60%) on the C90. It is also demonstrated that, for a realistic shielding problem, the use of the negative flux fixup causes high load imbalance, which results in a significant loss of parallel efficiency.

  8. PRIM: An Efficient Preconditioning Iterative Reweighted Least Squares Method for Parallel Brain MRI Reconstruction.

    Science.gov (United States)

    Xu, Zheng; Wang, Sheng; Li, Yeqing; Zhu, Feiyun; Huang, Junzhou

    2018-02-08

    The most recent history of parallel Magnetic Resonance Imaging (pMRI) has in large part been devoted to finding ways to reduce acquisition time. Although the joint total variation (JTV) regularized model has been demonstrated to be a powerful tool for increasing sampling speed in pMRI, its major bottleneck is the inefficiency of the optimization method. Whereas all present state-of-the-art optimizations for the JTV model reach only a sublinear convergence rate, in this paper we squeeze the performance by proposing a linearly convergent optimization method for the JTV model. The proposed method is based on the Iterative Reweighted Least Squares algorithm. Because of the complexity of the tangled JTV objective, we design a novel preconditioner to further accelerate the proposed method. Extensive experiments demonstrate the superior performance of the proposed algorithm for pMRI in both accuracy and efficiency compared with state-of-the-art methods.
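
    The core IRLS idea can be seen on a much smaller problem than JTV-regularized pMRI. The sketch below applies plain IRLS to 1-D total-variation denoising, min_x 0.5||x - y||² + λ Σ|x[i+1] - x[i]|: each |t| is replaced by the quadratic t²/(2s) + s/2 with s = max(|t_k|, eps) taken from the current iterate, so every step is one tridiagonal solve. It uses no preconditioner and is not the authors' method; the signal, λ and eps are assumptions.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] multiplies x[i-1] and upper[i] multiplies x[i+1]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tv_objective(x, y, lam):
    """0.5 * ||x - y||^2 + lam * (total variation of x)."""
    fit = 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))
    tv = sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))
    return fit + lam * tv


def irls_tv_denoise(y, lam, n_iter=50, eps=1e-6):
    """IRLS for 1-D TV denoising: with weights w = 1 / max(|D x_k|, eps),
    each step solves (I + lam * D^T W D) x = y, where D is the forward difference."""
    n = len(y)
    x = list(y)
    for _ in range(n_iter):
        w = [1.0 / max(abs(x[i + 1] - x[i]), eps) for i in range(n - 1)]
        diag = [1.0] * n
        lower = [0.0] * n
        upper = [0.0] * n
        for i in range(n - 1):  # add lam * w[i] * (e_{i+1} - e_i)(e_{i+1} - e_i)^T
            diag[i] += lam * w[i]
            diag[i + 1] += lam * w[i]
            upper[i] = -lam * w[i]
            lower[i + 1] = -lam * w[i]
        x = solve_tridiagonal(lower, diag, upper, y)
    return x


# Assumed test signal: a unit step with a small deterministic wiggle.
y = [(1.0 if i >= 20 else 0.0) + 0.05 * ((i * 7) % 5 - 2) for i in range(40)]
x = irls_tv_denoise(y, lam=0.2)
print(tv_objective(y, y, 0.2), tv_objective(x, y, 0.2))
```

    Because D has zero column sums when multiplied by a constant vector, each solve preserves the mean of y, and the majorization guarantees (up to eps-sized terms) that the objective does not increase from step to step.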

  9. An iterative algorithm for solving the multidimensional neutron diffusion nodal method equations on parallel computers

    International Nuclear Information System (INIS)

    Kirk, B.L.; Azmy, Y.Y.

    1992-01-01

    In this paper, the one-group, steady-state neutron diffusion equation in two-dimensional Cartesian geometry is solved using the nodal integral method. The discrete variable equations comprise loosely coupled sets of equations representing the nodal balance of neutrons, as well as neutron current continuity along rows or columns of computational cells. An iterative algorithm that is more suitable for solving large problems concurrently is derived from a decomposition of the spatial domain and is accelerated using successive overrelaxation. This algorithm is very well suited for parallel computers, especially since the spatial domain decomposition occurs naturally, so that the number of iterations required for convergence does not depend on the number of processors participating in the calculation. Implementations of the authors' algorithm on the Intel iPSC/2 hypercube and the Sequent Balance 8000 parallel computers are presented, and the measured speedup and efficiency for test problems are reported. The results suggest that the efficiency of the hypercube quickly deteriorates when many processors are used, while the Sequent Balance retains very high efficiency for a comparable number of participating processors. This leads to the conjecture that message-passing parallel computers are not as well suited for this algorithm as shared-memory machines.
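
    Successive overrelaxation is easiest to see on the model problem -u'' = 1 on (0, 1) with u(0) = u(1) = 0, not the authors' nodal equations. With ω = 1 the sweep below is plain Gauss-Seidel; ω_opt = 2/(1 + sin(πh)) is the classical optimal relaxation factor for this model problem. The grid size is an assumption, and the domain decomposition and parallel execution are omitted.

```python
import math


def sor_poisson_1d(n, omega, tol=1e-10, max_iter=100_000):
    """SOR for -u'' = 1 on (0, 1), u(0) = u(1) = 0, on n interior nodes.
    omega = 1 is plain Gauss-Seidel. Returns (u, number_of_sweeps)."""
    h = 1.0 / (n + 1)
    u = [0.0] * n
    for sweep in range(1, max_iter + 1):
        change = 0.0
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            gauss_seidel_value = 0.5 * (left + right + h * h)
            new = u[i] + omega * (gauss_seidel_value - u[i])   # over-relax
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            return u, sweep
    raise RuntimeError("SOR did not converge")


n = 30
omega_opt = 2.0 / (1.0 + math.sin(math.pi / (n + 1)))
u_gs, sweeps_gs = sor_poisson_1d(n, 1.0)
u_sor, sweeps_sor = sor_poisson_1d(n, omega_opt)
print(sweeps_gs, sweeps_sor)
```

    Because the centred second difference is exact for quadratics, the converged iterate should match x(1 - x)/2 at the nodes, which makes the example easy to check.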

  10. Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

    Science.gov (United States)

    Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

    2017-12-01

    This paper describes the details of the parallel implementation of the SIMPLE algorithm for the numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors when the grid model is decomposed using fictitious cells. We discuss the specific features of the storage of distributed matrices and the implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of tuning the multigrid SLAE (system of linear algebraic equations) solver on the overall efficiency of the algorithm; the tuning involves the types of cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of evaluating the efficiency of parallelizing the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems, with the parallelization efficiency evaluated by both methods. It is shown that the proposed parallel implementation enables efficient computations for problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.

  11. Efficient parallel implicit methods for rotary-wing aerodynamics calculations

    Science.gov (United States)

    Wissink, Andrew M.

    Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large, complex problems is limited by a lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel, and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using the Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multiprocessors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It is more robust than DP-LUR for third-order upwind solutions. The second part of this work examines the use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested: Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good...

  12. Time parallelization of advanced operation scenario simulations of ITER plasma

    International Nuclear Information System (INIS)

    Samaddar, D; Casper, T A; Kim, S H; Houlberg, W A; Berry, L A; Elwasif, W R; Batchelor, D

    2013-01-01

    This work demonstrates that simulations of advanced burning plasma operation scenarios can be successfully parallelized in time using the parareal algorithm. CORSICA, an advanced operation scenario code for tokamak plasmas, is used as a test case. This is a unique application, since the parareal algorithm has so far been applied to much simpler systems, except for the case of turbulence. In the present application, a computational gain of an order of magnitude has been achieved, which is extremely promising. A successful implementation of the parareal algorithm in codes like CORSICA opens the possibility of time-efficient simulations of ITER plasmas.
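
    The parareal idea can be sketched on the scalar test equation y' = λy (λ = -1, an assumption) instead of CORSICA. A cheap coarse propagator G (one forward Euler step per time slice) runs serially, an expensive fine propagator F (100 Euler steps per slice) runs on all slices independently, and each iteration applies the correction U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n]). The step counts are illustrative choices.

```python
LAM = -1.0  # assumed test equation y' = LAM * y


def fine(y, dt, steps=100):
    """Expensive propagator: many small forward Euler steps across one slice."""
    h = dt / steps
    for _ in range(steps):
        y += h * LAM * y
    return y


def coarse(y, dt):
    """Cheap propagator: a single forward Euler step across one slice."""
    return y + dt * LAM * y


def parareal(y0, T, n_slices, n_iter):
    dt = T / n_slices
    U = [y0]
    for n in range(n_slices):                 # initial guess: serial coarse sweep
        U.append(coarse(U[n], dt))
    for _ in range(n_iter):
        F_old = [fine(U[n], dt) for n in range(n_slices)]    # independent: parallel
        G_old = [coarse(U[n], dt) for n in range(n_slices)]
        U_new = [y0]
        for n in range(n_slices):                            # serial correction
            U_new.append(coarse(U_new[n], dt) + F_old[n] - G_old[n])
        U = U_new
    return U


def serial_fine(y0, T, n_slices):
    """Reference: the fine propagator applied slice after slice."""
    dt = T / n_slices
    U = [y0]
    for n in range(n_slices):
        U.append(fine(U[n], dt))
    return U


reference = serial_fine(1.0, 2.0, 10)
U3 = parareal(1.0, 2.0, 10, n_iter=3)
U10 = parareal(1.0, 2.0, 10, n_iter=10)
print(U3[-1], U10[-1], reference[-1])
```

    After k iterations the first k slice values agree with the serial fine solution, so running as many iterations as there are slices reproduces it; any speedup comes from stopping much earlier.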

  13. Primal Domain Decomposition Method with Direct and Iterative Solver for Circuit-Field-Torque Coupled Parallel Finite Element Method to Electric Machine Modelling

    Directory of Open Access Journals (Sweden)

    Daniel Marcsa

    2015-01-01

    The analysis and design of electromechanical devices involve the solution of large sparse linear systems and therefore require high-performance algorithms. In this paper, the primal Domain Decomposition Method (DDM) with a parallel forward-backward solver and with a parallel Preconditioned Conjugate Gradient (PCG) solver is introduced in a two-dimensional parallel time-stepping finite element formulation to analyze a rotating machine, considering the electromagnetic field, the external circuit and rotor movement. The proposed parallel direct solver and the iterative solver with two preconditioners are analyzed with respect to their computational efficiency and the number of solver iterations with the different preconditioners. Simulation results of a rotating machine are also presented.
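
    The abstract does not name its two preconditioners, so the sketch below only shows the general shape of preconditioned conjugate gradients with a pluggable preconditioner, exercised with a Jacobi (diagonal) preconditioner and with the identity (plain CG). Both preconditioner choices and the small SPD matrix are assumptions, and the domain decomposition layer is omitted.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, apply_preconditioner, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradients for symmetric positive definite A.
    apply_preconditioner(r) must return z = M^{-1} r for an SPD matrix M."""
    x = [0.0] * len(b)
    r = list(b)                                   # r = b - A x with x = 0
    z = apply_preconditioner(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, k
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    raise RuntimeError("PCG did not converge")


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]


def jacobi(r):
    return [ri / A[i][i] for i, ri in enumerate(r)]   # M = diag(A)


def identity(r):
    return list(r)                                    # M = I, i.e. plain CG


x_jacobi, it_jacobi = pcg(A, b, jacobi)
x_plain, it_plain = pcg(A, b, identity)
print(it_jacobi, it_plain)
```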

  14. Accuracy analysis of hybrid parallel robot for the assembling of ITER

    Energy Technology Data Exchange (ETDEWEB)

    Wang Yongbo [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland); The State Key Laboratory of Mechanical Transmission, Chongqing University (China)]; Pessi, Pekka [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland)]; Wu Huapeng [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland)], E-mail: huapeng@lut.fi; Handroos, Heikki [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland)]

    2009-06-15

    This paper presents a novel mobile parallel robot that is able to carry out welding and machining processes inside the international thermonuclear experimental reactor (ITER) vacuum vessel (VV). The kinematic design of the robot has been optimized for ITER access. To improve the accuracy of the parallel robot, the errors caused by stiffness and the manufacturing process have to be compensated for or limited to a minimum. In this paper, kinematic error and stiffness models are given, and simulation results are presented.

  15. Accuracy analysis of hybrid parallel robot for the assembling of ITER

    International Nuclear Information System (INIS)

    Wang Yongbo; Pessi, Pekka; Wu Huapeng; Handroos, Heikki

    2009-01-01

    This paper presents a novel mobile parallel robot that is able to carry out welding and machining processes inside the international thermonuclear experimental reactor (ITER) vacuum vessel (VV). The kinematic design of the robot has been optimized for ITER access. To improve the accuracy of the parallel robot, the errors caused by stiffness and the manufacturing process have to be compensated for or limited to a minimum. In this paper, kinematic error and stiffness models are given, and simulation results are presented.

  16. Iterative algorithms for large sparse linear systems on parallel computers

    Science.gov (United States)

    Adams, L. M.

    1982-01-01

    Algorithms are developed for assembling, in parallel, the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed, and results of this model for the algorithms are given.

  17. Variation in efficiency of parallel algorithms. [for study of stiffness matrices in planar trusses]

    Science.gov (United States)

    Hayashi, A.; Melosh, R. J.; Utku, S.; Salama, M.

    1985-01-01

    The objective of the present study is to investigate some iterative parallel-processor linear equation solving algorithms with respect to their efficiency for analyses of typical linear engineering systems. Attention is given to a set of n linear equations, Ku = p, where K is an n x n positive definite, sparsely populated, symmetric matrix, u is an n x 1 vector of unknown responses, and p is an n x 1 vector of prescribed constants. This study is concerned with a hybrid method in which iteration is used to solve the problem, while a direct method is used at the local processor level. Variations in the efficiency of parallel algorithms are explored. The efficiency measures are based on computer experiments with the algorithms. For all the algorithms, the wall clock time is found to decrease as the number of processors increases.
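
    One common way to realise "iteration across processors, direct solves within a processor" is block Jacobi with a direct solve of each diagonal block. The sketch below assumes that interpretation (the paper's algorithms may differ) and simulates the processors sequentially; the spring-chain matrix K and the two-block split are made-up examples.

```python
def gauss_solve(M, rhs):
    """Direct solve of a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(rhs)
    a = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for i in range(col + 1, n):
            f = a[i][col] / a[col][col]
            for j in range(col, n + 1):
                a[i][j] -= f * a[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def block_jacobi(K, p, blocks, tol=1e-10, max_iter=10_000):
    """Outer iteration across blocks ('processors'); direct solve inside each block.
    Couplings to other blocks use the previous iterate, so blocks are independent."""
    n = len(p)
    u = [0.0] * n
    for it in range(1, max_iter + 1):
        u_new = list(u)
        for idx in blocks:
            own = set(idx)
            rhs = [p[i] - sum(K[i][j] * u[j] for j in range(n) if j not in own)
                   for i in idx]
            K_local = [[K[i][j] for j in idx] for i in idx]
            for i, value in zip(idx, gauss_solve(K_local, rhs)):
                u_new[i] = value
        change = max(abs(a - b) for a, b in zip(u_new, u))
        u = u_new
        if change < tol:
            return u, it
    raise RuntimeError("block Jacobi did not converge")


n = 8
K = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
p = [1.0] * n
u, iterations = block_jacobi(K, p, blocks=[list(range(0, 4)), list(range(4, 8))])
print(iterations)
```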

  18. Design of parallel intersector weld/cut robot for machining processes in ITER vacuum vessel

    International Nuclear Information System (INIS)

    Wu Huapeng; Handroos, Heikki; Kovanen, Janne; Rouvinen, Asko; Hannukainen, Petri; Saira, Tanja; Jones, Lawrence

    2003-01-01

    This paper presents a new parallel robot, Penta-WH, which has five degrees of freedom driven by hydraulic cylinders. The manipulator has a large, singularity-free workspace and high stiffness, and it acts as a transport device for welding, machining and inspection end-effectors inside the ITER vacuum vessel. The presented kinematic structure of the parallel robot is particularly suitable for the ITER environment. An analysis of the machining process for ITER, covering the machining methods and forces, is given, and kinematic analyses such as workspace and force capacity are discussed.

  19. Parallelization of the model-based iterative reconstruction algorithm DIRA

    International Nuclear Information System (INIS)

    Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.

    2016-01-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphics processing units (GPU). Despite their obvious benefits, parallelizing existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA, with the aim of significantly shortening the code's execution time. Selected routines were parallelized using the OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. (authors)
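
    The reported speedup of 15 on 16 cores corresponds to a parallel efficiency of 15/16 ≈ 94%. If one additionally assumes Amdahl's law (not stated in the abstract), S = 1/(f + (1 - f)/p) can be inverted to f = (p/S - 1)/(p - 1), giving a serial fraction of about 1/225 ≈ 0.44%. The helper functions below just do that arithmetic.

```python
def parallel_efficiency(speedup, cores):
    """Efficiency E = S / p."""
    return speedup / cores


def amdahl_serial_fraction(speedup, cores):
    """Invert Amdahl's law S = 1 / (f + (1 - f) / p) for the serial fraction f."""
    return (cores / speedup - 1.0) / (cores - 1.0)


print(parallel_efficiency(15, 16))        # 0.9375
print(amdahl_serial_fraction(15, 16))     # about 0.00444 (= 1/225)
```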

  20. AZTEC: A parallel iterative package for the solving linear systems

    Energy Technology Data Exchange (ETDEWEB)

    Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S. [Sandia National Labs., Albuquerque, NM (United States)]

    1996-12-31

    We describe a parallel linear system package, AZTEC. The package incorporates a number of parallel iterative methods (e.g. GMRES, biCGSTAB, CGS, TFQMR) and preconditioners (e.g. Jacobi, Gauss-Seidel, polynomial, domain decomposition with LU or ILU within subdomains). Additionally, AZTEC allows for the reuse of previous preconditioning factorizations within Newton schemes for nonlinear methods. Currently, a number of different users are using this package to solve a variety of PDE applications.

  1. Parallel iterative procedures for approximate solutions of wave propagation by finite element and finite difference methods

    Energy Technology Data Exchange (ETDEWEB)

    Kim, S. [Purdue Univ., West Lafayette, IN (United States)]

    1994-12-31

    Parallel iterative procedures based on domain decomposition techniques are defined and analyzed for the numerical solution of wave propagation by finite element and finite difference methods. For finite element methods in a Lagrangian framework, an efficient way of choosing the algorithm parameter and the convergence of the algorithm are indicated. Some heuristic arguments for finding the algorithm parameter for finite difference schemes are addressed. Numerical results are presented to indicate the effectiveness of the methods.

  2. P-SPARSLIB: A parallel sparse iterative solution package

    Energy Technology Data Exchange (ETDEWEB)

    Saad, Y. [Univ. of Minnesota, Minneapolis, MN (United States)]

    1994-12-31

    Iterative methods are gaining popularity in engineering and the sciences at a time when the computational environment is changing rapidly. P-SPARSLIB is a project to build a software library for sparse matrix computations on parallel computers. The emphasis is on iterative methods and the use of distributed sparse matrices, an extension of the domain decomposition approach to general sparse matrices. One of the goals of this project is to develop a software package geared towards specific applications. For example, the author will test the performance and usefulness of P-SPARSLIB modules on linear systems arising from CFD applications. Equally important is the goal of portability. In the long run, the author wishes to ensure that this package is portable on a variety of platforms, including SIMD environments and shared memory environments.
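
    The distributed sparse matrix idea can be illustrated in a single process: rows are split into contiguous blocks, each block is stored in CSR with global column indices, and before multiplying, each "process" receives the halo entries of x that it references but does not own. All function names here are hypothetical and are not the P-SPARSLIB API; the matrix is an arbitrary example.

```python
def to_csr(rows):
    """Dense rows -> CSR triple (values, global column indices, row pointers)."""
    values, cols, ptr = [], [], [0]
    for row in rows:
        for j, a in enumerate(row):
            if a != 0.0:
                values.append(a)
                cols.append(j)
        ptr.append(len(values))
    return values, cols, ptr


def partition_rows(A, n_procs):
    """Give each simulated process a contiguous block of rows [lo, hi)."""
    n = len(A)
    bounds = [n * p // n_procs for p in range(n_procs + 1)]
    return [(bounds[p], bounds[p + 1], to_csr(A[bounds[p]:bounds[p + 1]]))
            for p in range(n_procs)]


def halo_columns(part):
    """Global indices of x entries a process needs but does not own."""
    lo, hi, (_, cols, _) = part
    return sorted({j for j in cols if not lo <= j < hi})


def local_matvec(part, x_local, halo):
    """y_local = A_local x, reading owned entries from x_local, the rest from halo."""
    lo, hi, (values, cols, ptr) = part
    y = []
    for r in range(hi - lo):
        s = 0.0
        for k in range(ptr[r], ptr[r + 1]):
            j = cols[k]
            s += values[k] * (x_local[j - lo] if lo <= j < hi else halo[j])
        y.append(s)
    return y


def distributed_matvec(parts, x):
    """Simulate one parallel y = A x: each process first receives its halo
    values (the communication step), then multiplies its own rows."""
    y = []
    for part in parts:
        lo, hi, _ = part
        halo = {j: x[j] for j in halo_columns(part)}   # stands in for a message exchange
        y.extend(local_matvec(part, x[lo:hi], halo))
    return y


# Arbitrary example: tridiagonal 6 x 6 plus one long-range coupling.
A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(6)]
     for i in range(6)]
A[0][5] = 0.5
parts = partition_rows(A, 3)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(distributed_matvec(parts, x))
```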

  3. Iterating skeletons

    DEFF Research Database (Denmark)

    Dieterle, Mischa; Horstmeyer, Thomas; Berthold, Jost

    2012-01-01

    Skeleton-based programming is an area of increasing relevance with upcoming highly parallel hardware, since it substantially facilitates parallel programming and separates concerns. When parallel algorithms expressed by skeletons involve iterations – applying the same algorithm repeatedly... a particular skeleton ad-hoc for repeated execution turns out to be considerably complicated, and raises general questions about introducing state into a stateless parallel computation. In addition, one would strongly prefer an approach which leaves the original skeleton intact, and only uses it as a building block inside a bigger structure. In this work, we present a general framework for skeleton iteration and discuss requirements and variations of iteration control and iteration body. Skeleton iteration is expressed by synchronising a parallel iteration body skeleton with a (likewise parallel) state...
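
    As a rough Python analogue of the idea (not the authors' framework or language), the sketch below wraps an unchanged, stateless parallel map skeleton in an iteration combinator that owns the loop state and the termination test. Thread-based parallelism, the halving body and the stopping rule are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def par_map(f, xs, workers=4):
    """Stateless 'map' skeleton: apply f to every element using a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, xs))


def iterate_skeleton(body, state, done, max_rounds=1000):
    """Iteration control kept outside the body: the loop owns the state and the
    termination test, so the parallel body skeleton itself stays unchanged."""
    for rounds in range(max_rounds):
        if done(state):
            return state, rounds
        state = body(state)
    raise RuntimeError("iteration limit reached")


def halve_all(xs):
    # Iteration body: one application of the unchanged map skeleton.
    return par_map(lambda v: v / 2, xs)


def all_below_one(xs):
    return all(v < 1.0 for v in xs)


final_state, n_rounds = iterate_skeleton(halve_all, [8.0, 3.0, 0.5], all_below_one)
print(final_state, n_rounds)
```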

  4. Efficient parallel iterative solvers for the solution of large dense linear systems arising from the boundary element method in electromagnetism

    International Nuclear Information System (INIS)

    Alleon, G.; Carpentieri, B.; Duff, I.S.; Giraud, L.; Langou, J.; Martin, E.

    2003-01-01

    The boundary element method has become a popular tool for the solution of Maxwell's equations in electromagnetism. It discretizes only the surface of the radiating object and gives rise to linear systems that are smaller than those arising from finite element or finite difference discretizations. However, these systems are prohibitively demanding in terms of memory for direct methods and challenging to solve by iterative methods. In this paper we address the iterative solution, via preconditioned Krylov methods, of electromagnetic scattering problems expressed in an integral formulation, with the main focus on the design of the preconditioner. We consider an approximate inverse method based on Frobenius-norm minimization with a pattern prescribed in advance. The preconditioner is constructed from a sparse approximation of the dense coefficient matrix, and the patterns for both the preconditioner and the coefficient matrix are computed a priori using geometric information from the mesh. We describe the implementation of the approximate inverse in an out-of-core parallel code that uses multipole techniques for the matrix-vector products, and show results on the numerical scalability of our method on systems of up to one million unknowns. We propose an embedded iterative scheme based on the GMRES method and combined with multipole techniques, aimed at improving the robustness of the approximate inverse for large problems. We show through numerical experiments that the proposed scheme enables the efficient solution of very large and difficult problems at reduced computational and memory cost. Finally, we perform a preliminary study of a spectral two-level preconditioner to enhance the robustness of our method. This numerical technique exploits spectral information of the preconditioned systems to build a low-rank update of the preconditioner. (authors)
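
    The Frobenius-norm approximate inverse decouples by columns: for each column j, the entries of M allowed by the prescribed sparsity pattern are chosen to minimise ||A m_j - e_j||_2. The sketch below does this for a small dense matrix with hand-picked patterns, whereas the paper derives its patterns from mesh geometry and uses multipole matrix-vector products; the matrix and patterns are assumptions, and the normal equations are used for brevity where a QR-based least-squares solve would be numerically safer.

```python
def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for i in range(col + 1, n):
            f = a[i][col] / a[col][col]
            for j in range(col, n + 1):
                a[i][j] -= f * a[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def approximate_inverse(A, pattern):
    """For each column j, fill the rows of M[:, j] listed in pattern[j] so that
    ||A M[:, j] - e_j||_2 is minimal (least squares via normal equations)."""
    n = len(A)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        J = pattern[j]
        G = [[sum(A[r][a] * A[r][b] for r in range(n)) for b in J] for a in J]
        rhs = [A[j][a] for a in J]              # entries of A[:, J]^T e_j
        for a, value in zip(J, solve_dense(G, rhs)):
            M[a][j] = value
    return M


def frobenius_residual_sq(A, M):
    """||A M - I||_F squared."""
    n = len(A)
    return sum((sum(A[i][k] * M[k][j] for k in range(n)) - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))


A = [[4.0, 1.0, 0.0, 0.5],
     [1.0, 5.0, 2.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.3, 0.0, 1.0, 3.0]]
n = len(A)
diagonal = [[j] for j in range(n)]
tridiagonal = [[i for i in (j - 1, j, j + 1) if 0 <= i < n] for j in range(n)]
full = [list(range(n)) for _ in range(n)]
r_diag, r_tri, r_full = (frobenius_residual_sq(A, approximate_inverse(A, p))
                         for p in (diagonal, tridiagonal, full))
print(r_diag, r_tri, r_full)
```

    Enlarging the pattern can only lower the residual, and the full pattern recovers the exact inverse.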

  5. Efficient parallel iterative solvers for the solution of large dense linear systems arising from the boundary element method in electromagnetism

    Energy Technology Data Exchange (ETDEWEB)

    Alleon, G. [EADS-CCR, 31 - Blagnac (France)]; Carpentieri, B.; Duff, I.S.; Giraud, L.; Langou, J.; Martin, E. [Cerfacs, 31 - Toulouse (France)]

    2003-07-01

    The boundary element method has become a popular tool for the solution of Maxwell's equations in electromagnetism. It discretizes only the surface of the radiating object and gives rise to linear systems that are smaller than those arising from finite element or finite difference discretizations. However, these systems are prohibitively demanding in terms of memory for direct methods and challenging to solve by iterative methods. In this paper we address the iterative solution, via preconditioned Krylov methods, of electromagnetic scattering problems expressed in an integral formulation, with the main focus on the design of the preconditioner. We consider an approximate inverse method based on Frobenius-norm minimization with a pattern prescribed in advance. The preconditioner is constructed from a sparse approximation of the dense coefficient matrix, and the patterns for both the preconditioner and the coefficient matrix are computed a priori using geometric information from the mesh. We describe the implementation of the approximate inverse in an out-of-core parallel code that uses multipole techniques for the matrix-vector products, and show results on the numerical scalability of our method on systems of up to one million unknowns. We propose an embedded iterative scheme based on the GMRES method and combined with multipole techniques, aimed at improving the robustness of the approximate inverse for large problems. We show through numerical experiments that the proposed scheme enables the efficient solution of very large and difficult problems at reduced computational and memory cost. Finally, we perform a preliminary study of a spectral two-level preconditioner to enhance the robustness of our method. This numerical technique exploits spectral information of the preconditioned systems to build a low-rank update of the preconditioner. (authors)

  6. Fast parallel algorithm for three-dimensional distance-driven model in iterative computed tomography reconstruction

    International Nuclear Information System (INIS)

    Chen Jian-Lin; Li Lei; Wang Lin-Yuan; Cai Ai-Long; Xi Xiao-Qi; Zhang Han-Ming; Li Jian-Xin; Yan Bin

    2015-01-01

    The projection matrix model describes the physical relationship between the reconstructed object and the projections. Such a model has a strong influence on projection and backprojection, two vital operations in iterative computed tomographic reconstruction. The distance-driven model (DDM) is a state-of-the-art technique for simulating forward and back projections. This model has low computational complexity and relatively high spatial resolution; however, only a few methods implement it as a parallel operation with a matched model scheme. This study introduces a fast, parallelizable algorithm that improves the traditional DDM for computing the parallel projection and backprojection operations. Our proposed model has been implemented on a GPU (graphics processing unit) platform and achieves satisfactory computational efficiency with no approximation. The runtimes for the projection and backprojection operations with our model are approximately 4.5 s and 10.5 s per loop, respectively, for an image size of 256×256×256 and 360 projections of size 512×512. We compare several general algorithms that have been proposed for maximizing GPU efficiency by using unmatched projection/backprojection models in a parallel computation. The imaging resolution is not sacrificed and remains accurate during computed tomographic reconstruction. (paper)

  7. The new Exponential Directional Iterative (EDI) 3-D Sn scheme for parallel adaptive differencing

    International Nuclear Information System (INIS)

    Sjoden, G.E.

    2005-01-01

    The new Exponential Directional Iterative (EDI) discrete ordinates (Sn) scheme for 3-D Cartesian coordinates is presented. The EDI scheme is a logical extension of the positive, efficient Exponential Directional Weighted (EDW) Sn scheme currently used as the third level of the adaptive spatial differencing algorithm in the PENTRAN parallel discrete ordinates solver. Here, the derivation and advantages of the EDI scheme are presented; EDI uses EDW-rendered exponential coefficients as starting values to begin a fixed point iteration of the exponential coefficients. One issue that required evaluation was an iterative cutoff criterion to prevent the application of an unstable fixed point iteration; although such a cutoff was needed in some cases, it was readily handled by defaulting to EDW. Iterative refinement of the exponential coefficients in EDI typically converged in fewer than four fixed point iterations. Moreover, EDI yielded more accurate angular fluxes compared to the other schemes tested, particularly in streaming conditions. Overall, it was found that the EDI scheme was up to an order of magnitude more accurate than the EDW scheme on a given mesh interval in streaming cases, and is potentially a good candidate as a fourth-level differencing scheme in the PENTRAN adaptive differencing sequence. The 3-D Cartesian computational cost of EDI was only about 20% higher than that of the EDW scheme and about 40% higher than that of Diamond Zero (DZ). More evaluation and testing are required to determine suitable upgrade metrics for EDI to be fully integrated into the current adaptive spatial differencing sequence in PENTRAN. (author)

  8. PARALLEL ITERATIVE RECONSTRUCTION OF PHANTOM CATPHAN ON EXPERIMENTAL DATA

    Directory of Open Access Journals (Sweden)

    M. A. Mirzavand

    2016-01-01

    The paper considers the principles of fast parallel iterative algorithms based on graphics accelerators and the OpenGL library. The proposed approach simultaneously minimizes the residual of the desired solution and the total variation of the reconstructed three-dimensional image. The amount of input data required, i.e., cone-beam X-ray projections, can be reduced several-fold, which correspondingly reduces the patient's radiation exposure, while the necessary contrast and spatial resolution of the three-dimensional image of the patient are maintained. The heuristic iterative algorithm can be used as an alternative to the well-known three-dimensional Feldkamp algorithm.

  9. A mobile robot with parallel kinematics to meet the requirements for assembling and machining the ITER vacuum vessel

    Energy Technology Data Exchange (ETDEWEB)

    Pessi, Pekka [Lappeenranta University of Technology, Lappeenranta (Finland)], E-mail: pessi@lut.fi; Wu, Huapeng; Handroos, Heikki [Lappeenranta University of Technology, Lappeenranta (Finland)]; Jones, Lawrence [EFDA Close Support Unit, Boltzmannstrasse 2, Garching D-85748 (Germany)]

    2007-10-15

    The present paper introduces a mobile parallel robot developed for the International Thermonuclear Experimental Reactor (ITER). The task of the robot is to carry out welding and machining processes inside the ITER vacuum vessel. The kinematic design of the robot has been optimized for the access conditions of ITER. The kinematic analysis is given in the paper. A virtual prototype of the parallel robot is built. The dynamic behavior of the whole robot is studied by multibody system (MBS) simulation.

  10. A mobile robot with parallel kinematics to meet the requirements for assembling and machining the ITER vacuum vessel

    International Nuclear Information System (INIS)

    Pessi, Pekka; Wu, Huapeng; Handroos, Heikki; Jones, Lawrence

    2007-01-01

    The present paper introduces a mobile parallel robot developed for the International Thermonuclear Experimental Reactor (ITER). The task of the robot is to carry out welding and machining processes inside the ITER vacuum vessel. The kinematic design of the robot has been optimized for the access conditions of ITER. The kinematic analysis is given in the paper. A virtual prototype of the parallel robot is built. The dynamic behavior of the whole robot is studied by multibody system (MBS) simulation.

  11. Efficient parallel simulation of CO2 geologic sequestration in saline aquifers

    International Nuclear Information System (INIS)

    Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten

    2007-01-01

    An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.

  12. MICADO: Parallel implementation of a 2D-1D iterative algorithm for the 3D neutron transport problem in prismatic geometries

    International Nuclear Information System (INIS)

    Fevotte, F.; Lathuiliere, B.

    2013-01-01

    The large increase in computing power over the past few years now makes it possible to consider developing 3D full-core heterogeneous deterministic neutron transport solvers for reference calculations. Among all approaches presented in the literature, the method first introduced in [1] seems very promising. It consists of iterating over resolutions of 2D and 1D MOC problems, taking advantage of prismatic geometries without introducing approximations of a low-order operator such as diffusion. However, before developing a solver with all industrial options at EDF, several points needed to be clarified. In this work, we first prove the convergence of this iterative process, under some assumptions. We then present our high-performance, parallel implementation of this algorithm in the MICADO solver. Benchmarking the solver against the Takeda case shows that the 2D-1D coupling algorithm does not seem to affect the spatial convergence order of the MOC solver. As for performance, our study shows that even though the data distribution is suited to the 2D solver part, the efficiency of the 1D part is sufficient to ensure a good parallel efficiency of the global algorithm. After this study, the main remaining implementation difficulty concerns the memory requirement of a vector used for initialization. An efficient acceleration operator will also need to be developed. (authors)

  13. Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

    Science.gov (United States)

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-01-01

    We present ℓ1-SPIRiT, a simple algorithm for autocalibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
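For readers unfamiliar with iterative soft-thresholding, the following is a minimal Python sketch of the generic algorithm (ISTA) on a small real-valued ℓ1-regularized least-squares problem. It is not ℓ1-SPIRiT: the wavelet transform, the cross-channel joint-sparsity penalty and the SPIRiT consistency constraint are all omitted, and the function names are hypothetical. Each iteration is a gradient step on the data-fidelity term followed by elementwise shrinkage; both are dense, data-parallel operations of the kind that map well onto GPUs and multi-core CPUs.

```python
import math


def soft_threshold(v, t):
    """Elementwise soft-thresholding: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def ista(A, b, lam, step, iters=500):
    """Iterative soft-thresholding for min_x 0.5*||Ax - b||^2 + lam*||x||_1.

    step should not exceed 1 / ||A||_2^2 for guaranteed convergence.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][k] * x[k] for k in range(n)) - b[i] for i in range(m)]  # Ax - b
        grad = [sum(A[i][k] * r[i] for i in range(m)) for k in range(n)]      # A^T (Ax - b)
        x = soft_threshold([x[k] - step * grad[k] for k in range(n)], step * lam)
    return x
```

With A equal to the identity and step 1, a single iteration already returns soft_threshold(b, lam), so entries of b smaller in magnitude than lam are set exactly to zero.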

  14. Issues in developing parallel iterative algorithms for solving partial differential equations on a (transputer-based) distributed parallel computing system

    International Nuclear Information System (INIS)

    Rajagopalan, S.; Jethra, A.; Khare, A.N.; Ghodgaonkar, M.D.; Srivenkateshan, R.; Menon, S.V.G.

    1990-01-01

    Issues relating to implementing iterative procedures, for numerical solution of elliptic partial differential equations, on a distributed parallel computing system are discussed. Preliminary investigations show that a speed-up of about 3.85 is achievable on a four transputer pipeline network. (author). 2 figs., 3 appendixes., 7 refs
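Under the usual definitions of speed-up, S_p = T_1/T_p, and parallel efficiency, E_p = S_p/p (the abstract reports only the speed-up), the quoted figure corresponds to a parallel efficiency of roughly 96% on the four-transputer pipeline:

```latex
E_4 = \frac{S_4}{4} = \frac{3.85}{4} \approx 0.96
```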

  15. An Efficient Algorithm for Perturbed Orbit Integration Combining Analytical Continuation and Modified Chebyshev Picard Iteration

    Science.gov (United States)

    Elgohary, T.; Kim, D.; Turner, J.; Junkins, J.

    2014-09-01

    Several methods exist for integrating the motion in high-order gravity fields. Some recent methods use an approximate starting orbit, and an efficient method is needed for generating warm starts that account for specific low-order gravity approximations. By introducing two scalar Lagrange-like invariants and employing the Leibniz product rule, the perturbed motion is integrated by a novel recursive formulation. The Lagrange-like invariants allow exact time derivatives of arbitrary order. Restricting attention to the perturbations due to the zonal harmonics J2 through J6, we illustrate the idea. The recursively generated vector-valued time derivatives for the trajectory are used to develop a continuation series-based solution for propagating position and velocity. Numerical comparisons indicate performance improvements of ~70× over existing explicit Runge-Kutta methods while maintaining mm accuracy for the orbit predictions. The Modified Chebyshev Picard Iteration (MCPI) is an iterative path approximation method for solving nonlinear ordinary differential equations. The MCPI uses Picard iteration with orthogonal Chebyshev polynomial basis functions to recursively update the states. The key advantages of the MCPI are as follows: 1) large segments of a trajectory can be approximated by evaluating the forcing function at multiple nodes along the current approximation during each iteration; 2) it can readily handle general gravity perturbations as well as non-conservative forces; 3) parallel applications are possible. The Picard sequence converges to the solution over large time intervals when the forces are continuous and differentiable. Depending on the accuracy of the starting solution, however, the MCPI may require a significant number of iterations and function evaluations compared to other integrators.
In this work, we provide an efficient methodology to establish good starting solutions from the continuation series method; this warm start improves the performance of the
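The Picard idea behind MCPI can be illustrated with a minimal Python sketch. It is deliberately simplified: equally spaced nodes and the cumulative trapezoidal rule stand in for the Chebyshev nodes and Chebyshev-series quadrature that MCPI actually uses, and the function name `picard_solve` is hypothetical. Each sweep evaluates the forcing function at every node of the current trajectory; those evaluations are mutually independent, which is where the parallelism mentioned in the abstract comes from.

```python
import math


def picard_solve(f, x0, t_end, nodes=1001, guess=None, tol=1e-12, max_sweeps=100):
    """Simplified Picard iteration for x' = f(t, x), x(0) = x0, on [0, t_end].

    Returns (t, x, sweeps). Trapezoidal quadrature replaces MCPI's
    Chebyshev machinery; this is an illustration, not MCPI itself.
    """
    h = t_end / (nodes - 1)
    t = [i * h for i in range(nodes)]
    # starting trajectory: a caller-supplied warm start, else a constant "cold start"
    x = list(guess) if guess is not None else [x0] * nodes
    for sweep in range(1, max_sweeps + 1):
        # forcing function at every node of the current approximation;
        # these evaluations do not depend on each other (parallelizable)
        g = [f(ti, xi) for ti, xi in zip(t, x)]
        # x_new(t) = x0 + integral of g from 0 to t (cumulative trapezoid)
        new_x = [x0]
        for i in range(1, nodes):
            new_x.append(new_x[-1] + 0.5 * h * (g[i - 1] + g[i]))
        change = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            return t, x, sweep
    return t, x, max_sweeps
```

For x' = x with x(0) = 1 on [0, 1], both a cold start and a warm start built from exp(t) converge to a trajectory ending near e, but the warm start needs fewer sweeps, which is the role the continuation-series starting solution plays in the abstract.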

  16. A mobile robot with parallel kinematics constructed under requirements for assembling and machining of the ITER vacuum vessel

    International Nuclear Information System (INIS)

    Pessi, P.; Huapeng Wu; Handroos, H.; Jones, L.

    2006-01-01

    ITER sectors require tighter tolerances (±5 mm) than are normally expected for a structure of this size. The walls of ITER sectors are made of 60 mm thick stainless steel and are joined together by high-efficiency structural and leak-tight welds. In addition to the initial vacuum vessel assembly, sectors may have to be replaced for repair. Since commercially available machines are too heavy for the required machining operations and for lifting a possible e-beam gun column system, and conventional robots lack the stiffness and accuracy needed under such machining conditions, a new flexible, lightweight and mobile robotic machine is being considered. For the assembly of the ITER vacuum vessel sectors, precise positioning of welding end-effectors, at some distance from the available supports in a confined space, will be required, which is not possible using conventional machines or robots. This paper presents a special robot, able to carry out welding and machining processes from inside the ITER vacuum vessel, consisting of a ten-degree-of-freedom parallel robot mounted on a carriage driven along a track by an electric motor/gearbox. The robot is built around a Stewart-platform-based parallel mechanism. Water-hydraulic cylinders are used as actuators to provide the six degrees of freedom of the parallel mechanism. Two linear and two rotational motions are used to enlarge the workspace of the manipulator. The robot carries both a welding gun, such as a TIG, hybrid laser or e-beam welding gun, to weld the inner and outer walls of the ITER vacuum vessel sectors, and machining tools to cut and mill the walls with the necessary accuracy; it can also carry other tools and material to a required position inside the vacuum vessel. For assembly, an online six-degree-of-freedom seam-finding algorithm has been developed, which enables the robot to find the welding seam automatically in a very complex environment. In the machining multi flexible machining processes carried out automatically by

  17. Angular parallelization of a curvilinear Sn transport theory method

    International Nuclear Information System (INIS)

    Haghighat, A.

    1991-01-01

    In this paper a parallel algorithm for angular domain decomposition (or parallelization) of an r-dependent spherical S_n transport theory method is derived. The parallel formulation is incorporated into TWOTRAN-II using the IBM Parallel Fortran compiler and implemented on an IBM 3090/400 (with four processors). The behavior of the parallel algorithm for different physical problems is studied, and it is concluded that the parallel algorithm behaves differently in the presence of a fission source as opposed to the absence of a fission source; this is attributed to the relative contributions of the source and the angular redistribution terms in the S_n algorithm. Further, the parallel performance of the algorithm is measured for various problem sizes and different combinations of angular subdomains or processors. Poor parallel efficiencies between ∼35 and 50% are achieved in situations where the relative difference of parallel to serial iterations is ∼50%. High parallel efficiencies between ∼60% and 90% are obtained in situations where the relative difference of parallel to serial iterations is <35%

  18. Parallel conjugate gradient algorithms for manipulator dynamic simulation

    Science.gov (United States)

    Fijany, Amir; Scheld, Robert E.

    1989-01-01

    Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed (in exact arithmetic) to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n²) on a serial processor. A conjugate gradient algorithm is presented that provides greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed, which use as preconditioners a diagonal and a tridiagonal matrix, respectively, composed of the diagonal and tridiagonal elements of the mass matrix. Parallel algorithms are developed to compute the preconditioners and their inverses in O(log₂ n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves a computational time of O(log₂ n) for each iteration. Simulation results for a seven-degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
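A minimal Python sketch of the preconditioned conjugate gradient iteration with the diagonal (Jacobi) preconditioner mentioned above may help. It is serial and operates on a generic dense symmetric positive-definite matrix rather than on a manipulator mass matrix; the tridiagonal variant and the O(log₂ n) parallel kernels are not shown, and the name `pcg_jacobi` is hypothetical. With a diagonal preconditioner, applying its inverse is an elementwise scaling, and the remaining work per iteration is one matrix-vector product plus dot products (reductions).

```python
import math


def pcg_jacobi(A, b, tol=1e-10, max_iter=None):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner.

    A: dense symmetric positive-definite matrix (list of rows); b: right-hand side.
    Returns (x, iterations).
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n  # n steps suffice in exact arithmetic; slack for rounding
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # inverse of M = diag(A)
    x = [0.0] * n
    r = list(b)  # residual b - A x for x = 0
    if math.sqrt(sum(v * v for v in r)) < tol:
        return x, 0
    z = [d * v for d, v in zip(inv_diag, r)]  # preconditioned residual
    p = list(z)
    rz = sum(a * c for a, c in zip(r, z))
    for k in range(1, max_iter + 1):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(a * c for a, c in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(sum(v * v for v in r)) < tol:
            return x, k
        z = [d * v for d, v in zip(inv_diag, r)]
        rz_new = sum(a * c for a, c in zip(r, z))
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

When A is itself diagonal, the Jacobi preconditioner is exact and the method converges in a single iteration, an extreme case of the iteration-count reduction the abstract attributes to preconditioning.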

  19. Implementation of a cell-wise block-Gauss-Seidel iterative method for SN transport on a hybrid parallel computer architecture

    International Nuclear Information System (INIS)

    Rosa, Massimiliano; Warsa, James S.; Perks, Michael

    2011-01-01

    We have implemented a cell-wise, block-Gauss-Seidel (bGS) iterative algorithm for the solution of the S_n transport equations on the Roadrunner hybrid, parallel computer architecture. A compute node of this massively parallel machine comprises AMD Opteron cores that are linked to a Cell Broadband Engine™ (Cell/B.E.). LAPACK routines have been ported to the Cell/B.E. in order to make use of its parallel Synergistic Processing Elements (SPEs). The bGS algorithm is based on the LU factorization and solution of a linear system that couples the fluxes for all S_n angles and energy groups on a mesh cell. For every cell of a mesh that has been parallel decomposed on the higher-level Opteron processors, a linear system is transferred to the Cell/B.E. and the parallel LAPACK routines are used to compute a solution, which is then transferred back to the Opteron, where the rest of the computations for the S_n transport problem take place. Compared to standard parallel machines, a hundred-fold speedup of the bGS was observed on the hybrid Roadrunner architecture. Numerical experiments with strong and weak parallel scaling demonstrate the bGS method is viable and compares favorably to full parallel sweeps (FPS) on two-dimensional, unstructured meshes when it is applied to optically thick, multi-material problems. As expected, however, it is not as efficient as FPS in optically thin problems. (author)
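The core of a cell-wise block Gauss-Seidel sweep can be sketched in a few lines of Python. This is a generic illustration on a small dense system, assuming each "cell" owns a contiguous block of unknowns (standing in for all angles and groups on that cell) and that each cell system is solved directly by Gaussian elimination with partial pivoting, i.e. an LU factorization computed on the fly. The transport operator, the Opteron/Cell offload and LAPACK are not modeled, and the function names are hypothetical.

```python
def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting (an LU solve done in place)."""
    n = len(M)
    a = [list(row) + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[piv] = a[piv], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (a[k][n] - sum(a[k][j] * x[j] for j in range(k + 1, n))) / a[k][k]
    return x


def block_gauss_seidel(A, b, block, sweeps=50):
    """Block Gauss-Seidel: each block of `block` unknowns is solved exactly,
    with coupling to all other unknowns taken at their latest values."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for start in range(0, n, block):
            idx = range(start, min(start + block, n))
            inside = set(idx)
            # right-hand side: b_i minus coupling to unknowns outside this block
            rhs = [b[i] - sum(A[i][j] * x[j] for j in range(n) if j not in inside) for i in idx]
            xb = solve_dense([[A[i][j] for j in idx] for i in idx], rhs)
            for i, v in zip(idx, xb):
                x[i] = v  # updated values are used immediately by later blocks
    return x
```

The direct solve per block is what makes the method robust when coupling within a cell is strong, which is consistent with the abstract's observation that bGS fares best on optically thick problems.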

  20. Block iterative restoration of astronomical images with the massively parallel processor

    International Nuclear Information System (INIS)

    Heap, S.R.; Lindler, D.J.

    1987-01-01

    A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 × 500 image, direct algebraic restoration would require the solution of a 250,000 × 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 linear systems of size 121 × 121. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 × 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.
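A rough operation count shows why the block decomposition pays off. The figures below assume dense Gaussian elimination at about (2/3)n³ floating-point operations per n × n system, and assume that the 4900 block systems of one pass are solved one after another at the quoted 0.06 s each; the abstract states neither the solver nor the number of passes.

```latex
\begin{aligned}
N &= 500^2 = 250{,}000 \text{ unknowns},\\
\text{flops}_{\text{direct}} &\approx \tfrac{2}{3}\,N^3 = \tfrac{2}{3}\,(2.5\times 10^{5})^3 \approx 1.0\times 10^{16},\\
\text{flops}_{\text{block}} &\approx 4900 \cdot \tfrac{2}{3}\,(121)^3 \approx 4900 \cdot 1.18\times 10^{6} \approx 5.8\times 10^{9},\\
T_{\text{pass}} &\approx 4900 \times 0.06\ \text{s} = 294\ \text{s} \approx 4.9\ \text{min}.
\end{aligned}
```

Under these assumptions, one pass over all blocks costs roughly 1.8 × 10^6 times fewer operations than a single direct solve.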

  1. PCG: A software package for the iterative solution of linear systems on scalar, vector and parallel computers

    Energy Technology Data Exchange (ETDEWEB)

    Joubert, W. [Los Alamos National Lab., NM (United States)]; Carey, G.F. [Univ. of Texas, Austin, TX (United States)]

    1994-12-31

    A great need exists for high-performance numerical software libraries that are transportable across parallel machines. This talk concerns the PCG package, which solves systems of linear equations by iterative methods on parallel computers. The features of the package are discussed, as well as the techniques used to obtain both high performance and transportability across architectures. Representative numerical results are presented for several machines, including the Connection Machine CM-5, Intel Paragon and Cray T3D parallel computers.

  2. Scalability of Parallel Scientific Applications on the Cloud

    Directory of Open Access Journals (Sweden)

    Satish Narayana Srirama

    2011-01-01

    Cloud computing, with its promise of virtually infinite resources, appears well suited to solving resource-hungry scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications, such as matrix–vector operations, the NAS Parallel Benchmarks and DOUG (Domain decomposition On Unstructured Grids), on the cloud. DOUG is an open source software package for the parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit considerably and scale reasonably well on the cloud. We also observed the limitations of the cloud and compared its performance with that of a cluster. However, to run scientific applications efficiently on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit the cloud resources, such as the MapReduce framework. Several iterative and embarrassingly parallel algorithms are reduced to the MapReduce model, and their performance is measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it is well suited to embarrassingly parallel algorithms. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper argues for better frameworks or optimizations for MapReduce.

  3. Parallel Implicit Algorithms for CFD

    Science.gov (United States)

    Keyes, David E.

    1998-01-01

    The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in MPI (Message Passing Interface) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSc library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSc during our early porting experiences). A secondary project pursued with funding from this contract concerned parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSc framework.
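The division of labor described above, an outer Newton iteration with an inner Krylov solver that touches the Jacobian only through matrix-vector products, can be sketched in Python. This is a minimal illustration under several assumptions: the Jacobian-vector product is approximated by a forward finite difference, a common "Jacobian-free" choice that the abstract does not specify; the inner solver is plain unrestarted GMRES; and the Schwarz preconditioner, the distribution across processors and PETSc are all omitted. The function names are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gmres(matvec, b, tol=1e-12, max_iter=50):
    """Unrestarted GMRES from x0 = 0; the matrix is used only via matvec(v)."""
    n = len(b)
    beta = math.sqrt(_dot(b, b))
    if beta == 0.0:
        return [0.0] * n
    V = [[bi / beta for bi in b]]  # orthonormal Arnoldi basis
    R = []                         # columns of the rotated (triangular) Hessenberg matrix
    cs, sn = [], []                # Givens rotation coefficients
    g = [beta]                     # rotated right-hand side, starts as beta * e1
    for j in range(min(max_iter, n)):
        w = matvec(V[j])
        h = []
        for i in range(j + 1):     # modified Gram-Schmidt
            hij = _dot(w, V[i])
            w = [wk - hij * vk for wk, vk in zip(w, V[i])]
            h.append(hij)
        h_next = math.sqrt(_dot(w, w))
        h.append(h_next)
        for i in range(j):         # apply earlier rotations to the new column
            h[i], h[i + 1] = cs[i] * h[i] + sn[i] * h[i + 1], -sn[i] * h[i] + cs[i] * h[i + 1]
        d = math.hypot(h[j], h[j + 1])
        c, s = h[j] / d, h[j + 1] / d
        cs.append(c)
        sn.append(s)
        h[j], h[j + 1] = d, 0.0
        g.append(-s * g[j])
        g[j] = c * g[j]
        R.append(h)
        if abs(g[j + 1]) <= tol * beta or h_next == 0.0:
            break
        V.append([wk / h_next for wk in w])
    k = len(R)
    y = [0.0] * k
    for i in range(k - 1, -1, -1):  # back-substitution on the triangular system
        y[i] = (g[i] - sum(R[l][i] * y[l] for l in range(i + 1, k))) / R[i][i]
    return [sum(y[l] * V[l][m] for l in range(k)) for m in range(n)]


def newton_krylov(F, u, tol=1e-10, max_newton=20, eps=1e-7):
    """Inexact Newton: solve J du = -F(u) with GMRES, J v ~ (F(u + eps*v) - F(u)) / eps."""
    for _ in range(max_newton):
        Fu = F(u)
        if math.sqrt(_dot(Fu, Fu)) < tol:
            break

        def jv(v, u=u, Fu=Fu):
            shifted = F([ui + eps * vi for ui, vi in zip(u, v)])
            return [(fs - f0) / eps for fs, f0 in zip(shifted, Fu)]

        du = gmres(jv, [-f for f in Fu])
        u = [ui + di for ui, di in zip(u, du)]
    return u
```

For example, for F(u) = (u0² + u1 - 3, u0 + u1² - 5) started from (1, 1), the iteration converges to the root (1, 2) without the Jacobian ever being formed.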

  4. Improved Iterative Parallel Interference Cancellation Receiver for Future Wireless DS-CDMA Systems

    Directory of Open Access Journals (Sweden)

    Andrea Bernacchioni

    2005-04-01

    We present a new turbo multiuser detector for turbo-coded direct sequence code division multiple access (DS-CDMA) systems. The proposed detector is based on a parallel interference cancellation (PIC) stage and a bank of turbo decoders. The PIC is broken up in order to perform interference cancellation after each constituent decoder of the turbo decoding scheme. Moreover, we propose a new enhanced algorithm that provides a more accurate estimate of the signal-to-noise-plus-interference ratio used in the tentative decision device and in the MAP decoding algorithm. The performance of the proposed receiver is evaluated by computer simulations for medium to very high system loads, in AWGN and multipath fading channels, and compared with recently proposed interference cancellation-based iterative MUD receivers, taking into account the number of iterations and the complexity involved. The results show that the proposed receiver outperforms the others, especially for highly loaded systems.
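The basic parallel interference cancellation step that such detectors build on can be sketched in Python. This is a plain hard-decision PIC for a synchronous, noise-free channel with known amplitudes and unit-energy spreading codes, all assumptions made for illustration; it contains none of the turbo decoding, the per-constituent-decoder cancellation or the SINR estimation the abstract proposes, and `pic_detect` is a hypothetical name. At each stage every user subtracts the interference reconstructed from the previous stage's decisions of all other users, and these per-user updates are independent of one another.

```python
def pic_detect(r, codes, amps, stages=2):
    """Hard-decision parallel interference cancellation for synchronous CDMA.

    r     : received chip samples
    codes : one unit-energy spreading sequence per user
    amps  : known received amplitude of each user
    Returns (matched_filter_decisions, pic_decisions).
    """
    K, N = len(codes), len(r)

    def sign(v):
        return 1.0 if v >= 0 else -1.0

    # matched-filter outputs and code cross-correlations
    y = [sum(codes[k][n] * r[n] for n in range(N)) for k in range(K)]
    R = [[sum(codes[i][n] * codes[j][n] for n in range(N)) for j in range(K)] for i in range(K)]
    mf = [sign(v) for v in y]
    b = list(mf)
    for _ in range(stages):
        # all users are updated at once from the previous stage's decisions
        b = [sign(y[k] - sum(R[k][j] * amps[j] * b[j] for j in range(K) if j != k))
             for k in range(K)]
    return mf, b
```

For example, with codes [0.5, 0.5, 0.5, 0.5] and [0.5, 0.5, 0.5, -0.5] (cross-correlation 0.5), amplitudes 1 and 3, and transmitted bits +1 and -1, the strong user drives the weak user's matched-filter output to the wrong sign (-1), and one cancellation stage corrects it to +1.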

  5. Efficient approach to simulate EM loads on massive structures in ITER machine

    Energy Technology Data Exchange (ETDEWEB)

    Alekseev, A. [ITER Organization, Route de Vinon sur Verdon, 13115 St. Paul-Lez-Durance (France)]; Andreeva, Z.; Belov, A.; Belyakov, V.; Filatov, O. [D.V. Efremov Scientific Research Institute, 196641 St. Petersburg (Russian Federation)]; Gribov, Yu.; Ioki, K. [ITER Organization, Route de Vinon sur Verdon, 13115 St. Paul-Lez-Durance (France)]; Kukhtin, V.; Labusov, A.; Lamzin, E.; Lyublin, B.; Malkov, A.; Mazul, I. [D.V. Efremov Scientific Research Institute, 196641 St. Petersburg (Russian Federation)]; Rozov, V.; Sugihara, M. [ITER Organization, Route de Vinon sur Verdon, 13115 St. Paul-Lez-Durance (France)]; Sychevsky, S., E-mail: sytch@sintez.niiefa.spb.su [D.V. Efremov Scientific Research Institute, 196641 St. Petersburg (Russian Federation)]

    2013-10-15

    Highlights: ► A modelling technique to predict EM loads in ITER conducting structures is presented. ► The technique provides low computational cost and parallel computations. ► Detailed models were built for the system “vacuum vessel, cryostat, thermal shields”. ► EM loads on massive in-vessel structures were simulated with the use of local models. ► A flexible combination of models enables the desired accuracy of load distributions. -- Abstract: Operation of the ITER machine is associated with high electromagnetic (EM) loads. An essential contributor to EM loads is eddy currents induced in passive conductive structures. Based on the ITER design, a modelling technique has been developed and applied in computations to efficiently predict the anticipated loads. The technique avoids building a global 3D finite-element (FE) model, which would require meshing the conducting structures and their vacuum environment into 3D solid elements and would lead to high computational cost. The key features of the proposed technique are: (i) the use of an existing shell model for the system “vacuum vessel (VV), cryostat, and thermal shields (TS)” implementing the magnetic shell approach, in which a solution is obtained in terms of a single-component vector electric potential taken within the conducting shells of the “VV + cryostat + TS” system; (ii) EM loads on in-vessel conducting structures are simulated with the use of local FE models. The local models use either the 3D solid body or shell approximations. For simulation efficiency, the local boundary conditions are imposed with respect to the total field or an external field. The use of an integral-differential formulation and special procedures ensures smooth and accurate simulated distributions of fields from current sources of any geometry. The local FE models have been developed and applied for EM analyses of a variety of ITER components, including the diagnostic systems.

  6. Direct and iterative algorithms for the parallel solution of the one-dimensional macroscopic Navier-Stokes equations

    International Nuclear Information System (INIS)

    Doster, J.M.; Sills, E.D.

    1986-01-01

    Current efforts are under way to develop and evaluate numerical algorithms for the parallel solution of the large sparse matrix equations associated with the finite difference representation of the macroscopic Navier-Stokes equations. Previous work has shown that these equations can be cast into smaller coupled matrix equations suitable for solution utilizing multiple computer processors operating in parallel. The individual processors themselves may exhibit parallelism through the use of vector pipelines. This work has concentrated on the one-dimensional drift flux form of the Navier-Stokes equations. Direct and iterative algorithms that may be suitable for implementation on parallel computer architectures are evaluated in terms of accuracy and overall execution speed. This work has application to engineering and training simulations, on-line process control systems, and engineering workstations where increased computational speeds are required.

  7. On the adequacy of message-passing parallel supercomputers for solving neutron transport problems

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1990-01-01

    A coarse-grained, static-scheduling parallelization of the standard iterative scheme used for solving the discrete-ordinates approximation of the neutron transport equation is described. The parallel algorithm is based on a decomposition of the angular domain along the discrete ordinates, thus naturally producing a set of completely uncoupled systems of equations in each iteration. Implementation of the parallel code on Intel's iPSC/2 hypercube, and solutions to test problems are presented as evidence of the high speedup and efficiency of the parallel code. The performance of the parallel code on the iPSC/2 is analyzed, and a model for the CPU time as a function of the problem size (order of angular quadrature) and the number of participating processors is developed and validated against measured CPU times. The performance model is used to speculate on the potential of massively parallel computers for significantly speeding up real-life transport calculations at acceptable efficiencies. We conclude that parallel computers with a few hundred processors are capable of producing large speedups at very high efficiencies in very large three-dimensional problems. 10 refs., 8 figs

  8. Parallel iterative solvers and preconditioners using approximate hierarchical methods

    Energy Technology Data Exchange (ETDEWEB)

    Grama, A.; Kumar, V.; Sameh, A. [Univ. of Minnesota, Minneapolis, MN (United States)]

    1996-12-31

    In this paper, we report results on the performance, convergence, and accuracy of a parallel GMRES solver for Boundary Element Methods. The solver uses a hierarchical approximate matrix-vector product based on a hybrid Barnes-Hut / Fast Multipole Method. We study the impact of various accuracy parameters on convergence and show that, with minimal loss in accuracy, our solver yields significant speedups. We demonstrate the excellent parallel efficiency and scalability of our solver. The combined speedups from approximation and parallelism represent an improvement of several orders of magnitude in solution time. We also develop fast and parallelizable preconditioners for this problem. We report on the performance of an inner-outer scheme and a preconditioner based on a truncated Green's function. Experimental results on a 256-processor Cray T3D are presented.

  9. Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

    Science.gov (United States)

    Qin, Cheng-Zhi; Zhan, Lijun

    2012-06-01

    As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quickly calculating flow accumulations from a DEM (especially for a large area) is a practical challenge for personal computer (PC) users. In recent years, rapid increases in the hardware capacity of the graphics processing units (GPUs) in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using the compute unified device architecture (CUDA) programming model has been explored to speed up the single-flow-direction (SFD) algorithm. However, a GPU implementation of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculating flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different GPU parallelization strategies are explored. The first strategy, which has been used in the existing parallel SFD algorithm on GPUs, suffers from redundant computation. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculating flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU
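One way to see why a graph-based ordering suits MFD accumulation is the following Python sketch, which is not the authors' MFD-md implementation. It assumes a depression-free DEM (the preprocessing step has already run), distributes each cell's accumulated flow to all lower neighbours in proportion to slope raised to a hypothetical exponent `p` (a Freeman-style weighting), and processes cells in topological waves using Kahn's algorithm: a cell is processed only after every upstream donor has been processed, so all cells in one wave are independent. On a GPU each wave could map to one thread per cell, with atomic additions where two donors feed the same receiver; here the wave is a plain loop.

```python
import math

# Eight-neighbour offsets.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def mfd_flow_accumulation(dem, p=1.1):
    """Return a grid of accumulated contributing cells (each cell counts 1)."""
    rows, cols = len(dem), len(dem[0])

    def receivers(r, c):
        # Lower neighbours and the fraction of flow each one receives.
        out = []
        for dr, dc in NEIGHBOURS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and dem[rr][cc] < dem[r][c]:
                slope = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                out.append(((rr, cc), slope ** p))
        total = sum(w for _, w in out)
        return [(cell, w / total) for cell, w in out]

    recv = {(r, c): receivers(r, c) for r in range(rows) for c in range(cols)}
    # In-degree = number of upstream donors not yet processed.
    indeg = {cell: 0 for cell in recv}
    for outs in recv.values():
        for tgt, _ in outs:
            indeg[tgt] += 1

    acc = {cell: 1.0 for cell in recv}
    wave = [cell for cell, d in indeg.items() if d == 0]
    while wave:
        next_wave = []
        for cell in wave:  # cells in one wave do not depend on each other
            for tgt, frac in recv[cell]:
                acc[tgt] += frac * acc[cell]
                indeg[tgt] -= 1
                if indeg[tgt] == 0:
                    next_wave.append(tgt)
        wave = next_wave
    return [[acc[(r, c)] for c in range(cols)] for r in range(rows)]
```

On a one-row ramp `[[3.0, 2.0, 1.0]]` all flow moves downhill and the result is `[[1.0, 2.0, 3.0]]`.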

  10. COMPUTATIONAL EFFICIENCY OF A MODIFIED SCATTERING KERNEL FOR FULL-COUPLED PHOTON-ELECTRON TRANSPORT PARALLEL COMPUTING WITH UNSTRUCTURED TETRAHEDRAL MESHES

    Directory of Open Access Journals (Sweden)

    JONG WOON KIM

    2014-04-01

    In this paper, we introduce a modified scattering kernel approach that avoids the unnecessarily repeated calculations involved in the scattering source calculation, and we use it with parallel computing to effectively reduce the computation time. Its computational efficiency was tested on three-dimensional fully coupled photon-electron transport problems using our computer program, which solves the multigroup discrete ordinates transport equation by the discontinuous finite element method with unstructured tetrahedral meshes for geometrically complicated problems. The numerical tests show that the modified scattering kernel speeds up the elapsed time per iteration by a factor of 17 to 42, not only in single-CPU calculations but also in parallel computing with several CPUs.

  11. Structured Parallel Programming Patterns for Efficient Computation

    CERN Document Server

    McCool, Michael; Robison, Arch

    2012-01-01

    Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th

  12. High-speed parallel solution of the neutron diffusion equation with the hierarchical domain decomposition boundary element method incorporating parallel communications

    International Nuclear Information System (INIS)

    Tsuji, Masashi; Chiba, Gou

    2000-01-01

    A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to achieve high parallel efficiency on distributed-memory message-passing parallel computers. The data exchanges between node processors that recur during the HDD-BEM iterations are implemented without any intervention from the host processor, which was used to supervise parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, parallel processing can be executed with only cooperative operations of the node processors. In the conventional P-HDD-BEM, communication overhead was the dominant time-consuming part, and the parallel efficiency decreased steeply as the number of processors increased. With parallel data communication, the efficiency is affected only by the number of boundary elements assigned to the decomposed subregions, and the communication overhead can be drastically reduced. This feature can be particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the deterioration of parallel efficiency and opens a new path to parallel computation of NDEs on distributed-memory message-passing parallel computers. (author)

  13. III - Template Metaprogramming for massively parallel scientific computing - Templates for Iteration; Thread-level Parallelism

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    Large-scale scientific computing raises questions at different levels, ranging from the formulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities among these topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tends to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, an object-oriented approach is easy to read but may come with an inherent performance penalty. These lectures aim to present the basics of the Expression Template (ET) idiom, which allows us to keep the object-oriented approach without sacrificing performance. In particular, we will show how to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show how to apply these methods i...

  14. An efficient iteration strategy for the solution of the Euler equations

    Science.gov (United States)

    Walters, R. W.; Dwoyer, D. L.

    1985-01-01

    A line Gauss-Seidel (LGS) relaxation algorithm in conjunction with a one-parameter family of upwind discretizations of the Euler equations in two dimensions is described. The basic algorithm has the property that convergence to the steady state is quadratic for fully supersonic flows and linear otherwise. This is in contrast to the block ADI methods (either central or upwind differenced) and the upwind biased relaxation schemes, all of which converge linearly, independent of the flow regime. Moreover, the algorithm presented here is easily enhanced to detect regions of subsonic flow embedded in supersonic flow. This allows marching by lines in the supersonic regions, converging each line quadratically, and iterating in the subsonic regions, thus yielding a very efficient iteration strategy. Numerical results for two-dimensional supersonic and transonic flows containing both oblique and normal shock waves confirm the efficiency of the iteration strategy.

  15. Development and control towards a parallel water hydraulic weld/cut robot for machining processes in ITER vacuum vessel

    International Nuclear Information System (INIS)

    Wu Huapeng; Handroos, Heikki; Pessi, Pekka; Kilkki, Juha; Jones, Lawrence

    2005-01-01

    This paper presents a special robot, able to carry out welding and machining processes from inside the ITER vacuum vessel (VV), consisting of a five-degree-of-freedom parallel mechanism mounted on a carriage driven by two electric motors on a rack. The kinematic design of the robot has been optimised for ITER access, and a hydraulically actuated pre-prototype has been built. A hybrid controller is designed for the robot, including position, speed and pressure feedback loops to achieve high accuracy and high dynamic performance. Finally, experimental test results are presented and discussed.

  16. Parallel algorithms for nuclear reactor analysis via domain decomposition method

    International Nuclear Information System (INIS)

    Kim, Yong Hee

    1995-02-01

    the number of inner-level iterations is limited. The analysis shows that mixed pseudo-boundary conditions have superior convergence properties if the pseudo-boundary parameters are optimally chosen. DN (or ND) conditions, where DN (or ND) means that Dirichlet and Neumann conditions are imposed independently on neighbouring pseudo-boundaries, can be efficiently accelerated via under-relaxation. However, exact realization of such schemes is not practical, since complete inner iteration is required. It is shown that limiting the number of inner iterations is equivalent to under-relaxation; however, limiting the number of inner-level iterations in the MM scheme requires more outer iterations. Consequently, the DN (or ND) algorithm with under-relaxation and the MM algorithm may provide similar parallel performance in practical implementations, unless the numerical solver used is extraordinarily efficient. The parallel Schwarz algorithm is applied to two types of reactor benchmark problems: fixed-source problems and eigenvalue problems. Several results of parallel computation for these problems are reported and compared with those of sequential computations. The results show that very high speedup can be achieved in fixed-source problems despite the small problem size, and that relatively high speedup, although lower than for fixed-source problems, can be obtained in eigenvalue problems.

  17. Efficiency of thermal outgassing for tritium retention measurement and removal in ITER

    Directory of Open Access Journals (Sweden)

    G. De Temmerman

    2017-08-01

    As a licensed nuclear facility, ITER must limit the in-vessel tritium (T) retention to reduce the risks of potential release during accidents, with the inventory limit set at 1 kg. Simulations and extrapolations from existing experiments indicate that T retention in ITER will mainly be driven by co-deposition with beryllium (Be) eroded from the first wall (FW), with co-deposits forming mainly in the divertor region but possibly also on the first wall itself. A pulsed Laser-Induced Desorption (LID) system, called Tritium Monitor, is being designed to locally measure the T retention in co-deposits forming on the inner divertor baffle of ITER. Regarding tritium removal, the baseline strategy is to bake the plasma-facing components, at 513 K for the FW and 623 K for the divertor. Both baking and laser desorption rely on the thermal desorption of tritium from the surface, the efficiency of which remains unclear for thick (and possibly impure) co-deposits. This contribution reports the results of TMAP7 studies of this efficiency for ITER-relevant deposits.

  18. Domain decomposition methods and parallel computing

    International Nuclear Information System (INIS)

    Meurant, G.

    1991-01-01

    In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from the discretization of scientific computing problems described by systems of partial differential equations. We show how to obtain a discrete finite-dimensional system from the continuous problem and briefly describe the chosen iterative algorithm, the conjugate gradient method. Then the different kinds of parallel architectures are reviewed, and their advantages and deficiencies are emphasized. We sketch the problems encountered in programming the conjugate gradient method on parallel computers. To make this algorithm efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion floating-point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset

  19. A Novel Parallel Algorithm for Edit Distance Computation

    Directory of Open Access Journals (Sweden)

    Muhammad Murtaza Yousaf

    2018-01-01

    The edit distance between two sequences is the minimum number of weighted transformation operations required to transform one string into the other. The weighted transformation operations are insert, remove, and substitute. A dynamic programming solution for the edit distance exists, but it becomes computationally intensive when the strings are very long. This work presents a novel parallel algorithm for the edit distance problem of string matching. The algorithm is based on resolving dependencies in the dynamic programming solution of the problem, which allows it to compute each row of the edit distance table in parallel. In this way, the complete table can be computed in min(m, n) iterations for strings of sizes m and n, whereas the state-of-the-art parallel algorithm solves the problem in max(m, n) iterations. The proposed algorithm also increases the amount of parallelism in each iteration. It is also able to exploit spatial locality in its implementation. Additionally, the algorithm works in a load-balanced way, which further improves its performance. The algorithm is implemented for shared-memory multicore systems. The OpenMP implementation shows linear speedup and better execution time than the state-of-the-art parallel approach. The efficiency of the algorithm is also shown to be better than that of its competitor.
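The following Python sketch shows one way the within-row dependency of the edit distance recurrence can be resolved so that a whole row is computed with data-parallel operations. It is an illustration under stated assumptions (unit costs for insert, remove and substitute; the hypothetical function name `edit_distance_rowwise`), not the paper's own dependency-resolution scheme. Vertical and diagonal moves depend only on the previous row; the remaining left-to-right chain `D[i][j-1] + 1` unrolls to `D[i][j] = j + min(T[k] - k for k <= j)`, a prefix-minimum scan that can itself be evaluated in parallel.

```python
from itertools import accumulate


def edit_distance_rowwise(s, t):
    """Unit-cost edit distance, computed one full row per outer step."""
    n = len(t)
    prev = list(range(n + 1))  # row 0: j insertions
    for i in range(1, len(s) + 1):
        # Step 1 (data-parallel): vertical and diagonal moves use only the previous row.
        tmp = [i] + [min(prev[j] + 1, prev[j - 1] + (s[i - 1] != t[j - 1]))
                     for j in range(1, n + 1)]
        # Step 2: resolve the horizontal chain D[i][j-1] + 1 with a prefix-min scan,
        # since D[i][j] = j + min over k <= j of (tmp[k] - k).
        shifted = [v - k for k, v in enumerate(tmp)]
        prev = [j + m for j, m in enumerate(accumulate(shifted, min))]
    return prev[n]
```

Passing the shorter string as `s` makes the outer loop run min(m, n) times; for example, `edit_distance_rowwise("kitten", "sitting")` gives 3.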

  20. Efficient relaxed-Jacobi smoothers for multigrid on parallel computers

    Science.gov (United States)

    Yang, Xiang; Mittal, Rajat

    2017-03-01

    In this Technical Note, we present a family of Jacobi-based multigrid smoothers suitable for the solution of discretized elliptic equations. These smoothers are based on the idea of scheduled-relaxation Jacobi proposed recently by Yang & Mittal (2014) [18] and employ two or three successive relaxed Jacobi iterations with relaxation factors derived so as to maximize the smoothing property of these iterations. The performance of these new smoothers, measured in terms of convergence acceleration and computational workload, is assessed for multi-domain implementations typical of parallelized solvers and compared to the lexicographic point Gauss-Seidel smoother. The tests include the geometric multigrid method on structured grids as well as the algebraic grid method on unstructured grids. The tests demonstrate that, unlike Gauss-Seidel, the convergence of these Jacobi-based smoothers is unaffected by domain decomposition; furthermore, they outperform lexicographic Gauss-Seidel by factors that increase with the domain partition count.
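To illustrate why relaxed Jacobi sweeps are insensitive to domain decomposition, here is a minimal Python sketch for the 1D model problem -u'' = f with homogeneous Dirichlet boundaries. Each point update reads only the previous iterate, so partitioning the grid among processors does not change the result. The two relaxation factors are a hypothetical choice, not the factors derived in the note: they place the zeros of the two-sweep amplification polynomial at Chebyshev points of the high-frequency band 1 <= 1 - cos(theta) <= 2 of this 1D problem. The helper names are also illustrative.

```python
import math


def chebyshev_factors(k, lo=1.0, hi=2.0):
    """Hypothetical relaxation factors: reciprocals of the k Chebyshev points on [lo, hi]."""
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    return [1.0 / (mid + half * math.cos((2 * m + 1) * math.pi / (2 * k)))
            for m in range(k)]


def relaxed_jacobi(u, f, h, omega):
    """One relaxed Jacobi sweep for -u'' = f; reads only the old iterate u."""
    n = len(u)
    new = u[:]
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        residual = f[i] - (2.0 * u[i] - left - right) / h ** 2
        new[i] = u[i] + omega * 0.5 * h ** 2 * residual  # diagonal of A is 2/h^2
    return new


def srj_smooth(u, f, h, omegas):
    """Apply successive relaxed Jacobi sweeps with the given factor schedule."""
    for omega in omegas:
        u = relaxed_jacobi(u, f, h, omega)
    return u
```

Applied to a high-frequency error mode (f = 0, 63 interior points), one two-sweep cycle should shrink it to a few percent, while the smoothest mode is left almost untouched, which is the behaviour a multigrid smoother needs.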

  1. Dynamical behaviour of neuronal networks iterated with memory

    International Nuclear Information System (INIS)

    Melatagia, P.M.; Ndoundam, R.; Tchuente, M.

    2005-11-01

    We study memory iteration, where the update considers a longer history of each site and the set of interaction matrices is palindromic. We analyze two different ways of updating the networks: parallel iteration with memory, and sequential iteration with memory, which we introduce in this paper. For parallel iteration, we define a Lyapunov functional that allows us to characterize the period behaviour and to explicitly bound the transient lengths of neural networks iterated with memory. For sequential iteration, we use an algebraic invariant to characterize the period behaviour of the studied model of neural computation. (author)
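A minimal Python sketch of parallel iteration with memory, under an assumed threshold model that the abstract does not spell out: x_i(t) = 1 if sum_k sum_j A(k)[i][j] * x_j(t-k) - b_i >= 0, else 0, with every site updated simultaneously from the last m states. Because the dynamics is deterministic on a finite state space, the window of the last m states must eventually repeat, which yields the transient length and period directly. The function names and the one-neuron example are hypothetical.

```python
def parallel_step(mats, b, window):
    """Update all sites at once. mats[k] is A(k+1); window[-(k+1)] is x(t-k-1)."""
    n = len(b)
    new_state = []
    for i in range(n):
        field = sum(mats[k][i][j] * window[-(k + 1)][j]
                    for k in range(len(mats)) for j in range(n))
        new_state.append(1 if field - b[i] >= 0 else 0)
    return tuple(new_state)


def transient_and_period(mats, b, history):
    """Iterate from `history` (oldest state first, len(history) == len(mats))
    until a window of m consecutive states repeats; return (transient, period)."""
    window = tuple(tuple(s) for s in history)
    first_seen = {}
    t = 0
    while window not in first_seen:
        first_seen[window] = t
        window = window[1:] + (parallel_step(mats, b, window),)
        t += 1
    return first_seen[window], t - first_seen[window]
```

For one neuron with memory m = 2, the palindromic pair A(1) = A(2) = [[-1]] and threshold b = [-0.5], the neuron fires only after two silent steps, so starting from two zeros the trajectory cycles 1, 0, 0 with no transient: `transient_and_period([[[-1]], [[-1]]], [-0.5], [(0,), (0,)])` returns `(0, 3)`.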

  2. A homotopy method for solving Riccati equations on a shared memory parallel computer

    International Nuclear Information System (INIS)

    Zigic, D.; Watson, L.T.; Collins, E.G. Jr.; Davis, L.D.

    1993-01-01

    Although there are numerous algorithms for solving Riccati equations, there remains a need for algorithms that can operate efficiently on large problems and on parallel machines. This paper gives a new homotopy-based algorithm for solving Riccati equations on a shared-memory parallel computer. The central part of the algorithm is the computation of the kernel of the Jacobian matrix, which is essential for the corrector iterations along the homotopy zero curve. Using a Schur decomposition, the tensor product structure of various matrices can be efficiently exploited. The algorithm allows for efficient parallelization on shared-memory machines.

  3. Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

    Science.gov (United States)

    Sun, Xian-He

    1997-01-01

    Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as the Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high-performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implementable efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. To achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary tests to verify the effectiveness and potential of these algorithms, and 3) incorporate the newly developed algorithms into actual simulation packages. The work plan has been successfully carried out. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) adopting a mathematical geometry that better describes the fluid, and (2) using a compact scheme to gain high-order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm

  4. Improved parallel solution techniques for the integral transport matrix method

    Energy Technology Data Exchange (ETDEWEB)

    Zerr, R. Joseph, E-mail: rjz116@psu.edu [Department of Mechanical and Nuclear Engineering, The Pennsylvania State University, University Park, PA (United States); Azmy, Yousry Y., E-mail: yyazmy@ncsu.edu [Department of Nuclear Engineering, North Carolina State University, Burlington Engineering Laboratories, Raleigh, NC (United States)

    2011-07-01

    Alternative solution strategies to the parallel block Jacobi (PBJ) method for the solution of the global problem with the integral transport matrix method (ITMM) operators have been designed and tested. The most straightforward improvement to the Jacobi iterative method is the Gauss-Seidel alternative. The parallel red-black Gauss-Seidel (PGS) algorithm can reduce the number of iterations and the work per iteration by applying an alternating red-black color set to the subdomains and assigning multiple subdomains per processor. A parallel GMRES(m) method was implemented as an alternative to stationary iterations. Computational results show that the PGS method can improve on the PBJ method execution time by up to 10× when eight subdomains per processor are used. However, compared to traditional source iterations with diffusion synthetic acceleration, it is still approximately an order of magnitude slower. The best-performing cases are optically thick because subdomains decouple, yielding faster convergence. Further tests revealed that 64 subdomains per processor was the best-performing level of subdomain division. An acceleration technique that improves the convergence rate would greatly improve the ITMM. The GMRES(m) method with a diagonal block preconditioner consumes approximately the same time as the PBJ solver but could be improved by an as yet undeveloped, more efficient preconditioner. (author)
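The red-black idea can be shown at the level of individual grid points with a short Python sketch for the 1D model problem -u'' = f with homogeneous Dirichlet boundaries. The paper colours ITMM subdomains rather than points, so this is an analogy, and the function names are hypothetical. Within one colour no two unknowns are neighbours, so each colour phase is as parallel as a Jacobi sweep, yet the second phase already uses the values just computed in the first, as Gauss-Seidel does.

```python
def red_black_gs(f, h, tol=1e-10, max_iter=100000):
    """Red-black Gauss-Seidel for -u'' = f; returns (u, iterations)."""
    n = len(f)
    u = [0.0] * n

    def update(i):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        u[i] = 0.5 * (left + right + h * h * f[i])

    for it in range(1, max_iter + 1):
        old = u[:]
        for colour in (0, 1):
            # Points of one colour are never neighbours: these updates are independent.
            for i in range(colour, n, 2):
                update(i)
        if max(abs(a - b) for a, b in zip(u, old)) < tol:
            return u, it
    return u, max_iter


def jacobi(f, h, tol=1e-10, max_iter=100000):
    """Plain Jacobi for comparison: every update reads only the old iterate."""
    n = len(f)
    u = [0.0] * n
    for it in range(1, max_iter + 1):
        new = [0.5 * ((u[i - 1] if i > 0 else 0.0)
                      + (u[i + 1] if i < n - 1 else 0.0) + h * h * f[i])
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, u)) < tol:
            return new, it
        u = new
    return u, max_iter
```

For f = 1 the finite-difference solution coincides with u(x) = x(1 - x)/2 at the nodes, which makes a convenient check; on this model problem the red-black iteration should need roughly half as many sweeps as Jacobi.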

  5. Improved parallel solution techniques for the integral transport matrix method

    International Nuclear Information System (INIS)

    Zerr, R. Joseph; Azmy, Yousry Y.

    2011-01-01

    Alternative solution strategies to the parallel block Jacobi (PBJ) method for the solution of the global problem with the integral transport matrix method (ITMM) operators have been designed and tested. The most straightforward improvement to the Jacobi iterative method is the Gauss-Seidel alternative. The parallel red-black Gauss-Seidel (PGS) algorithm can reduce the number of iterations and the work per iteration by applying an alternating red-black color set to the subdomains and assigning multiple subdomains per processor. A parallel GMRES(m) method was implemented as an alternative to stationary iterations. Computational results show that the PGS method can improve on the PBJ method execution time by up to 10× when eight subdomains per processor are used. However, compared to traditional source iterations with diffusion synthetic acceleration, it is still approximately an order of magnitude slower. The best-performing cases are optically thick because subdomains decouple, yielding faster convergence. Further tests revealed that 64 subdomains per processor was the best-performing level of subdomain division. An acceleration technique that improves the convergence rate would greatly improve the ITMM. The GMRES(m) method with a diagonal block preconditioner consumes approximately the same time as the PBJ solver but could be improved by an as yet undeveloped, more efficient preconditioner. (author)

  6. A cryogenic system design for the international thermonuclear experimental reactor (ITER)

    International Nuclear Information System (INIS)

    Slack, D.S.

    1991-01-01

    A conceptual design for ITER was completed last year. The author developed a suitable cryogenic system for ITER as part of this conceptual design effort. An overview of the design is reported. Emphasis is on the fact that cryogenics is a mature science, and a system supporting ITER needs can be made from time-proven components without loss of efficiency or reliability. Because of the large size of the ITER cryogenic system, large numbers of compressors and expanders must be used. Very high reliability is assured by arranging these components in parallel banks where servicing of individual components can be done without interruption of operations. This and other ideas based on the author's experience with Mirror Fusion Test Facility (MFTF) operations are described. 5 refs., 3 figs

  7. Development of a parallelization strategy for the VARIANT code

    International Nuclear Information System (INIS)

    Hanebutte, U.R.; Khalil, H.S.; Palmiotti, G.; Tatsumi, M.

    1996-01-01

    The VARIANT code solves the multigroup steady-state neutron diffusion and transport equations in three-dimensional Cartesian and hexagonal geometries using the variational nodal method. VARIANT consists of four major parts that must be executed sequentially: input handling, calculation of response matrices, the solution algorithm (i.e., inner-outer iteration), and output of results. The objective of the parallelization effort was to reduce the overall computing time by distributing the work of the two computationally intensive (sequential) tasks, the coupling coefficient calculation and the iterative solver, equally among a group of processors. This report describes the code's calculations and gives performance results on one of the benchmark problems used to test the code. The performance analysis on the IBM SPx system shows good efficiency for well-load-balanced programs. Even for relatively small problem sizes, respectable efficiencies are seen on the SPx. An extension to achieve a higher degree of parallelism will be addressed in future work. 7 refs., 1 tab

  8. Parallel solutions of the two-group neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, K.S.; Turinsky, P.J.

    1987-01-01

    Recent efforts to adapt various numerical solution algorithms to parallel computer architectures have addressed the possibility of substantially reducing the running time of few-group neutron diffusion calculations. The authors have developed an efficient iterative parallel algorithm and an associated computer code for the rapid solution of the finite difference representation of the two-group neutron diffusion equations on the CRAY X-MP/48 supercomputer, which has multiple CPUs and vector pipelines. For realistic simulation of light water reactor cores, the code employs a macroscopic depletion model with trace capability for selected fission product transients and critical boron. Moderator and fuel temperature feedback models are also incorporated into the code. The physics models used in the code were benchmarked against qualified codes and proved accurate. This work extends previous work in that various feedback effects are accounted for in the system; the entire code is structured to accommodate extensive vectorization; and additional parallelism through multitasking is achieved not only for the solution of the matrix equations associated with the inner iterations but also for other segments of the code, e.g., outer iterations.

  9. A Note on Using Partitioning Techniques for Solving Unconstrained Optimization Problems on Parallel Systems

    Directory of Open Access Journals (Sweden)

    Mehiddin Al-Baali

    2015-12-01

    We deal with the design of parallel algorithms that use variable partitioning techniques to solve nonlinear optimization problems. We propose an iterative solution method that is very efficient for separable functions; our aim is to discuss its performance for general functions. Experimental results on an illustrative example have suggested some useful modifications that, although they improve the efficiency of our parallel method, leave some questions open for further investigation.

  10. Conceptual design Fusion Experimental Reactor (FER/ITER)

    International Nuclear Information System (INIS)

    Uehara, Kazuya; Nagashima, Takashi; Ikeda, Yoshitaka

    1991-11-01

    This report describes a conceptual design of the Lower Hybrid Wave (LH) system for FER and ITER. At JAERI, the conceptual design of the LH system for FER has been carried out over the past three years in parallel with that of ITER, and the FER and ITER designs must share common parts. The physical requirements of the LH system are volt-second savings in the current start-up phase and current drive in the boundary region. The frequency of 5 GHz is chosen mainly to avoid α-particle absorption and because of the availability of electron tube development. Seventy-two klystrons (FER) and one hundred klystrons (ITER) are needed to inject 30 MW (FER) and 45-50 MW (ITER) of rf power into the plasma, using klystrons of 0.7-0.8 MW each. The launching system is of the multi-junction type, and the rf spectrum must be as sharp as possible with high directivity to improve the current drive efficiency. One port (FER) and two ports (ITER) are used, and the injection direction is horizontal, taking into account ray-tracing code analysis and better coupling of the LH wave. The transmission line is an over-sized waveguide with low rf loss. (author)

  11. Parallel preconditioned conjugate gradient algorithm applied to neutron diffusion problem

    International Nuclear Information System (INIS)

    Majumdar, A.; Martin, W.R.

    1992-01-01

    Numerical solution of the neutron diffusion problem requires solving a linear system of equations Ax = b, where A is an n × n symmetric positive definite (SPD) matrix and x and b are vectors with n components. The preconditioned conjugate gradient (PCG) algorithm is an efficient iterative method for solving such a linear system. In this paper, the authors describe the implementation of a parallel PCG algorithm on a shared-memory machine (BBN TC2000) and in a distributed environment of IBM RS6000 workstations created with the Parallel Virtual Machine (PVM) parallelization software.
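A compact Python sketch of the PCG iteration, assuming a diagonal (Jacobi) preconditioner because the abstract does not say which preconditioner was used. The operator is passed as a `matvec` callback so the sketch stays independent of the storage format, and the tridiagonal test operator at the end is hypothetical. In a parallel version the matrix-vector product and the vector updates split across processors by rows, while the two dot products per iteration become global reductions, which are the main synchronization points.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(matvec, b, diag, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A with a Jacobi (diagonal) preconditioner.

    `diag` holds the diagonal entries of A. Returns (x, iterations).
    """
    x = [0.0] * len(b)
    r = b[:]                                   # r = b - A x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]   # z = M^{-1} r
    p = z[:]
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / dot(p, ap)                # global reduction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)                     # global reduction
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical 1D diffusion-like operator: tridiag(-1, 2 + s_i, -1), which is SPD.
n = 20
s = [0.1 + 0.05 * i for i in range(n)]
diag = [2.0 + si for si in s]


def matvec(v):
    return [diag[i] * v[i] - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0) for i in range(n)]


x, iters = pcg(matvec, [1.0] * n, diag)
```

Only `matvec` and the diagonal depend on the problem, so the same routine applies to a finite-difference diffusion matrix stored in any format.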

  12. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    Energy Technology Data Exchange (ETDEWEB)

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm that produces all-quadrilateral elements. It can additionally generate these elements in varying sizes so that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is an inherently serial algorithm. Parallel paving extends serial paving to parallel environments, performing the same meshing functions as conventional paving but on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm, demonstrate its capabilities on both two-dimensional and three-dimensional surface geometries, and compare the resulting parallel-produced meshes with conventionally paved meshes in terms of mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  13. An efficient iterative grand canonical Monte Carlo algorithm to determine individual ionic chemical potentials in electrolytes.

    Science.gov (United States)

    Malasics, Attila; Boda, Dezso

    2010-06-28

    Two iterative procedures have been proposed recently to calculate the chemical potentials corresponding to prescribed concentrations from grand canonical Monte Carlo (GCMC) simulations. Both are based on repeated GCMC simulations with updated excess chemical potentials until the desired concentrations are established. In this paper, we propose combining our robust and fast converging iteration algorithm [Malasics, Gillespie, and Boda, J. Chem. Phys. 128, 124102 (2008)] with the suggestion of Lamperski [Mol. Simul. 33, 1193 (2007)] to average the chemical potentials in the iterations (instead of just using the chemical potentials obtained in the last iteration). We apply the unified method to various electrolyte solutions and show that our algorithm is more efficient if we use the averaging procedure. We discuss the convergence problems arising from violation of charge neutrality when inserting/deleting individual ions instead of neutral groups of ions (salts). We suggest a correction term to the iteration procedure that makes the algorithm efficient for determining the chemical potentials of individual ions as well.

  14. Multi-petascale highly efficient parallel supercomputer

    Science.gov (United States)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen-Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2018-05-15

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. The node design also integrates a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that also improves the soft error rate, and it supports DMA functionality for parallel message-passing processing.
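    To make the five-dimensional torus concrete: each node can be labelled by five coordinates, and every coordinate wraps around, so each node has two links per dimension (ten in total). The sketch below computes nearest neighbours and minimal hop counts. The function names and the 4x4x4x4x4 extents in the example are hypothetical and do not describe the machine's actual partition sizes or routing.

```python
def torus_neighbors(coord, dims):
    """Nearest neighbours of `coord` in a torus with extents `dims`.

    Each dimension wraps around, so coordinate 0 and dims[k]-1 are adjacent.
    (With an extent of 2, the -1 and +1 neighbours coincide.)
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, +1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size   # wraparound link
            neighbors.append(tuple(n))
    return neighbors

def hop_distance(a, b, dims):
    """Minimal hop count: take the shorter way around each ring independently."""
    return sum(min(abs(x - y), size - abs(x - y)) for x, y, size in zip(a, b, dims))
```

For extents of 4 in every dimension, the farthest node is two hops away per dimension, so the network diameter is 10 hops.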

  15. Non-Cartesian parallel imaging reconstruction.

    Science.gov (United States)

    Wright, Katherine L; Hamilton, Jesse I; Griswold, Mark A; Gulani, Vikas; Seiberlich, Nicole

    2014-11-01

    Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be used to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the nonhomogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian generalized autocalibrating partially parallel acquisition (GRAPPA), and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. © 2014 Wiley Periodicals, Inc.

  16. Parallelized preconditioned BiCGStab solution of sparse linear system equations in F-COBRA-TF

    International Nuclear Information System (INIS)

    Geemert, Rene van; Glück, Markus; Riedmann, Michael; Gabriel, Harry

    2011-01-01

    Recently, the in-house development of a preconditioned and parallelized BiCGStab solver has been pursued successfully in AREVA's advanced sub-channel code F-COBRA-TF. This solver can be run either in a sequential computation mode on a single CPU or in a parallel computation mode on multiple CPUs. The developed procedure enables the computation of several thousand successive sparse linear system solutions in F-COBRA-TF with acceptable wall clock run times. The current paper provides general information about F-COBRA-TF in terms of modeling capabilities and application areas, and explains where the efficient iterative solution of sparse linear systems becomes relevant. Furthermore, the preconditioning and parallelization strategies in the developed BiCGStab iterative solution approach are discussed. The paper concludes with a number of verification examples. (author)
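    The abstract does not describe the solver's internals. For orientation, here is a minimal sketch of the standard preconditioned BiCGStab recurrence (van der Vorst's method) with a Jacobi preconditioner on a small dense nonsymmetric matrix. The preconditioner, the function name `bicgstab` and the test data are assumptions, not the F-COBRA-TF implementation.

```python
import math

def bicgstab(A, b, tol=1e-10, max_iter=500):
    """BiCGStab with a Jacobi (diagonal) preconditioner; A is a dense list of rows."""
    n = len(b)
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    matvec = lambda v: [dot(row, v) for row in A]
    precond = lambda v: [vi / A[i][i] for i, vi in enumerate(v)]   # M^{-1} v

    x = [0.0] * n
    r = list(b)                   # r0 = b - A x0 with x0 = 0
    r_hat = list(r)               # fixed "shadow" residual
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        rho = dot(r_hat, r)
        beta = (rho / rho_old) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = precond(p)
        v = matvec(y)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]     # BiCG half-step residual
        z = precond(s)
        t = matvec(z)
        tt = dot(t, t)
        omega = dot(t, s) / tt if tt > 0.0 else 0.0       # local residual minimisation
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho_old = rho
    return x, max_iter
```

Unlike CG, BiCGStab does not require a symmetric matrix, which suits the nonsymmetric systems typical of thermal-hydraulics codes. It costs two matrix-vector products and two preconditioner applications per iteration.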

  17. Implementation of the multireference Brillouin-Wigner and Mukherjee’s coupled cluster methods with non-iterative triple excitations utilizing reference-level parallelism

    Energy Technology Data Exchange (ETDEWEB)

    Bhaskaran-Nair, Kiran; Brabec, Jiri; Apra, Edoardo; van Dam, Hubertus JJ; Pittner, Jiri; Kowalski, Karol

    2012-09-07

    In this paper we discuss the performance of the non-iterative State-Specific Multireference Coupled Cluster (SS-MRCC) methods accounting for the effect of triply excited cluster amplitudes. The corrections to the Brillouin-Wigner and Mukherjee MRCC models based on the manifold of singly and doubly excited cluster amplitudes (BW-MRCCSD and Mk-MRCCSD, respectively) are tested and compared with the exact full configuration interaction (FCI) results for small systems (H2O, N2, and Be3). For larger systems (naphthyne isomers and β-carotene), the non-iterative BW-MRCCSD(T) and Mk-MRCCSD(T) methods are compared against the results obtained with the single reference coupled cluster methods. We also report on the parallel performance of the non-iterative implementations based on the use of processor groups.

  18. Parallel multigrid smoothing: polynomial versus Gauss-Seidel

    International Nuclear Information System (INIS)

    Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

    2003-01-01

    Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.

  19. Parallel multigrid smoothing: polynomial versus Gauss-Seidel

    Science.gov (United States)

    Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

    2003-07-01

    Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
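    A minimal sketch of a Jacobi-preconditioned Chebyshev smoother follows. It uses the classical three-term Chebyshev iteration (as in Saad's textbook), tuned to the upper part [λmax/ratio, λmax] of the spectrum of D^{-1}A. The values lam_max = 2.0 (a Gershgorin bound for the 1D Laplacian used in the example) and ratio = 30 are assumptions. In practice λmax is usually estimated, for example by a few power or Lanczos iterations. The multilevel-specific polynomial from the paper is not shown.

```python
import math

def chebyshev_smooth(A, f, x, degree=3, lam_max=2.0, ratio=30.0):
    """Apply `degree` steps of Jacobi-preconditioned Chebyshev iteration to A x = f.

    The polynomial targets eigenvalues of D^{-1}A (D = diag(A)) in
    [lam_max/ratio, lam_max]: oscillatory error there is damped, smooth error
    is left for the coarse grid. Each step needs only a matrix-vector product
    and vector updates, with no ordering dependence as in Gauss-Seidel.
    """
    n = len(f)
    a, b = lam_max / ratio, lam_max
    theta, delta = (b + a) / 2.0, (b - a) / 2.0
    sigma = theta / delta
    dinv = [1.0 / A[i][i] for i in range(n)]
    matvec = lambda v: [sum(aij * vj for aij, vj in zip(row, v)) for row in A]

    x = list(x)
    r = [di * (fi - axi) for di, fi, axi in zip(dinv, f, matvec(x))]   # D^{-1}(f - A x)
    rho = 1.0 / sigma
    d = [ri / theta for ri in r]
    for _ in range(degree):
        x = [xi + di for xi, di in zip(x, d)]
        Ad = matvec(d)
        r = [ri - dv * adi for ri, dv, adi in zip(r, dinv, Ad)]        # r -= D^{-1} A d
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = [rho_new * rho * di + (2.0 * rho_new / delta) * ri for di, ri in zip(d, r)]
        rho = rho_new
    return x
```

For the 1D Laplacian with f = 0, three steps shrink the most oscillatory mode by at least a factor of 1/T3(σ) ≈ 0.6, while the smoothest mode is reduced only by about 3%. That split is the job a smoother has in multigrid.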

  20. An efficient iterative method for the generalized Stokes problem

    Energy Technology Data Exchange (ETDEWEB)

    Sameh, A. [Univ. of Minnesota, Twin Cities, MN (United States)]; Sarin, V. [Univ. of Illinois, Urbana, IL (United States)]

    1996-12-31

    This paper presents an efficient iterative scheme for the generalized Stokes problem, which arises frequently in the simulation of time-dependent Navier-Stokes equations for incompressible fluid flow. The general form of the linear system is $\begin{pmatrix} A & B \\ B^{T} & 0 \end{pmatrix} \begin{pmatrix} u \\ p \end{pmatrix} = \begin{pmatrix} f \\ 0 \end{pmatrix}$, where $A = \alpha M + \nu T$ is an n x n symmetric positive definite matrix, in which M is the mass matrix, T is the discrete Laplace operator, $\alpha$ and $\nu$ are positive constants proportional to the inverses of the time step $\Delta t$ and the Reynolds number Re, respectively, and B is the discrete gradient operator of size n x k (k < n). Even though the matrix A is symmetric and positive definite, the system is indefinite due to the incompressibility constraint ($B^{T}u = 0$). This causes difficulties both for iterative methods and for commonly used preconditioners. Moreover, depending on the ratio $\alpha/\nu$, A behaves like the mass matrix M at one extreme and like the Laplace operator T at the other, which complicates preconditioning.
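    Why the coupled system is indefinite even though A is SPD can be checked in two lines. Write the coefficient matrix as K = [A B; B^T 0] and assume B has full column rank (an assumption, since the abstract gives only its size). Test vectors with p = 0 give a positive quadratic form. Choosing u = -A^{-1}Bp gives a negative one, whose value is governed by the Schur complement S.

```latex
K = \begin{pmatrix} A & B \\ B^{T} & 0 \end{pmatrix}, \qquad
\begin{pmatrix} u \\ 0 \end{pmatrix}^{T} K \begin{pmatrix} u \\ 0 \end{pmatrix} = u^{T} A u > 0 \quad (u \neq 0),

\text{for } p \neq 0,\ u = -A^{-1} B p: \quad
\begin{pmatrix} u \\ p \end{pmatrix}^{T} K \begin{pmatrix} u \\ p \end{pmatrix}
= u^{T} A u + 2\, u^{T} B p
= p^{T} B^{T} A^{-1} B p - 2\, p^{T} B^{T} A^{-1} B p
= -\, p^{T} S p < 0, \qquad S = B^{T} A^{-1} B .
```

S is symmetric positive definite under the full-rank assumption. It is the matrix that block preconditioners for such saddle-point systems commonly have to approximate.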

  1. Parallel keyed hash function construction based on chaotic maps

    International Nuclear Information System (INIS)

    Xiao Di; Liao Xiaofeng; Deng Shaojiang

    2008-01-01

    Recently, a variety of chaos-based hash functions have been proposed. Nevertheless, none of them works efficiently in a parallel computing environment. In this Letter, an algorithm for parallel keyed hash function construction is proposed, whose structure ensures uniform sensitivity of the hash value to the message. Through both a changeable-parameter mechanism and a self-synchronization mechanism, the keystream is closely tied to the algorithm key and to the content and order of each message block. The entire message is modulated into the chaotic iteration orbit, and the coarse-grained trajectory is extracted as the hash value. Theoretical analysis and computer simulation indicate that the proposed algorithm can satisfy the performance requirements of a hash function. It is simple, efficient, practicable, and reliable. These properties make it a good choice for hashing on parallel computing platforms.

  2. Eigenvalues calculation algorithms for λ-modes determination. Parallelization approach

    Energy Technology Data Exchange (ETDEWEB)

    Vidal, V. [Universidad Politecnica de Valencia (Spain). Departamento de Sistemas Informaticos y Computacion]; Verdu, G.; Munoz-Cobo, J.L. [Universidad Politecnica de Valencia (Spain). Departamento de Ingenieria Quimica y Nuclear]; Ginestart, D. [Universidad Politecnica de Valencia (Spain). Departamento de Matematica Aplicada]

    1997-03-01

    In this paper, we review two methods to obtain the λ-modes of a nuclear reactor, the Subspace Iteration method and Arnoldi's method, which are popular methods to solve the partial eigenvalue problem for a given matrix. In the developed application for the neutron diffusion equation we include improved acceleration techniques for both methods. We also propose two parallelization approaches for these methods, a coarse-grain parallelization and a fine-grain one. We have tested the developed algorithms with two realistic problems, focusing on the efficiency of the methods according to the CPU times. (author).
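    The abstract does not detail the algorithms, so the following is only a minimal sketch of plain simultaneous (orthogonal) iteration, the basic form of subspace iteration. It assumes a standard symmetric eigenproblem rather than the generalized λ-modes problem of the paper, and it omits the Rayleigh-Ritz step, the acceleration techniques and Arnoldi's method. The function name and starting block are illustrative.

```python
import math

def subspace_iteration(A, k, iters=300):
    """Approximate the k dominant eigenvalues of a symmetric matrix A (list of rows).

    A block of k vectors is repeatedly multiplied by A and re-orthonormalised
    with Gram-Schmidt, so the vectors do not all collapse onto the dominant
    eigenvector. The block products are the natural unit of parallel work.
    """
    n = len(A)
    matvec = lambda v: [sum(a * x for a, x in zip(row, v)) for row in A]
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    # deterministic, linearly independent starting block
    Q = [[1.0 if i == j else 0.1 for i in range(n)] for j in range(k)]
    for _ in range(iters):
        Z = [matvec(q) for q in Q]         # block multiply
        Q = []
        for z in Z:                        # modified Gram-Schmidt
            for q in Q:
                c = dot(z, q)
                z = [zi - c * qi for zi, qi in zip(z, q)]
            nz = math.sqrt(dot(z, z))
            Q.append([zi / nz for zi in z])
    return [dot(q, matvec(q)) for q in Q]  # Rayleigh quotients
```

Convergence of the i-th vector depends on the ratio |λ_{k+1}/λ_i|, which is why acceleration matters when the dominant modes are closely spaced, as reactor λ-modes often are.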

  3. Iteration schemes for parallelizing models of superconductivity

    Energy Technology Data Exchange (ETDEWEB)

    Gray, P.A. [Michigan State Univ., East Lansing, MI (United States)]

    1996-12-31

    The time-dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-T_c superconductors. When solving these equations numerically, the added complexity due to the coupling and nonlinearity of the model often warrants the use of high-performance computers. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given that contrast the convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.

  4. Parallel preconditioning techniques for sparse CG solvers

    Energy Technology Data Exchange (ETDEWEB)

    Basermann, A.; Reichel, B.; Schelthoff, C. [Central Institute for Applied Mathematics, Juelich (Germany)]

    1996-12-31

    Conjugate gradient (CG) methods for solving sparse systems of linear equations play an important role in numerical methods for discretized partial differential equations. The large size and poor conditioning of many technical or physical applications in this area create the need for efficient parallelization and preconditioning techniques for the CG method. In particular, for very ill-conditioned matrices, sophisticated preconditioners are necessary to obtain both acceptable convergence and accuracy of CG. Here, we investigate variants of polynomial and incomplete Cholesky preconditioners that markedly reduce the number of iterations compared with simple diagonally scaled CG and are shown to be well suited for massively parallel machines.
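    Polynomial preconditioners parallelize well because applying them needs only matrix-vector products and no triangular solves. A minimal sketch, assuming a truncated Neumann series in D^{-1}A as the polynomial (one simple choice, not necessarily the paper's variant), plugged into CG: `degree=0` reduces to the diagonally scaled CG mentioned above. The series is only guaranteed to give an SPD preconditioner when the eigenvalues of D^{-1}A lie in (0, 2), which holds for the diagonally dominant test matrix used here. Function names are illustrative.

```python
import math

def neumann_apply(A, r, degree):
    """z = sum_{j=0}^{degree} (I - D^{-1}A)^j D^{-1} r, a truncated Neumann-series
    approximation of A^{-1} r built only from SpMVs and vector updates."""
    n = len(r)
    dinv = [1.0 / A[i][i] for i in range(n)]
    term = [d * ri for d, ri in zip(dinv, r)]                 # j = 0 term: D^{-1} r
    z = list(term)
    for _ in range(degree):
        Aterm = [sum(a * t for a, t in zip(row, term)) for row in A]
        term = [t - d * at for t, d, at in zip(term, dinv, Aterm)]   # (I - D^{-1}A) term
        z = [zi + ti for zi, ti in zip(z, term)]
    return z

def pcg_poly(A, b, degree=3, tol=1e-10, max_iter=1000):
    """CG preconditioned by the Neumann polynomial (degree=0: diagonal scaling)."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    n = len(b)
    x, r = [0.0] * n, list(b)
    z = neumann_apply(A, r, degree)
    p, rz = list(z), dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        Ap = [dot(row, p) for row in A]
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = neumann_apply(A, r, degree)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

A higher degree trades more SpMVs per iteration for fewer iterations, and therefore fewer global inner products. That trade is attractive on massively parallel machines where reductions are expensive.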

  5. High-performance blob-based iterative three-dimensional reconstruction in electron tomography using multi-GPUs

    Directory of Open Access Journals (Sweden)

    Wan Xiaohua

    2012-06-01

    Background: Three-dimensional (3D) reconstruction in electron tomography (ET) has emerged as a leading technique to elucidate the molecular structures of complex biological specimens. Blob-based iterative methods are advantageous for 3D reconstruction in ET, but they demand huge computational costs. Multiple graphics processing units (multi-GPUs) offer an affordable platform to meet these demands. However, a synchronous communication scheme between multi-GPUs leads to idle GPU time, and the weighted matrix involved in iterative methods cannot be loaded into GPUs, especially for large images, because of the limited memory available on GPUs. Results: In this paper we propose a multilevel parallel strategy combined with an asynchronous communication scheme and a blob-ELLR data structure to efficiently perform blob-based iterative reconstructions on multi-GPUs. The asynchronous communication scheme minimizes idle GPU time by overlapping communications with computations. The blob-ELLR data structure needs only about 1/16 of the storage space of the ELLPACK-R (ELLR) data structure and yields significant acceleration. Conclusions: Experimental results indicate that the multilevel parallel scheme combined with the asynchronous communication scheme and the blob-ELLR data structure allows efficient implementations of 3D reconstruction in ET on multi-GPUs.
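    For reference, the ELLPACK-R (ELLR) format that blob-ELLR is compared against pads every row to the longest row length. It stores values and column indices column-major, so that neighbouring GPU threads (one per row) read neighbouring memory, and it keeps an extra array of true row lengths so that padding is skipped. The sketch below builds ELLR arrays from a dense matrix and performs y = A x. The function names are illustrative, and the blob-ELLR compression itself is not shown.

```python
def to_ellr(dense):
    """Convert a dense matrix (list of rows) to ELLPACK-R arrays (val, col, rl, n).

    val/col are padded to the longest row and stored column-major: entry k of
    row i lives at index k * n + i. rl[i] is the true nonzero count of row i.
    """
    n = len(dense)
    rows = [[(j, a) for j, a in enumerate(row) if a != 0.0] for row in dense]
    width = max((len(r) for r in rows), default=0)
    val = [0.0] * (n * width)
    col = [0] * (n * width)
    rl = [len(r) for r in rows]
    for i, r in enumerate(rows):
        for k, (j, a) in enumerate(r):
            val[k * n + i] = a
            col[k * n + i] = j
    return val, col, rl, n

def ellr_spmv(val, col, rl, n, x):
    """y = A x; on a GPU, each iteration of the outer loop would be one thread."""
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(rl[i]):             # stop at the row's real length, not the padding
            idx = k * n + i
            s += val[idx] * x[col[idx]]
        y[i] = s
    return y
```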

  6. Large-Scale Parallel Finite Element Analysis of the Stress Singular Problems

    International Nuclear Information System (INIS)

    Noriyuki Kushida; Hiroshi Okuda; Genki Yagawa

    2002-01-01

    In this paper, the convergence behavior of the large-scale parallel finite element method (FEM) for stress singular problems was investigated. The convergence behavior of iterative solvers depends on the efficiency of the preconditioners. However, the efficiency of preconditioners may be influenced by the domain decomposition that is necessary for parallel FEM. In this study the following results were obtained: the conjugate gradient method without preconditioning and the diagonal scaling preconditioned conjugate gradient method were not influenced by the domain decomposition, as expected. The symmetric successive over-relaxation (SSOR) preconditioned conjugate gradient method converged up to 6% faster when the stress singular area was contained in one sub-domain. (authors)

  7. Migration of vectorized iterative solvers to distributed memory architectures

    Energy Technology Data Exchange (ETDEWEB)

    Pommerell, C. [AT&T Bell Labs., Murray Hill, NJ (United States)]; Ruehl, R. [CSCS-ETH, Manno (Switzerland)]

    1994-12-31

    Both necessity and opportunity motivate the use of high-performance computers for iterative linear solvers. Necessity results from the size of the problems being solved: smaller problems are often better handled by direct methods. Opportunity arises from the formulation of iterative methods in terms of simple linear algebra operations, even if this "natural" parallelism is not easy to exploit with irregularly structured sparse matrices and good preconditioners. As a result, high-performance implementations of iterative solvers have attracted a lot of interest in recent years. Most efforts aim either to vectorize or parallelize the dominant operation, structured or unstructured sparse matrix-vector multiplication, or to increase locality and parallelism by reformulating the algorithm, for example by reducing global synchronization in inner products or local data exchange in preconditioners. Target architectures for iterative solvers currently include mostly vector supercomputers and architectures with one or a few optimized (e.g., super-scalar and/or super-pipelined RISC) processors and hierarchical memory systems. More recently, vendors have offered parallel computers with physically distributed memory and a better price/performance ratio as a very interesting alternative to vector supercomputers. However, programming comfort on such distributed memory parallel processors (DMPPs) still lags behind. Here the authors are concerned with iterative solvers and their changing computing environment. In particular, they consider migration from traditional vector supercomputers to DMPPs. Application requirements force one to use flexible and portable libraries. The authors want to extend the portability of iterative solvers rather than reimplementing everything for each new machine, or even for each new architecture.

  8. Efficient Parallel Algorithms for Unsteady Incompressible Flows

    KAUST Repository

    Guermond, Jean-Luc; Minev, Peter D.

    2013-01-01

    The objective of this paper is to give an overview of recent developments in splitting schemes for solving the time-dependent incompressible Navier–Stokes equations and to discuss possible extensions to the variable density/viscosity case. Particular attention is given to algorithms that can be implemented efficiently on large parallel clusters.

  9. Iterative solution of general sparse linear systems on clusters of workstations

    Energy Technology Data Exchange (ETDEWEB)

    Lo, Gen-Ching; Saad, Y. [Univ. of Minnesota, Minneapolis, MN (United States)]

    1996-12-31

    Solving sparse, irregularly structured linear systems on parallel platforms poses several challenges. First, sparsity makes it difficult to exploit data locality, whether in a distributed or shared memory environment. A second, perhaps more serious, challenge is to find efficient ways to precondition the system. Preconditioning techniques that have a large degree of parallelism, such as multicolor SSOR, often converge more slowly than their sequential counterparts. Finally, other computational kernels such as inner products could negate any gains from parallel speed-ups, and this is especially true on workstation clusters where start-up times may be high. In this paper we discuss these issues and report on our experience with PSPARSLIB, an ongoing project for building a library of parallel iterative sparse matrix solvers.
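    Multicolor orderings are what give SSOR-type methods their parallelism. Unknowns are coloured so that no two coupled unknowns share a colour, and all unknowns of one colour can then be updated simultaneously. A minimal sketch, assuming greedy colouring and plain multicolour Gauss-Seidel sweeps (a full SSOR would add a backward sweep and a relaxation factor). The function names are illustrative and not taken from PSPARSLIB.

```python
def greedy_coloring(A):
    """Greedily colour the adjacency graph of A (list of rows):
    unknowns i and j conflict when A[i][j] != 0."""
    n = len(A)
    color = [-1] * n
    for i in range(n):
        used = {color[j] for j in range(n) if j != i and A[i][j] != 0.0 and color[j] >= 0}
        c = 0
        while c in used:
            c += 1
        color[i] = c
    return color

def multicolor_gauss_seidel(A, b, x, color, sweeps=1):
    """Gauss-Seidel sweeps ordered by colour. Unknowns sharing a colour are not
    coupled, so the updates within one colour are independent of each other."""
    n = len(b)
    x = list(x)
    for _ in range(sweeps):
        for c in range(max(color) + 1):
            idx = [i for i in range(n) if color[i] == c]
            # every update in this colour reads only values of other colours
            new = {i: (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                   for i in idx}
            for i, v in new.items():
                x[i] = v
    return x
```

For a 1D Laplacian the greedy colouring is the familiar red-black ordering. With more irregular sparsity, more colours are needed, and the reordering can slow convergence compared with the natural ordering, which is the trade-off the abstract describes.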

  10. A reduced complexity highly power/bandwidth efficient coded FQPSK system with iterative decoding

    Science.gov (United States)

    Simon, M. K.; Divsalar, D.

    2001-01-01

    Based on a representation of FQPSK as a trellis-coded modulation, this paper investigates the potential improvement in power efficiency obtained from the application of simple outer codes to form a concatenated coding arrangement with iterative decoding.

  11. Preliminary Study on the Enhancement of Reconstruction Speed for Emission Computed Tomography Using Parallel Processing

    International Nuclear Information System (INIS)

    Park, Min Jae; Lee, Jae Sung; Kim, Soo Mee; Kang, Ji Yeon; Lee, Dong Soo; Park, Kwang Suk

    2009-01-01

    Conventional image reconstruction uses simplified physical models of projection. However, realistic physics, for example in 3D reconstruction, takes too long to process all the data in the clinic and is not feasible on a common reconstruction machine because complex physical models require a large amount of memory. We propose a realistic distributed-memory model of fast reconstruction using parallel processing on personal computers to enable large-scale technologies. Preliminary feasibility tests on virtual machines and various performance tests on the commercial supercomputer Tachyon were performed. The expectation maximization algorithm was tested with common 2D projection and with a realistic 3D line of response. Because processing slowed down (by up to 6 times) after a certain iteration, compiler optimization was performed to maximize the efficiency of parallelization. Parallel processing of a program on multiple computers was made available on Linux with MPICH and NFS. We verified that the differences between the parallel-processed image and the single-processed image at the same iteration were below the significant digits of a floating-point number, about 6 bits. Two processors showed good parallel efficiency (a 1.96-times speedup). The delay phenomenon was solved by vectorization using SSE. Through this study, a realistic parallel computing system was established for the clinic that can reconstruct images with plenty of memory using realistic physical models, which would otherwise be impossible without simplification.
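    The expectation maximization (MLEM) update that such systems parallelize is x ← x / s · A^T(y / (A x)), with sensitivity s = A^T 1. A minimal sketch on a tiny dense system matrix follows. The split of projection rows into `n_parts` blocks is an illustrative assumption that mimics how each node could back-project its share of the data before a global sum, and it is not the paper's actual decomposition.

```python
def mlem(A, y, iters=50, n_parts=2):
    """MLEM reconstruction for projections y = A x (A: list of rows, x >= 0).

    The back-projection is accumulated from `n_parts` independent blocks of
    projection rows (run sequentially here; in parallel, each partial sum
    would be computed on its own node and then reduced).
    """
    m, n = len(A), len(A[0])
    sens = [sum(A[i][j] for i in range(m)) for j in range(n)]   # s = A^T 1
    x = [1.0] * n                                                # uniform start image
    bounds = [m * p // n_parts for p in range(n_parts + 1)]
    for _ in range(iters):
        back = [0.0] * n
        for p in range(n_parts):                                 # one block per node
            for i in range(bounds[p], bounds[p + 1]):
                fwd = sum(A[i][j] * x[j] for j in range(n))      # forward projection
                ratio = y[i] / fwd if fwd > 0.0 else 0.0
                for j in range(n):
                    back[j] += A[i][j] * ratio                   # back-projection
        x = [xj * bj / sj if sj > 0.0 else 0.0 for xj, bj, sj in zip(x, back, sens)]
    return x
```

Each update keeps the image nonnegative and preserves total counts (sum over j of s_j x_j equals sum over i of y_i). Both properties offer a simple sanity check when comparing parallel and single-processor results.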

  12. Study on Parallel Processing for Efficient Flexible Multibody Analysis based on Subsystem Synthesis Method

    Energy Technology Data Exchange (ETDEWEB)

    Han, Jong-Boo; Song, Hajun; Kim, Sung-Soo [Chungnam Nat’l Univ., Daejeon (Korea, Republic of)]

    2017-06-15

    Flexible multibody simulations are widely used in industry to design mechanical systems. In flexible multibody dynamics, deformation coordinates are described either relative to a body reference frame floating in space or in the inertial reference frame. Moreover, these deformation coordinates are generated by discretizing the body according to the finite element approach. Therefore, the formulation of a flexible multibody system always involves a huge number of degrees of freedom, and the numerical solution methods require a substantial amount of computational time. Parallel computational methods offer a route to efficient computation. However, most parallel computational methods focus on the efficient solution of large linear systems. For multibody analysis, we need an efficient formulation that is suitable for parallel computation. In this paper, we developed a subsystem synthesis method for a flexible multibody system and proposed efficient parallel computational schemes based on the OpenMP API. Simulations of a rotating blade system, which consists of three identical blades, were carried out with two different parallel computational schemes. Actual CPU times were measured to investigate the efficiency of the proposed parallel schemes.

  13. On the efficient parallel computation of Legendre transforms

    NARCIS (Netherlands)

    Inda, M.A.; Bisseling, R.H.; Maslen, D.K.

    2001-01-01

    In this article, we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the

  14. On the efficient parallel computation of Legendre transforms

    NARCIS (Netherlands)

    Inda, M.A.; Bisseling, R.H.; Maslen, D.K.

    1999-01-01

    In this article we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the

  15. Parallel 3-D method of characteristics in MPACT

    International Nuclear Information System (INIS)

    Kochunas, B.; Downar, T. J.; Liu, Z.

    2013-01-01

    A new parallel 3-D MOC kernel has been developed and implemented in MPACT; it uses the modular ray tracing technique to reduce computational requirements and to facilitate parallel decomposition. The parallel model uses both distributed and shared memory parallelism, implemented with the MPI and OpenMP standards, respectively. The kernel is capable of parallel decomposition of problems in space, angle, and by characteristic rays up to O(10^4) processors. Initial verification of the parallel 3-D MOC kernel was performed using the Takeda 3-D transport benchmark problems. The eigenvalues computed by MPACT are within the statistical uncertainty of the benchmark reference and agree well with the averages of other participants. The MPACT k_eff differs from the benchmark results for the rodded and unrodded cases by 11 and -40 pcm, respectively. The calculations were performed for various numbers of processors and parallel decompositions up to 15625 processors, all producing the same result at convergence. The parallel efficiency of the worst case was 60%, while very good efficiency (>95%) was observed for cases using 500 processors. The overall run time was 231 seconds for the 500-processor case and 19 seconds for the 15625-processor case. Ongoing work is focused on developing theoretical performance models and implementing acceleration techniques to minimize the number of iterations to converge. (authors)

  16. A parallel algorithm for solving the integral form of the discrete ordinates equations

    International Nuclear Information System (INIS)

    Zerr, R. J.; Azmy, Y. Y.

    2009-01-01

    The integral form of the discrete ordinates equations involves a system of equations with a large, dense coefficient matrix. The serial construction methodology is presented, and properties that affect the execution times to construct and solve the system are evaluated. Two approaches for massively parallel implementation of the solution algorithm are proposed, and current results for one of them are presented. The system of equations may be solved using two parallel solvers: block Jacobi and conjugate gradient. Results indicate that both methods can reduce the overall wall-clock execution time. The conjugate gradient solver performs better and is competitive with the traditional source iteration technique in terms of execution time and scalability. The parallel conjugate gradient method is synchronous, so it does not increase the number of iterations required for convergence compared with serial execution, and the efficiency of the algorithm shows an apparent asymptotic decline. (authors)
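    Block Jacobi splits the unknowns into blocks, one per processor. Each processor solves its own diagonal block, while couplings to other blocks use the previous iterate, so all blocks proceed independently within an iteration. A minimal textbook sketch on a small dense system follows. The block size, the dense Gaussian-elimination block solver and the test matrix are assumptions, not the paper's construction.

```python
def solve_dense(M, r):
    """Gaussian elimination with partial pivoting for a small dense block."""
    n = len(r)
    aug = [list(row) + [ri] for row, ri in zip(M, r)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def block_jacobi(A, b, block_size, iters=100):
    """Block Jacobi iteration for a dense system A x = b (A: list of rows)."""
    n = len(b)
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]
    x = [0.0] * n
    for _ in range(iters):
        x_new = list(x)
        for blk in blocks:                  # independent: one block per processor
            inside = set(blk)
            # off-block couplings use the previous iterate and move to the RHS
            rhs = [b[i] - sum(A[i][j] * x[j] for j in range(n) if j not in inside)
                   for i in blk]
            sub = [[A[i][j] for j in blk] for i in blk]
            for i, v in zip(blk, solve_dense(sub, rhs)):
                x_new[i] = v
        x = x_new
    return x
```

Because the coefficient matrix is dense, every block couples to every other block, so each iteration needs an all-to-all exchange of the updated unknowns.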

  17. Colorado Conference on iterative methods. Volume 1

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1994-12-31

    The conference provided a forum on many aspects of iterative methods. Volume I topics were: domain decomposition, nonlinear problems, integral equations and inverse problems, eigenvalue problems, and iterative software kernels. Volume II presents nonsymmetric solvers, parallel computation, theory of iterative methods, software and programming environments, ODE solvers, multigrid and multilevel methods, applications, robust iterative methods, preconditioners, Toeplitz and circulant solvers, and saddle point problems. Individual papers are indexed separately on the EDB.

  18. Efficient Four-Parametric with-and-without-Memory Iterative Methods Possessing High Efficiency Indices

    Directory of Open Access Journals (Sweden)

    Alicia Cordero

    2018-01-01

    We construct a family of derivative-free optimal iterative methods without memory to approximate a simple zero of a nonlinear function. Error analysis demonstrates that the without-memory class has eighth-order convergence and can be extended to a with-memory class. The extension of the new family to the with-memory one is also presented; it attains convergence order 15.5156 and a very high efficiency index 15.5156^{1/4} ≈ 1.9847. Some particular schemes of the with-memory family are also described. Numerical examples and some dynamical aspects of the new schemes are given to support the theoretical results.
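    The efficiency index quoted above follows the usual definition E = p^{1/d}, where p is the convergence order and d the number of function evaluations per iteration. The exponent 1/4 in the abstract implies d = 4, and for d = 4 the Kung-Traub bound 2^{d-1} = 8 is exactly the "optimal" eighth order of the without-memory class. The two indices compare as follows:

```latex
E = p^{1/d}, \qquad
E_{\text{without memory}} = 8^{1/4} = 2^{3/4} \approx 1.6818, \qquad
E_{\text{with memory}} = 15.5156^{1/4} \approx 1.9847 .
```

The memory terms therefore raise the order from 8 to about 15.52 without any extra function evaluations, which is where the gain in efficiency comes from.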

  19. An efficient implementation of parallel molecular dynamics method on SMP cluster architecture

    International Nuclear Information System (INIS)

    Suzuki, Masaaki; Okuda, Hiroshi; Yagawa, Genki

    2003-01-01

    The authors have applied the MPI/OpenMP hybrid parallel programming model to parallelize a molecular dynamics (MD) method on a symmetric multiprocessor (SMP) cluster architecture. On that architecture, the hybrid model, which uses a message-passing library such as MPI for inter-SMP-node communication and loop directives such as OpenMP for intra-SMP-node parallelization, can be expected to be the most effective. In this study, the parallel performance of the hybrid style was compared with that of the conventional flat parallel programming style, which uses only MPI, both with and without the fast multipole method (FMM) for computing long-distance interactions. The computer used was the Hitachi SR8000/MPP at the University of Tokyo. The results are as follows. Without FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 90% with the hybrid style and 75% with the flat-MPI style for an MD simulation with 33,402 atoms. With FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 60% with the hybrid style and 48% with the flat-MPI style for an MD simulation with 117,649 atoms. (author)

  20. Efficient multitasking: parallel versus serial processing of multiple tasks.

    Science.gov (United States)

    Fischer, Rico; Plessow, Franziska

    2015-01-01

    In the context of performance optimization in multitasking, a central debate in multitasking research concerns whether cognitive processes related to different tasks proceed only sequentially (one at a time) or can operate in parallel (simultaneously). This review discusses theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their use in future research. Parallel and serial processing of multiple tasks are not mutually exclusive; therefore, questions focusing exclusively on either task-processing mode are oversimplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected in the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies for scheduling multiple task components.

  1. Parallel computing techniques for rotorcraft aerodynamics

    Science.gov (United States)

    Ekici, Kivanc

    The modification of unsteady three-dimensional Navier-Stokes codes for massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. Two algorithmic changes are developed for the efficient implementation of TURNS on parallel computing systems. First, major modifications are made to the implicit operator originally used in TURNS, Lower-Upper Symmetric Gauss-Seidel (LU-SGS). Second, an inexact Newton method coupled with a Krylov subspace iterative method (a Newton-Krylov method) is applied. Both techniques had previously been tried in the Euler-equations mode of the code; in this work, they are extended to the Navier-Stokes mode. Several new implicit operators were tried because traditional operators have convergence problems on the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. Efficient implementation of Newton-Krylov methods in the Navier-Stokes mode of TURNS requires efficient preconditioners; the parallel implicit operators from the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) was used because of its portability to various parallel architectures. The proposed methodology is general and can be applied to other CFD codes (e.g., OVERFLOW).
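
    The inexact Newton-Krylov idea can be sketched generically: each Newton step solves J(x) dx = -F(x) only approximately with GMRES, and Jacobian-vector products are approximated by finite differences so that J is never formed. The following is a minimal, unpreconditioned Python sketch under those assumptions; the names `gmres` and `newton_krylov`, the finite-difference step and the tolerances are illustrative choices, not TURNS code. A production solver would add the preconditioning the abstract emphasizes, restarts and a globalization strategy.

```python
import math
from typing import Callable, List

Vec = List[float]


def _dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def _norm(u: Vec) -> float:
    return math.sqrt(_dot(u, u))


def gmres(apply_a: Callable[[Vec], Vec], b: Vec, tol: float = 1e-8, max_iter: int = 50) -> Vec:
    """Unrestarted GMRES from x0 = 0 (Arnoldi + Givens rotations)."""
    n, beta = len(b), _norm(b)
    if beta == 0.0:
        return [0.0] * n
    basis = [[bi / beta for bi in b]]        # orthonormal Krylov basis
    cols: List[Vec] = []                     # rotated Hessenberg columns (form R)
    cs: Vec = []
    sn: Vec = []
    g = [beta]                               # rotated right-hand side beta * e1
    for j in range(min(max_iter, n)):
        w = apply_a(basis[j])
        h = []
        for v in basis:                      # modified Gram-Schmidt
            hij = _dot(w, v)
            w = [wk - hij * vk for wk, vk in zip(w, v)]
            h.append(hij)
        h_next = _norm(w)
        h.append(h_next)
        for i in range(j):                   # apply earlier rotations to the new column
            h[i], h[i + 1] = cs[i] * h[i] + sn[i] * h[i + 1], -sn[i] * h[i] + cs[i] * h[i + 1]
        r = math.hypot(h[j], h[j + 1])       # new rotation zeroes h[j + 1]
        cs.append(h[j] / r)
        sn.append(h[j + 1] / r)
        h[j], h[j + 1] = r, 0.0
        g.append(-sn[j] * g[j])
        g[j] = cs[j] * g[j]
        cols.append(h)
        if abs(g[j + 1]) <= tol * beta or h_next == 0.0:
            break                            # |g[j + 1]| is the current residual norm
        basis.append([wk / h_next for wk in w])
    k = len(cols)
    y = [0.0] * k
    for i in range(k - 1, -1, -1):           # back substitution R y = g
        y[i] = (g[i] - sum(cols[m][i] * y[m] for m in range(i + 1, k))) / cols[i][i]
    return [sum(basis[m][row] * y[m] for m in range(k)) for row in range(n)]


def newton_krylov(f: Callable[[Vec], Vec], x0: Vec, tol: float = 1e-10,
                  max_newton: int = 30, fd_eps: float = 1e-7) -> Vec:
    """Inexact Newton: solve J(x) dx = -F(x) loosely with matrix-free GMRES."""
    x = list(x0)
    for _ in range(max_newton):
        fx = f(x)
        if _norm(fx) < tol:
            break
        step = fd_eps * (1.0 + _norm(x))     # GMRES basis vectors have unit norm

        def jac_vec(v: Vec, x=x, fx=fx, step=step) -> Vec:
            # Finite-difference directional derivative: J(x) v ~ (F(x + s v) - F(x)) / s
            fp = f([xi + step * vi for xi, vi in zip(x, v)])
            return [(a - c) / step for a, c in zip(fp, fx)]

        dx = gmres(jac_vec, [-fi for fi in fx], tol=1e-6)   # loose inner solve
        x = [xi + di for xi, di in zip(x, dx)]
    return x


# Example: intersect the circle x^2 + y^2 = 4 with the line x = y.
root = newton_krylov(lambda z: [z[0] ** 2 + z[1] ** 2 - 4.0, z[0] - z[1]], [1.0, 0.5])
print(root)  # close to [sqrt(2), sqrt(2)]
```

    Only F is ever evaluated; the Jacobian exists only implicitly through `jac_vec`, which is what makes the approach attractive for large distributed CFD grids.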

  2. Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS

    Science.gov (United States)

    Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin

    2011-04-01

    To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm was chosen as an example for exploring the working model and the principle of the algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper proposes and designs a specific parallel IDW interpolation algorithm that incorporates both the single program, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information and broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) these iterations continue until all rows have been processed, after which the results are output. In the experiments performed in this work, the parallel IDW interpolation algorithm attained an efficiency greater than 0.93 compared with similar algorithms, which indicates that it can greatly reduce processing time.
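
    The four steps listed above map naturally onto a master/worker loop. The following Python sketch is only an illustration under stated assumptions: grid nodes sit at integer (column, row) coordinates, the IDW power is 2, and a thread pool stands in for the cluster nodes (under CPython's GIL this shows the decomposition, not a real speedup; a cluster version would use MPI processes). The helper names are hypothetical and are not GRASS GIS functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]   # (x, y, value) of a known point


def idw_value(x: float, y: float, samples: Sequence[Sample], power: float = 2.0) -> float:
    """Serial inverse distance weighting at one location."""
    num = den = 0.0
    for sx, sy, sv in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return sv                 # node coincides with a sample point
        w = d2 ** (-power / 2.0)      # weight = 1 / distance**power
        num += w * sv
        den += w
    return num / den


def interpolate_row(row: int, ncols: int, samples: Sequence[Sample], power: float) -> List[float]:
    """Step (2): one worker interpolates every node of its assigned row."""
    return [idw_value(float(col), float(row), samples, power) for col in range(ncols)]


def parallel_idw(nrows: int, ncols: int, samples: Sequence[Sample],
                 workers: int = 4, power: float = 2.0) -> List[List[float]]:
    # Step (1): the "master" shares the samples; threads simply see the same list.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(interpolate_row, r, ncols, samples, power) for r in range(nrows)]
        # Steps (3)-(4): gather row results in order until every row is done.
        return [fut.result() for fut in futures]


grid = parallel_idw(3, 4, [(0.0, 0.0, 1.0), (3.0, 2.0, 5.0)], workers=2)
print(grid[0][0], grid[2][3])   # 1.0 5.0
```

    Distributing whole rows keeps each task independent (no communication until the gather), which is the property that lets such a decomposition reach high parallel efficiency.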

  3. Progress and Achievements on the R&D Activities for ITER Vacuum Vessel

    Energy Technology Data Exchange (ETDEWEB)

    Nakahira, M. [Japan Atomic Energy Research Institute (JAERI); Koizumi, K. [Japan Atomic Energy Research Institute (JAERI); Takahashi, H. [Japan Atomic Energy Research Institute (JAERI); Onozuka, M. [ITER Joint Central Team, Garching, Germany; Ioki, K. [ITER Joint Central Team, Garching, Germany; Kuzumin, E. [D.V. Efremov Scientific Research Institute, St. Petersburg, Russia; Krylov, V. [D.V. Efremov Scientific Research Institute, St. Petersburg, Russia; Maslakowski, J. [Oak Ridge National Laboratory (ORNL); Nelson, Brad E [ORNL; Jones, L. [Max-Planck Institute, Garching, Germany; Danner, W. [Max-Planck Institute, Garching, Germany; Maisonnier, D. [Max-Planck Institute, Garching, Germany

    2001-01-01

    The ITER vacuum vessel (VV) is designed as a large double-walled structure with a D-shaped cross-section. Because of the size and complexity of its shape, the achievable fabrication tolerance of this structure was unknown. To obtain the fabrication and assembly tolerances, the Full-scale Sector Model of the ITER Vacuum Vessel, 15 m in height, was fabricated and tested. The model was fabricated within the target tolerance of 5 mm, and data on welding deformation during assembly were obtained. The port structure was also connected using remotized welding tools to demonstrate the basic maintenance activity. In parallel, advanced welding, cutting and inspection systems were tested to improve the efficiency of fabrication and maintenance of the vacuum vessel. These activities demonstrate the feasibility of the ITER vacuum vessel in a realistic way. This paper describes the major progress, achievements and latest status of the R&D activities on the ITER vacuum vessel.

  4. 2D-RBUC for efficient parallel compression of residuals

    Science.gov (United States)

    Đurđević, Đorđe M.; Tartalja, Igor I.

    2018-02-01

    In this paper, we present a method for lossless compression of residuals with efficient SIMD-parallel decompression. The residuals originate from lossy or near-lossless compression of height fields, which are commonly used to represent terrain models. The algorithm is founded on the existing RBUC method for compression of non-uniform data sources. We have adapted the method to capture the 2D spatial locality of height fields and developed the decompression algorithm for modern GPU architectures, which are present even in home computers. In combination with HFPaC, a point-level SIMD-parallel lossless/lossy height field compression method characterized by fast progressive decompression and a seamlessly reconstructed surface, the newly proposed method trades a small degradation in efficiency for a non-negligible gain in compression ratio (measured at up to 91%).

  5. Discontinuous interleaving of parallel inverters for efficiency improvement

    DEFF Research Database (Denmark)

    Rannestad, Bjørn; Munk-Nielsen, Stig; Gadgaard, Kristian

    2017-01-01

    Interleaved switching of parallel inverters has previously been proposed for efficiency/size improvements of grid-connected three-phase inverters. This paper proposes a novel interleaving method which practically eliminates insulated gate bipolar transistor (IGBT) turn-on losses and drastically...... overall power module losses are reduced. The modulation strategy is suited to converters with doubly fed induction generators (DFIG) for wind turbines, but is not limited to them. Improvements in switching performance are measured and operational efficiency improvements are calculated and verified...

  6. High Efficiency EBCOT with Parallel Coding Architecture for JPEG2000

    Directory of Open Access Journals (Sweden)

    Chiang Jen-Shiun

    2006-01-01

    This work presents a parallel context-modeling coding architecture and a matching arithmetic coder (MQ-coder) for the embedded block coding (EBCOT) unit of the JPEG2000 encoder. Tier-1 of the EBCOT consumes most of the computation time in a JPEG2000 encoding system. The proposed parallel architecture can increase the throughput rate of the context modeling. To match the high throughput rate of the parallel context-modeling architecture, an efficient pipelined architecture for the context-based adaptive arithmetic encoder is proposed. This JPEG2000 encoder can run at 180 MHz, encoding one symbol per cycle. Compared with previous context-modeling architectures, our parallel architectures improve the throughput rate by up to 25%.

  7. A fast iterative scheme for the linearized Boltzmann equation

    Science.gov (United States)

    Wu, Lei; Zhang, Jun; Liu, Haihu; Zhang, Yonghao; Reese, Jason M.

    2017-06-01

    Iterative schemes to find steady-state solutions to the Boltzmann equation are efficient for highly rarefied gas flows, but can be very slow to converge in the near-continuum flow regime. In this paper, a synthetic iterative scheme is developed to speed up the solution of the linearized Boltzmann equation by penalizing the collision operator L into the form L = (L + Nδh) - Nδh, where δ is the gas rarefaction parameter, h is the velocity distribution function, and N is a tuning parameter controlling the convergence rate. The velocity distribution function is first solved by the conventional iterative scheme, then it is corrected such that the macroscopic flow velocity is governed by a diffusion-type equation that is asymptotic-preserving into the Navier-Stokes limit. The efficiency of this new scheme is assessed by calculating the eigenvalue of the iteration, as well as solving for Poiseuille and thermal transpiration flows. We find that the fastest convergence of our synthetic scheme for the linearized Boltzmann equation is achieved when Nδ is close to the average collision frequency. The synthetic iterative scheme is significantly faster than the conventional iterative scheme in both the transition and the near-continuum gas flow regimes. Moreover, due to its asymptotic-preserving properties, the synthetic iterative scheme does not need high spatial resolution in the near-continuum flow regime, which makes it even faster than the conventional iterative scheme. Using this synthetic scheme, with the fast spectral approximation of the linearized Boltzmann collision operator, Poiseuille and thermal transpiration flows between two parallel plates, through channels of circular/rectangular cross sections and various porous media are calculated over the whole range of gas rarefaction. Finally, the flow of a Ne-Ar gas mixture is solved based on the linearized Boltzmann equation with the Lennard-Jones intermolecular potential for the first time, and the difference

  8. Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

    KAUST Repository

    Frohne, Jörg; Heister, Timo; Bangerth, Wolfgang

    2015-01-01

    © 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.

  9. Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

    KAUST Repository

    Frohne, Jörg

    2015-08-06

    © 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.

  10. GPU Parallel Bundle Block Adjustment

    Directory of Open Access Journals (Sweden)

    ZHENG Maoteng

    2017-09-01

    To deal with massive data in photogrammetry, we introduce GPU parallel computing technology. The preconditioned conjugate gradient method and the inexact Newton method are also applied to reduce the number of iterations needed to solve the normal equations. A new bundle adjustment workflow is developed to exploit GPU parallel computing. Our method avoids storing and inverting the large normal matrix and computes the normal matrix in real time. The proposed method not only greatly reduces the memory requirement of the normal matrix but also greatly improves the efficiency of bundle adjustment, while achieving the same accuracy as the conventional method. Preliminary experimental results show that bundle adjustment of a dataset with about 4,500 images and 9 million image points can be done in only 1.5 minutes while achieving sub-pixel accuracy.
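
    Preconditioned conjugate gradients only ever needs the product of the normal matrix with a vector, which is what allows such a method to avoid storing and inverting that matrix. A minimal Python sketch, assuming a symmetric positive-definite system and a simple Jacobi (diagonal) preconditioner; the abstract does not say which preconditioner was used, and the GPU kernels are not modeled.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def pcg(apply_a: Callable[[Vector], Vector], diag: Vector, b: Vector,
        tol: float = 1e-10, max_iter: int = 1000) -> Tuple[Vector, int]:
    """Jacobi-preconditioned conjugate gradient for an SPD operator A.

    apply_a(v) must return A @ v, so A itself never has to be stored;
    diag holds A's diagonal and serves as the preconditioner M.
    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - A x for x = 0
    stop = tol * (dot(b, b) ** 0.5 or 1.0)
    if dot(r, r) ** 0.5 <= stop:
        return x, 0
    z = [ri / di for ri, di in zip(r, diag)]      # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        ap = apply_a(p)
        alpha = rz / dot(p, ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= stop:
            return x, k
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps search directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # small SPD stand-in
x, iters = pcg(lambda v: [dot(row, v) for row in A], [row[i] for i, row in enumerate(A)], [1.0, 2.0, 3.0])
print(iters, x)
```

    In a bundle-adjustment setting, `apply_a` would assemble the product from per-observation Jacobian blocks on the fly (an assumption about how the workflow could be organized), which is naturally data-parallel.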

  11. Efficient Parallel Statistical Model Checking of Biochemical Networks

    Directory of Open Access Journals (Sweden)

    Paolo Ballarini

    2009-12-01

    We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic. Exact probabilistic verification approaches, such as CSL/PCTL model checking, are undermined by a huge computational demand that rules them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions of the stochastic model. We propose a methodology for efficiently estimating the likelihood that an LTL property P holds of a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology uses a stochastic simulation algorithm to generate execution samples; however, three key aspects improve its efficiency. First, sample generation is driven by on-the-fly verification of P, which results in optimal overall simulation time. Second, the confidence interval for the probability that P holds is estimated with an efficient variant of the Wilson method, which ensures faster convergence. Third, the whole methodology is designed to run in parallel, and a prototype software tool has been implemented that performs the sampling/verification process in parallel on an HPC architecture.
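
    The confidence-interval step can be illustrated with the standard Wilson score interval. The Python sketch below shows only that textbook form (not the paper's "efficient variant", which the abstract does not detail), combined with a stopping rule that keeps sampling until the interval is narrow enough. `run_and_check` is a hypothetical stand-in for simulating one trajectory while verifying P on the fly, and the batch loop marks where samples could be generated in parallel.

```python
import math
import random
from typing import Callable, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Standard Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("need at least one sample")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_probability(run_and_check: Callable[[], bool], half_width: float = 0.01,
                         z: float = 1.96, batch: int = 100, max_runs: int = 1_000_000):
    """Sample until the Wilson interval is no wider than 2 * half_width.

    run_and_check is a hypothetical callable: it simulates one execution and
    returns True if the property held on it.
    """
    successes = runs = 0
    lo, hi = 0.0, 1.0
    while runs < max_runs and (hi - lo) / 2 > half_width:
        for _ in range(batch):        # a batch could be farmed out to parallel workers
            successes += 1 if run_and_check() else 0
            runs += 1
        lo, hi = wilson_interval(successes, runs, z)
    p = successes / runs if runs else float("nan")
    return p, (lo, hi), runs


rng = random.Random(42)
p, (lo, hi), n = estimate_probability(lambda: rng.random() < 0.3)
print(f"P(property) ~ {p:.3f} in [{lo:.3f}, {hi:.3f}] after {n} runs")
```

    Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly when the estimated probability is close to 0 or 1.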

  12. An Expert System for the Development of Efficient Parallel Code

    Science.gov (United States)

    Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.

  13. Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization

    KAUST Repository

    Gunnels, John; Lee, Jon; Margulies, Susan

    2010-01-01

    We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.

  14. Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization

    KAUST Repository

    Gunnels, John

    2010-06-01

    We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.

  15. Leveraging Cloud Heterogeneity for Cost-Efficient Execution of Parallel Applications

    OpenAIRE

    Roloff, Eduardo; Diener, Matthias; Diaz Carreño, Emmanuell; Gaspary, Luciano Paschoal; Navaux, Philippe O.A.

    2017-01-01

    Public cloud providers offer a wide range of instance types, with different processing and interconnection speeds, as well as varying prices. Furthermore, the tasks of many parallel applications show different computational demands due to load imbalance. These differences can be exploited for improving the cost efficiency of parallel applications in many cloud environments by matching application requirements to instance types. In this paper, we introduce the concept of heterogeneous cloud sy...

  16. Fourier analysis of parallel block-Jacobi splitting with transport synthetic acceleration in two-dimensional geometry

    International Nuclear Information System (INIS)

    Rosa, M.; Warsa, J. S.; Chang, J. H.

    2007-01-01

    A Fourier analysis is conducted in two-dimensional (2D) Cartesian geometry for the discrete-ordinates (SN) approximation of the neutron transport problem solved with Richardson iteration (Source Iteration) and Richardson iteration preconditioned with Transport Synthetic Acceleration (TSA), using the Parallel Block-Jacobi (PBJ) algorithm. The results for the un-accelerated algorithm show that convergence of PBJ can degrade, leading in particular to stagnation of GMRES(m) in problems containing optically thin sub-domains. The results for the accelerated algorithm indicate that TSA can be used to efficiently precondition an iterative method in the optically thin case when implemented in the 'modified' version MTSA, in which only the scattering in the low order equations is reduced by some non-negative factor β<1. (authors)

  17. On efficiency of fire simulation realization: parallelization with greater number of computational meshes

    Science.gov (United States)

    Valasek, Lukas; Glasa, Jan

    2017-12-01

    Current fire simulation systems can exploit available high-performance computing (HPC) platforms to model fires efficiently in parallel. In this paper, the efficiency of a corridor fire simulation on an HPC computer cluster is discussed. The parallel MPI version of Fire Dynamics Simulator is used to test the efficiency of selected strategies for allocating the cluster's computational resources when a greater number of computational cores is used. Simulation results indicate that, if the number of cores used is not a multiple of the number of cores per cluster node, some allocation strategies provide more efficient calculations than others.

  18. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    International Nuclear Information System (INIS)

    Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.

    2009-01-01

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the efficiency of large ensembles of such calculations in parallel computations, in this work the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on the computational regime, that is, on how much memory is available and how much overlap exists between tabulated information on different processors. No single fixed strategy consistently achieves good performance in all regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel

  19. Fast iterative censoring CFAR algorithm for ship detection from SAR images

    Science.gov (United States)

    Gu, Dandan; Yue, Hui; Zhang, Yuan; Gao, Pengcheng

    2017-11-01

    Ship detection is one of the essential techniques for ship recognition from synthetic aperture radar (SAR) images. This paper presents a fast iterative detection procedure that eliminates the influence of target returns on the estimation of local sea clutter distributions for constant false alarm rate (CFAR) detectors. A fast block detector is first employed to extract potential target sub-images; an iterative censoring CFAR algorithm is then used to detect ship candidates in each target block adaptively and efficiently. Detection can run in parallel, and the statistical parameters of the G0 distribution, which fits local sea clutter well, can be quickly estimated with an integral image operator. Experimental results on TerraSAR-X images demonstrate the effectiveness of the proposed technique.
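
    The speed of the clutter-estimation step comes from integral images (summed-area tables): after one pass over the image, the sum over any rectangular window costs four look-ups. The sketch below assumes that a moment-based G0 fit needs local first and second moments of the uncensored clutter, and it handles censoring by also integrating a 0/1 mask. It does not implement the G0 estimator or the detector itself, and all names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Grid = List[List[float]]


def integral_image(img: Sequence[Sequence[float]]) -> Grid:
    """Summed-area table with a zero border: s[i][j] = sum of img[:i][:j]."""
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_sum = 0.0
        for j in range(w):
            row_sum += img[i][j]
            s[i + 1][j + 1] = s[i][j + 1] + row_sum
    return s


def box_sum(s: Grid, top: int, left: int, bottom: int, right: int) -> float:
    """Sum over rows top..bottom-1 and columns left..right-1 in O(1)."""
    return s[bottom][right] - s[top][right] - s[bottom][left] + s[top][left]


def clutter_tables(img: Sequence[Sequence[float]], keep: Sequence[Sequence[int]]):
    """Integral images of pixel count, sum and sum of squares over uncensored pixels.

    keep[i][j] is 1 for clutter pixels and 0 for pixels censored as possible targets.
    """
    count = integral_image([[float(k) for k in row] for row in keep])
    s1 = integral_image([[v * k for v, k in zip(r, kr)] for r, kr in zip(img, keep)])
    s2 = integral_image([[v * v * k for v, k in zip(r, kr)] for r, kr in zip(img, keep)])
    return count, s1, s2


def window_moments(tables, top: int, left: int, bottom: int,
                   right: int) -> Optional[Tuple[float, float]]:
    """Mean and variance of the uncensored clutter inside one window."""
    count, s1, s2 = tables
    n = box_sum(count, top, left, bottom, right)
    if n == 0:
        return None                   # every pixel in the window was censored
    mean = box_sum(s1, top, left, bottom, right) / n
    return mean, box_sum(s2, top, left, bottom, right) / n - mean * mean


img = [[1.0, 2.0], [3.0, 4.0]]
print(window_moments(clutter_tables(img, [[1, 1], [1, 1]]), 0, 0, 2, 2))  # (2.5, 1.25)
```

    When an iteration censors new target pixels, only the mask and the three tables need rebuilding; every window query stays constant-time, and disjoint image blocks can be processed independently.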

  20. Parallelization of pressure equation solver for incompressible N-S equations

    International Nuclear Information System (INIS)

    Ichihara, Kiyoshi; Yokokawa, Mitsuo; Kaburaki, Hideo.

    1996-03-01

    A pressure equation solver in a code for 3-dimensional incompressible flow analysis has been parallelized using the red-black SOR method and the PCG method on the Fujitsu VPP500, a vector parallel computer with distributed memory. For comparison of scalability, the solver using the red-black SOR method has also been parallelized on the Intel Paragon, a scalar parallel computer with distributed memory. The scalability of the red-black SOR method on both the VPP500 and the Paragon was lost as the number of processor elements increased; on both systems, the cause is the increasing communication time between processor elements. In addition, parallelization by DO-loop division lowers the vectorization efficiency on the VPP500, so an effective implementation on the VPP500 requires solving large-scale problems with very long vectorized DO-loops in the parallel program. The PCG method with the red-black SOR method applied to the incomplete LU factorization (red-black PCG) needs more iteration steps than the normal PCG method with forward and backward substitution, despite the same number of floating-point operations in the DO-loop of the incomplete LU factorization. The parallelized red-black PCG method has fewer merits than the parallelized red-black SOR method when the computational region has fewer grid points, because red-black PCG achieves low vectorization efficiency. (author)
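
    Red-black ordering is what makes SOR parallelizable: grid points are colored like a checkerboard, and every red point depends only on black neighbors (and vice versa), so all points of one color can be updated concurrently. A minimal serial Python sketch for a 2D Poisson-type pressure equation, assuming a uniform grid, the 5-point stencil, zero Dirichlet boundaries and a fixed relaxation factor of 1.5; the report's 3D discretization and parameters are not reproduced here.

```python
from typing import List, Tuple

Grid = List[List[float]]


def red_black_sor(f: Grid, h: float, omega: float = 1.5,
                  tol: float = 1e-10, max_sweeps: int = 10_000) -> Tuple[Grid, int]:
    """Solve -(u_xx + u_yy) = f on a square with u = 0 on the boundary.

    f holds the n x n interior values; the returned u includes the zero
    boundary, so u[i][j] sits at (x, y) = (j * h, i * h).
    """
    n = len(f)
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for color in (0, 1):                      # 0: red points, 1: black points
            for i in range(1, n + 1):
                start = 1 + (i + 1 + color) % 2   # first j with (i + j) % 2 == color
                # Points of one color read only neighbours of the other color,
                # so this loop could be split across processors.
                for j in range(start, n + 1, 2):
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                                 + h * h * f[i - 1][j - 1])
                    delta = omega * (gs - u[i][j])    # over-relaxed Gauss-Seidel step
                    u[i][j] += delta
                    max_change = max(max_change, abs(delta))
        if max_change < tol:
            return u, sweep
    return u, max_sweeps


# Manufactured check: u = x(1-x)y(1-y) is reproduced exactly by the 5-point stencil.
n = 7
h = 1.0 / (n + 1)
f = [[2 * ((i + 1) * h) * (1 - (i + 1) * h) + 2 * ((j + 1) * h) * (1 - (j + 1) * h)
      for j in range(n)] for i in range(n)]
u, sweeps = red_black_sor(f, h)
print(sweeps, u[4][4])   # u[4][4] should be close to 0.0625
```

    On a distributed-memory machine each processor would own a slab of rows and exchange one layer of boundary values after every half-sweep, which is the communication that the report identifies as limiting scalability.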

  1. High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

    Directory of Open Access Journals (Sweden)

    H. Y. Su

    2012-04-01

    This article presents two highly efficient parallel realizations of context-based adaptive variable length coding (CAVLC) on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependencies are eliminated or weakened: the context-based data dependence, the memory-access dependence and the control dependence. The CAVLC pipeline is divided into three stages (two scans, coding, and lag packing) and implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on the multicore stream processor STORM; the other is a component-oriented SIMT parallel encoder on a massively parallel GPU architecture. Both exploit rich data-level parallelism. Experimental results show that, compared with the CPU version, speedups of more than 70 times on STORM and over 50 times on the GPU can be obtained. The STORM encoder achieves real-time processing of 1080p at 30 fps, and the GPU-based version satisfies the requirements for real-time 720p encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms.

  2. Development Of A Parallel Performance Model For The THOR Neutral Particle Transport Code

    Energy Technology Data Exchange (ETDEWEB)

    Yessayan, Raffi; Azmy, Yousry; Schunert, Sebastian

    2017-02-01

    The THOR neutral particle transport code enables simulation of complex geometries for various problems, from reactor simulations to nuclear non-proliferation. It is undergoing thorough verification and validation (V&V), which requires computational efficiency. This has motivated various improvements, including angular parallelization, outer iteration acceleration, and development of peripheral tools. To guide future improvements to the code's efficiency, a better characterization of its parallel performance is useful. A parallel performance model (PPM) can be used to evaluate the benefits of modifications and to identify performance bottlenecks. Using INL's Falcon HPC, the PPM development incorporates an evaluation of network communication behavior over heterogeneous links and a functional characterization of the per-cell/angle/group runtime of each major code component. After several possible sources of variability were evaluated, this work produced a communication model and a parallel portion model. The former's accuracy is bounded by the variability of communication on Falcon, while the latter has an error on the order of 1%.

  3. Improving computational efficiency of Monte Carlo simulations with variance reduction

    International Nuclear Information System (INIS)

    Turner, A.; Davis, A.

    2013-01-01

    CCFE performs Monte Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging because streaming and deep-penetration effects are equally important. To make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that applying VR techniques to such models significantly reduces the efficiency of parallel computation because of 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows (WW). The weight window represents the 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise to extreme amounts of splitting and hence a long history. When running on parallel clusters, a long history can harm parallel efficiency: if one process is computing the long history, the other CPUs complete their batches of histories and sit idle. Furthermore, some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP that dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing VR performance, but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown that the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and it will be beneficial for any geometry combining streaming and deep-penetration effects. (authors)

  4. Existence test for asynchronous interval iterations

    DEFF Research Database (Denmark)

    Madsen, Kaj; Caprani, O.; Stauning, Ole

    1997-01-01

    In the search for regions that contain fixed points of a real function of several variables, tests based on interval calculations can be used to establish existence or non-existence of fixed points in regions that are examined in the course of the search. The search can e.g. be performed...... as a synchronous (sequential) interval iteration: In each iteration step all components of the iterate are calculated based on the previous iterate. In this case it is straightforward to base simple interval existence and non-existence tests on the calculations done in each step of the iteration. The search can also...... on the componentwise calculations done in the course of the iteration. These componentwise tests are useful for parallel implementation of the search, since the tests can then be performed locally on each processor, and only when a test is successful does a processor communicate this result to other processors....
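Here is a minimal sketch of the two kinds of test. It assumes a hypothetical two-variable map and plain interval arithmetic without outward rounding, which a rigorous implementation would need. If the interval image of a box lies inside the box, Brouwer's theorem guarantees a fixed point. If any single component of the image misses the corresponding interval, there is none. The second test needs only one component, and that is what makes componentwise, processor-local checks possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        if isinstance(other, Interval):
            return Interval(self.lo + other.lo, self.hi + other.hi)
        return Interval(self.lo + other, self.hi + other)          # interval + scalar

    __radd__ = __add__

    def __mul__(self, other):
        if isinstance(other, Interval):
            p = (self.lo * other.lo, self.lo * other.hi,
                 self.hi * other.lo, self.hi * other.hi)
        else:
            p = (self.lo * other, self.hi * other)                 # scalar of either sign
        return Interval(min(p), max(p))

    __rmul__ = __mul__

    def subset_of(self, other: "Interval") -> bool:
        return other.lo <= self.lo and self.hi <= other.hi

    def disjoint_from(self, other: "Interval") -> bool:
        return self.hi < other.lo or other.hi < self.lo


def f(box):
    """Natural interval extension of a hypothetical map f(x, y) = (0.25*x*y + 0.5, 0.5*x + 0.25)."""
    x, y = box
    return (0.25 * (x * y) + 0.5, 0.5 * x + 0.25)


def classify(box):
    image = f(box)
    # Non-existence: one component whose image misses its own interval is enough,
    # so this test can be evaluated component by component.
    if any(fi.disjoint_from(xi) for fi, xi in zip(image, box)):
        return "no fixed point"
    # Existence (Brouwer): the continuous map sends the box into itself.
    if all(fi.subset_of(xi) for fi, xi in zip(image, box)):
        return "fixed point exists"
    return "inconclusive"


print(classify((Interval(0, 1), Interval(0, 1))))      # fixed point exists
print(classify((Interval(2, 3), Interval(0, 1))))      # no fixed point
print(classify((Interval(0, 1), Interval(0.7, 1))))    # inconclusive
```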

  5. Gauss-Seidel Iterative Method as a Real-Time Pile-Up Solver of Scintillation Pulses

    Science.gov (United States)

    Novak, Roman; Vencelj, Matjaž

    2009-12-01

    The pile-up rejection in nuclear spectroscopy has been confronted recently by several pile-up correction schemes that compensate for distortions of the signal and subsequent energy spectra artifacts as the counting rate increases. We study here the real-time capability of the event-by-event correction method, which at its core translates to solving many sets of linear equations. Tight time limits and constrained front-end electronics resources make well-known direct solvers inappropriate. We propose a novel approach based on the Gauss-Seidel iterative method, which turns out to be a stable and cost-efficient solution to improve spectroscopic resolution in the front-end electronics. We show the method's convergence properties for a class of matrices that emerge in calorimetric processing of scintillation detector signals and demonstrate the ability of the method to support the relevant resolutions. The sole iteration-based error component can be brought below the sliding-window-induced errors in a reasonable number of iteration steps, thus allowing real-time operation. An area-efficient hardware implementation is proposed that fully utilizes the method's inherent parallelism.
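For reference, here is a plain software sketch of Gauss-Seidel iteration. It is not the authors' hardware design. It assumes a matrix for which the iteration converges, such as a strictly diagonally dominant or symmetric positive definite one; the 3×3 system below is a hypothetical example. Each sweep reuses components already updated in the same sweep, which is what distinguishes Gauss-Seidel from Jacobi.

```python
import numpy as np


def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=500):
    """Solve A x = b with Gauss-Seidel sweeps; returns (x, sweeps used)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = b.size
    # A warm start (e.g. a previous solution) can cut the number of sweeps.
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x_old = x.copy()
        for i in range(n):
            # Entries x[:i] are already updated in this sweep; x[i+1:] are from the last one.
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.max(np.abs(x - x_old)) < tol:
            return x, k
    return x, max_iter


A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x, sweeps = gauss_seidel(A, b)
print(x, sweeps)
```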

  6. Efficient Out of Core Sorting Algorithms for the Parallel Disks Model.

    Science.gov (United States)

    Kundeti, Vamsi; Rajasekaran, Sanguthevar

    2011-11-01

    In this paper we present efficient algorithms for sorting on the Parallel Disks Model (PDM). Numerous asymptotically optimal algorithms have been proposed in the literature. However, many of these merge-based algorithms have large underlying constants in the time bounds, because they suffer from the lack of read parallelism on PDM. The irregular consumption of the runs during the merge affects the read parallelism and contributes to the increased sorting time. In this paper we first introduce a novel idea called the dirty sequence accumulation that improves the read parallelism. Secondly, we show analytically that this idea can reduce the number of parallel I/Os required to sort the input close to the lower bound of [Formula: see text]. We experimentally verify our dirty sequence idea with the standard R-way merge and show that our idea can reduce the number of parallel I/Os to sort on PDM significantly.
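To make the "irregular consumption of the runs" concrete, here is an in-memory sketch of the conventional R-way merge that the authors use as a baseline, built on Python's `heapq`. It models neither the disks of PDM nor the dirty sequence accumulation idea. It only shows that the order in which runs advance depends on the data.

```python
import heapq
from typing import List


def r_way_merge(runs: List[List[int]]) -> List[int]:
    """Merge R sorted runs with a min-heap holding the current head of each run."""
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, r, i = heapq.heappop(heap)
        out.append(value)
        if i + 1 < len(runs[r]):
            # Which run advances next depends on the data, so runs are consumed
            # irregularly; on disk this makes it hard to keep all reads in parallel.
            heapq.heappush(heap, (runs[r][i + 1], r, i + 1))
    return out


runs = [[1, 4, 9], [2, 3, 10], [5, 6, 7, 8]]
print(r_way_merge(runs))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```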

  7. Parallel state transfer and efficient quantum routing on quantum networks.

    Science.gov (United States)

    Chudzicki, Christopher; Strauch, Frederick W

    2010-12-31

    We study the routing of quantum information in parallel on multidimensional networks of tunable qubits and oscillators. These theoretical models are inspired by recent experiments in superconducting circuits. We show that perfect parallel state transfer is possible for certain networks of harmonic oscillator modes. We extend this to the distribution of entanglement between every pair of nodes in the network, finding that the routing efficiency of hypercube networks is optimal and robust in the presence of dissipation and finite bandwidth.
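Here is a small numerical illustration of perfect transfer on a hypercube. It assumes the single-excitation picture with uniform unit couplings, where the dynamics reduce to a continuous-time quantum walk generated by the graph's adjacency matrix. This is a standard textbook example rather than the authors' oscillator-network model, and it ignores dissipation and finite bandwidth. The d-cube adjacency matrix is a sum of commuting bit-flip terms, so exp(-iAt) at t = π/2 maps a vertex to its antipode with unit amplitude.

```python
import numpy as np


def hypercube_adjacency(d: int) -> np.ndarray:
    """Adjacency matrix of the d-dimensional hypercube (vertices = d-bit strings)."""
    n = 2 ** d
    A = np.zeros((n, n))
    for v in range(n):
        for bit in range(d):
            A[v, v ^ (1 << bit)] = 1.0      # neighbours differ in exactly one bit
    return A


def transfer_amplitude(A: np.ndarray, src: int, dst: int, t: float) -> complex:
    """<dst| exp(-i A t) |src> for a real symmetric coupling matrix A (hbar = J = 1)."""
    evals, evecs = np.linalg.eigh(A)
    U = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.T
    return U[dst, src]


d = 3
A = hypercube_adjacency(d)
antipode = 2 ** d - 1                        # vertex 000 is sent to 111
amp = transfer_amplitude(A, 0, antipode, np.pi / 2)
print(abs(amp))                              # ≈ 1.0: perfect transfer at t = pi/2
```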

  8. An efficient parallel algorithm for the calculation of canonical MP2 energies.

    Science.gov (United States)

    Baker, Jon; Pulay, Peter

    2002-09-01

    We present the parallel version of a previous serial algorithm for the efficient calculation of canonical MP2 energies (Pulay, P.; Saebo, S.; Wolinski, K. Chem Phys Lett 2001, 344, 543). It is based on the Saebo-Almlöf direct-integral transformation, coupled with an efficient prescreening of the AO integrals. The parallel algorithm avoids synchronization delays by spawning a second set of slaves during the bin-sort prior to the second half-transformation. Results are presented for systems with up to 2000 basis functions. MP2 energies for molecules with 400-500 basis functions can be routinely calculated to microhartree accuracy on a small number of processors (6-8) in a matter of minutes with modern PC-based parallel computers. Copyright 2002 Wiley Periodicals, Inc. J Comput Chem 23: 1150-1156, 2002
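For orientation, the canonical closed-shell MP2 correlation energy that such algorithms evaluate is shown below in the usual spatial-orbital form. Here i, j run over occupied orbitals and a, b over virtual orbitals, (ia|jb) are MO integrals in chemists' notation, and ε are canonical orbital energies. The costly step is producing the (ia|jb) from AO integrals through the integral transformation mentioned in the abstract.

```latex
E^{(2)} = \sum_{ij}^{\mathrm{occ}} \sum_{ab}^{\mathrm{virt}}
\frac{(ia|jb)\left[\,2\,(ia|jb) - (ib|ja)\,\right]}
     {\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}
```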

  9. Recommendations for a cryogenic system for ITER [International Thermonuclear Experimental Reactor]

    International Nuclear Information System (INIS)

    Slack, D.S.

    1989-01-01

    The International Thermonuclear Experimental Reactor (ITER) is a new tokamak design project with joint participation from Japan, the European Community, the Soviet Union, and the United States. ITER will be a large machine requiring up to 100 kW of refrigeration at 4.5 K to cool its superconducting magnets. Unlike earlier fusion experiments, the ITER cryogenic system must handle pulse loads constituting a large percentage of the total load. These come from neutron heating during a fusion burn and from ac losses during ramping of current in the PF (poloidal field) coils. This paper presents a conceptual design for a cryogenic system that meets ITER requirements. It describes a system with the following features: Only time-proven components are used. The system obtains a high efficiency without use of cold pumps or other developmental components. High reliability is achieved by paralleling compressors and expanders and by using adequate isolation valving. The problem of load fluctuations is solved by a simple load-leveling device. The cryogenic system can be housed in a separate building located at a considerable distance from the ITER core, if desired. The paper also summarizes physical plant size, cost estimates, and means of handling vented helium during magnet quench. 4 refs., 4 figs., 3 tabs

  10. Development and test of the ITER conductor joints

    Energy Technology Data Exchange (ETDEWEB)

    Martovetsky, N., LLNL

    1998-05-14

    Joints for the ITER superconducting Central Solenoid should perform in a rapidly varying magnetic field with low losses and low DC resistance. This paper describes the design of the ITER joint and presents its assembly process. Two joints were built and tested at the PTF facility at MIT. Test results are presented, and losses in transverse and parallel fields and the DC performance are discussed. The developed joint demonstrates sufficient margin for baseline ITER operating scenarios.

  11. Decomposition based parallel processing technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2000-01-01

    In practical design studies, most designers solve multidisciplinary problems with complex design structures. These multidisciplinary problems involve hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for designers to reorder the original design processes to minimize total cost and time. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems to raise design efficiency by using a genetic algorithm, and shows the relationship between decomposition and the Multidisciplinary Design Optimization (MDO) methodology.

  12. LHCD and coupling experiments with an ITER-like PAM launcher on the FTU tokamak

    International Nuclear Information System (INIS)

    Pericoli Ridolfini, V.; Apicella, M.L.; Barbato, E.; Buratti, P.; Calabro, G.; Cardinali, A.; Mirizzi, F.; Panaccione, L.; Podda, S.; Tuccillo, A.A.; Bibet, Ph.; Granucci, G.; Sozzi, C.

    2005-01-01

    Successful experimental tests on a PAM (passive active multijunction) prototype antenna for Lower Hybrid (LH) waves, similar to that foreseen for ITER, have been carried out on FTU. The power level routinely achieved without any fault in the transmission lines for the maximum time allowed by the LH power plant, i.e. 0.9 s, is 250 kW versus a design value of 270 kW. This corresponds to 50 MW/m² through the ITER antenna active area if it is scaled for the different LH frequencies (5 GHz in ITER, 8 GHz in FTU), and it is more than 1.4 times the goal of the ITER design (33 MW/m²). The test results validate the main features indicated by the simulation codes concerning the power handling, the coupling and the launched N∥ spectrum. The power reflection coefficient R_c is always ≤ 2.5% once the PAM launcher has been properly conditioned, even with the grill mouth retracted 2 mm inside the port shadow, with density in front of the launcher very close to or even lower than the cut-off value. The current drive efficiency is comparable to a conventional grill in similar conditions, once the lower directivity is taken into account. The flexibility in the N∥ spectrum is confirmed by the HXR and ECE spectra. Conditioning the PAM to operate at the ITER-equivalent power level required only one day of RF operation, without prior baking of the waveguides. (author)

  13. NUFFT-Based Iterative Image Reconstruction via Alternating Direction Total Variation Minimization for Sparse-View CT

    Directory of Open Access Journals (Sweden)

    Bin Yan

    2015-01-01

    Sparse-view imaging is a promising scanning method which can reduce the radiation dose in X-ray computed tomography (CT). The reconstruction algorithm for a sparse-view imaging system is of significant importance. Spatial iterative algorithms for CT image reconstruction have low operational efficiency and high computational requirements. A novel Fourier-based iterative reconstruction technique that utilizes the nonuniform fast Fourier transform is presented in this study, along with advanced total variation (TV) regularization for sparse-view CT. Combined with the alternating direction method, the proposed approach shows excellent efficiency and rapid convergence. Numerical simulations and real data experiments are performed on a parallel beam CT. Experimental results validate that the proposed method has higher computational efficiency and better reconstruction quality than conventional algorithms, such as the simultaneous algebraic reconstruction technique using the TV method and the alternating direction total variation minimization approach, with the same time duration. The proposed method appears to have extensive applications in X-ray CT imaging.
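The alternating direction + TV ingredient can be illustrated in isolation. The sketch below applies ADMM to 1-D total-variation denoising: minimize ½‖x − y‖² + λ‖Dx‖₁, where D is the forward difference. It deliberately omits the NUFFT and the CT projection model, so it is not the proposed reconstruction method. `lam`, `rho`, the iteration count and the test signal are hypothetical choices, and the dense solve is only sensible for small n.

```python
import numpy as np


def tv_denoise_admm(y, lam=0.5, rho=1.0, iters=300):
    """1-D total-variation denoising by ADMM.

    Minimizes 0.5 * ||x - y||^2 + lam * ||D x||_1, where D is the forward-difference
    operator, by alternating an x-update (linear solve), a z-update (soft threshold)
    and a scaled dual update.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), axis=0)            # (n-1) x n first-difference matrix
    lhs = np.eye(n) + rho * D.T @ D           # system matrix of the x-update (fixed)
    z = np.zeros(n - 1)
    u = np.zeros(n - 1)                       # scaled dual variable
    x = y.copy()
    for _ in range(iters):
        x = np.linalg.solve(lhs, y + rho * D.T @ (z - u))
        Dx = D @ x
        v = Dx + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)   # soft threshold
        u += Dx - z
    return x


rng = np.random.default_rng(0)
truth = np.r_[np.zeros(50), np.ones(50)]      # hypothetical piecewise-constant signal
noisy = truth + rng.normal(0.0, 0.1, truth.size)
clean = tv_denoise_admm(noisy)
print(np.abs(noisy - truth).mean(), np.abs(clean - truth).mean())
```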

  14. Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

    Science.gov (United States)

    Zerr, Robert Joseph

    2011-12-01

    thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, these problems are of much less practical interest, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of the number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times. Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends.
Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations

  15. Parallelized implicit propagators for the finite-difference Schrödinger equation

    Science.gov (United States)

    Parker, Jonathan; Taylor, K. T.

    1995-08-01

    We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
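Here is a minimal sketch of the mixed direct-iterative idea. It assumes a generic matrix split into equal diagonal blocks. Each block is solved directly, with NumPy's dense solver standing in for an LU decomposition, while block Jacobi iteration handles the coupling. Every block update reads only the previous iterate, so the blocks are independent within a sweep; reading already-updated blocks instead would give block Gauss-Seidel. The tridiagonal test matrix is hypothetical, not a Schrödinger propagator.

```python
import numpy as np


def block_jacobi(A, b, block_size, tol=1e-12, max_iter=1000):
    """Mixed direct-iterative solve of A x = b.

    Diagonal blocks are solved directly; off-block coupling is handled by Jacobi
    iteration. Each block update uses only the previous iterate, so all blocks
    could be processed in parallel.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = b.size
    starts = range(0, n, block_size)
    x = np.zeros(n)
    for k in range(1, max_iter + 1):
        x_new = np.empty(n)
        for s in starts:
            e = min(s + block_size, n)
            # Right-hand side with this block's own contribution removed.
            r = b[s:e] - A[s:e, :] @ x + A[s:e, s:e] @ x[s:e]
            x_new[s:e] = np.linalg.solve(A[s:e, s:e], r)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new, k
        x = x_new
    return x, max_iter


n = 12
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = block_jacobi(A, b, block_size=4)
print(iters, np.max(np.abs(A @ x - b)))
```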

  16. Parallel GPU implementation of iterative PCA algorithms.

    Science.gov (United States)

    Andrecut, M

    2009-11-01

    Principal component analysis (PCA) is a key statistical technique for multivariate data analysis. For large data sets, the common approach to PCA computation is based on the standard NIPALS-PCA algorithm, which unfortunately suffers from loss of orthogonality, and therefore its applicability is usually limited to the estimation of the first few components. Here we present an algorithm based on Gram-Schmidt orthogonalization (called GS-PCA), which eliminates this shortcoming of NIPALS-PCA. Also, we discuss the GPU (Graphics Processing Unit) parallel implementation of both NIPALS-PCA and GS-PCA algorithms. The numerical results show that the GPU parallel optimized versions, based on CUBLAS (NVIDIA), are substantially faster (up to 12 times) than the CPU optimized versions based on CBLAS (GNU Scientific Library).
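Here is a CPU-only NumPy sketch in the spirit of GS-PCA: NIPALS power iterations on a deflated matrix, with each new loading and score explicitly re-orthogonalized (classical Gram-Schmidt) against the earlier ones to counter loss of orthogonality. It is a reconstruction from the abstract under stated assumptions, not the paper's CUBLAS code. The convergence test and the synthetic data are hypothetical.

```python
import numpy as np


def gs_pca(X, n_components, max_iter=1000, tol=1e-14):
    """NIPALS-style PCA with explicit Gram-Schmidt re-orthogonalization.

    Returns scores T (samples x k) and unit-norm loadings P (variables x k).
    """
    R = np.array(X, dtype=float)
    R -= R.mean(axis=0)                                    # centre each variable
    n, m = R.shape
    T = np.zeros((n, n_components))
    P = np.zeros((m, n_components))
    for k in range(n_components):
        Tk, Pk = T[:, :k], P[:, :k]
        t = R[:, np.argmax(np.sum(R * R, axis=0))].copy()  # start from the largest column
        lam_old = 0.0
        for _ in range(max_iter):
            p = R.T @ t
            p -= Pk @ (Pk.T @ p)                           # re-orthogonalize vs. earlier loadings
            p /= np.linalg.norm(p)
            t = R @ p
            t -= Tk @ ((Tk.T @ t) / np.sum(Tk * Tk, axis=0))   # ...and vs. earlier scores
            lam = np.linalg.norm(t)
            if abs(lam - lam_old) <= tol * lam:
                break
            lam_old = lam
        T[:, k], P[:, k] = t, p
        R -= np.outer(t, p)                                # deflate the residual matrix
    return T, P


rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.normal(size=(50, 5)))
V, _ = np.linalg.qr(rng.normal(size=(5, 5)))
X = U @ np.diag([10.0, 5.0, 2.0, 1.0, 0.5]) @ V.T          # hypothetical test matrix
T, P = gs_pca(X, 3)
s_ref = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)[:3]
print(np.linalg.norm(T, axis=0), s_ref)
```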

  17. Parallel assembling and equation solving via graph algorithms with an application to the FE simulation of metal extrusion processes

    CERN Document Server

    Unterkircher, A

    2005-01-01

    We propose methods for parallel assembling and iterative equation solving based on graph algorithms. The assembling technique is independent of dimension, element type and model shape. As a parallel solving technique we construct a multiplicative symmetric Schwarz preconditioner for the conjugate gradient method. Both methods have been incorporated into a non-linear FE code to simulate 3D metal extrusion processes. We illustrate the efficiency of these methods on shared memory computers by realistic examples.
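Here is a small dense sketch of Schwarz-preconditioned conjugate gradients. For brevity it swaps in the additive overlapping Schwarz preconditioner, M⁻¹ = Σᵢ Rᵢᵀ (Rᵢ A Rᵢᵀ)⁻¹ Rᵢ, instead of the multiplicative symmetric variant named in the abstract. The subdomain index sets are chosen by hand rather than by graph algorithms. The 1-D Laplacian test problem is hypothetical.

```python
import numpy as np


def additive_schwarz(A, subdomains):
    """Return a function r -> M^{-1} r with M^{-1} = sum_i R_i^T (R_i A R_i^T)^{-1} R_i."""
    # Small dense local matrices are inverted once up front.
    local = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in subdomains]

    def apply(r):
        z = np.zeros_like(r)
        for idx, A_loc_inv in local:
            z[idx] += A_loc_inv @ r[idx]     # overlapping contributions are summed
        return z

    return apply


def pcg(A, b, M_inv, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


n = 40
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)      # 1-D Laplacian (SPD)
b = np.ones(n)
subdomains = [np.arange(max(0, s - 2), min(n, s + 12)) for s in range(0, n, 10)]
x, its = pcg(A, b, additive_schwarz(A, subdomains))
print(its, np.max(np.abs(A @ x - b)))
```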

  18. ITER council proceedings: 1997

    International Nuclear Information System (INIS)

    1997-01-01

    This volume of the ITER EDA Documentation Series presents records of the 12th ITER Council Meeting, IC-12, which took place on 23-24 July, 1997 in Tampere, Finland. The Council received from the Parties (EU, Japan, Russia, US) positive responses on the Detailed Design Report. The Parties stated their willingness to fulfil their obligations in contributing to the ITER EDA. The summary discussions among the Parties led to the consensus that in July 1998 the ITER activities should proceed for an additional three years, with a general intent to enable an efficient start of possible future ITER construction.

  19. Parallel processing based decomposition technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2001-01-01

    In practical design studies, most designers solve multidisciplinary problems with large and complex design systems. These multidisciplinary problems involve hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for designers to reorder the original design processes to minimize total computational cost. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems to raise design efficiency by using a genetic algorithm, and shows the relationship between decomposition and the Multidisciplinary Design Optimization (MDO) methodology.

  20. Distributed Parallel Endmember Extraction of Hyperspectral Data Based on Spark

    Directory of Open Access Journals (Sweden)

    Zebin Wu

    2016-01-01

    Due to the increasing dimensionality and volume of remotely sensed hyperspectral data, the development of acceleration techniques for massive hyperspectral image analysis approaches is a very important challenge. Cloud computing offers many possibilities of distributed processing of hyperspectral datasets. This paper proposes a novel distributed parallel endmember extraction method based on iterative error analysis that utilizes cloud computing principles to efficiently process massive hyperspectral data. The proposed method takes advantage of technologies including MapReduce programming model, Hadoop Distributed File System (HDFS), and Apache Spark to realize distributed parallel implementation for hyperspectral endmember extraction, which significantly accelerates the computation of hyperspectral processing and provides high throughput access to large hyperspectral data. The experimental results, which are obtained by extracting endmembers of hyperspectral datasets on a cloud computing platform built on a cluster, demonstrate the effectiveness and computational efficiency of the proposed method.
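Here is a serial, in-memory sketch of iterative error analysis (IEA). It assumes unconstrained least-squares unmixing, whereas IEA variants often impose abundance constraints. Nothing here uses Spark, HDFS or MapReduce. The point is that the per-pixel unmixing and error computation in each round are independent across pixels, and that is the part that lends itself to distribution. The synthetic mixtures are hypothetical.

```python
import numpy as np


def iterative_error_analysis(X, n_endmembers):
    """Serial sketch of IEA endmember extraction.

    X: (pixels x bands). Each round unmixes every pixel by unconstrained least
    squares against the current endmember set and picks the pixel with the
    largest reconstruction error as the next endmember.
    """
    X = np.asarray(X, dtype=float)
    basis = X.mean(axis=0, keepdims=True)          # start from the mean spectrum
    chosen = []
    for _ in range(n_endmembers):
        coeffs, *_ = np.linalg.lstsq(basis.T, X.T, rcond=None)   # (k x pixels)
        errors = np.linalg.norm(X.T - basis.T @ coeffs, axis=0)  # per-pixel error
        chosen.append(int(np.argmax(errors)))
        basis = X[chosen]                          # the mean is dropped once pixels are chosen
    return chosen


rng = np.random.default_rng(2)
E_true = rng.uniform(0.1, 1.0, size=(3, 20))       # 3 hypothetical spectra, 20 bands
abund = rng.dirichlet(np.ones(3), size=200)        # 200 mixed pixels
X = abund @ E_true
idx = iterative_error_analysis(X, 3)
print(idx)
```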

  1. Iterative Decoding of Concatenated Codes: A Tutorial

    Directory of Open Access Journals (Sweden)

    Phillip A. Regalia

    2005-05-01

    The turbo decoding algorithm of a decade ago constituted a milestone in error-correction coding for digital communications, and has inspired extensions to generalized receiver topologies, including turbo equalization, turbo synchronization, and turbo CDMA, among others. Despite an accrued understanding of iterative decoding over the years, the “turbo principle” remains elusive to master analytically, thereby inciting interest from researchers outside the communications domain. In this spirit, we develop a tutorial presentation of iterative decoding for parallel and serial concatenated codes, in terms hopefully accessible to a broader audience. We motivate iterative decoding as a computationally tractable attempt to approach maximum-likelihood decoding, and characterize fixed points in terms of a “consensus” property between constituent decoders. We review how the decoding algorithm for both parallel and serial concatenated codes coincides with an alternating projection algorithm, which allows one to identify conditions under which the algorithm indeed converges to a maximum-likelihood solution, in terms of particular likelihood functions factoring into the product of their marginals. The presentation emphasizes a common framework applicable to both parallel and serial concatenated codes.

  2. Parallel Implementation of the Recursive Approximation of an Unsupervised Hierarchical Segmentation Algorithm. Chapter 5

    Science.gov (United States)

    Tilton, James C.; Plaza, Antonio J. (Editor); Chang, Chein-I. (Editor)

    2008-01-01

    The hierarchical image segmentation algorithm (referred to as HSEG) is a hybrid of hierarchical step-wise optimization (HSWO) and constrained spectral clustering that produces a hierarchical set of image segmentations. HSWO is an iterative approach to region growing segmentation in which the optimal image segmentation is found at N_R regions, given a segmentation at N_R + 1 regions. HSEG's addition of constrained spectral clustering makes it a computationally intensive algorithm for all but the smallest of images. To counteract this, a computationally efficient recursive approximation of HSEG (called RHSEG) has been devised. Further improvements in processing speed are obtained through a parallel implementation of RHSEG. This chapter describes this parallel implementation and demonstrates its computational efficiency on a Landsat Thematic Mapper test scene.

  3. Locality-Driven Parallel Static Analysis for Power Delivery Networks

    KAUST Repository

    Zeng, Zhiyu

    2011-06-01

    Large VLSI on-chip Power Delivery Networks (PDNs) are challenging to analyze due to the sheer network complexity. In this article, a novel parallel partitioning-based PDN analysis approach is presented. We use the boundary circuit responses of each partition to divide the full grid simulation problem into a set of independent subgrid simulation problems. Instead of solving exact boundary circuit responses, a more efficient scheme is proposed to provide near-exact approximation to the boundary circuit responses by exploiting the spatial locality of the flip-chip-type power grids. This scheme is also used in a block-based iterative error reduction process to achieve fast convergence. Detailed computational cost analysis and performance modeling is carried out to determine the optimal (or near-optimal) number of partitions for parallel implementation. Through the analysis of several large power grids, the proposed approach is shown to have excellent parallel efficiency, fast convergence, and favorable scalability. Our approach can solve a 16-million-node power grid in 18 seconds on an IBM p5-575 processing node with 16 Power5+ processors, which is 18.8X faster than a state-of-the-art direct solver. © 2011 ACM.

  4. Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows

    Science.gov (United States)

    Moitra, Stuti; Gatski, Thomas B.

    1997-01-01

    A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
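For context, the standard serial way to solve one tridiagonal system is the Thomas algorithm, sketched below. Its forward elimination is a recurrence in which row i needs the result for row i − 1, which is why distributing a single tridiagonal solve across processors is awkward. This sketch shows the kind of solver whose parallelization the authors' algorithmic changes make unnecessary, not the changes themselves. The test system is hypothetical.

```python
import numpy as np


def thomas(a, b, c, d):
    """Solve a tridiagonal system in O(n) operations.

    a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal (c[-1] unused),
    d: right-hand side.
    """
    n = len(d)
    cp = np.zeros(n)
    dp = np.zeros(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward sweep: each row needs row i-1
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


n = 6
a = np.r_[0.0, -np.ones(n - 1)]
b = 4.0 * np.ones(n)
c = np.r_[-np.ones(n - 1), 0.0]
d = np.arange(1.0, n + 1)
x = thomas(a, b, c, d)
dense = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
print(np.allclose(dense @ x, d))   # True
```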

  5. Power Efficient Design of Parallel/Serial FIR Filters in RNS

    DEFF Research Database (Denmark)

    Petricca, Massimo; Albicocco, Pietro; Cardarilli, Gian Carlo

    2012-01-01

    It is well known that the Residue Number System (RNS) provides an efficient implementation of parallel FIR filters especially when the filter order and the dynamic range are high. The two main drawbacks of RNS, need of converters and coding overhead, make a serialized implementation of the FIR...
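Here is a minimal sketch of why RNS suits FIR filtering, assuming a hypothetical moduli set (7, 8, 9) with dynamic range M = 504. Each residue channel performs its own multiply-accumulate modulo a small number, with no carries between channels, and the Chinese Remainder Theorem recovers the result. It handles only non-negative outputs below M; signed data would need an offset or a signed-range convention. It says nothing about the power trade-offs studied in the paper.

```python
from math import prod

MODULI = (7, 8, 9)          # hypothetical pairwise-coprime base
M = prod(MODULI)            # dynamic range: results must lie in [0, M)


def to_rns(x):
    return tuple(x % m for m in MODULI)


def from_rns(residues):
    """Chinese Remainder Theorem reconstruction into [0, M)."""
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m) is the modular inverse (Python 3.8+)
    return total % M


def fir_rns(coeffs, samples):
    """Direct-form FIR y[n] = sum_k h[k] x[n-k], computed independently per modulus."""
    h = [to_rns(c) for c in coeffs]
    x = [to_rns(s) for s in samples]
    out = []
    for n in range(len(samples)):
        acc = [0] * len(MODULI)
        for k in range(len(coeffs)):
            if n - k < 0:
                break
            for j, m in enumerate(MODULI):
                # Each residue channel is a small, carry-free modular MAC.
                acc[j] = (acc[j] + h[k][j] * x[n - k][j]) % m
        out.append(from_rns(acc))
    return out


print(fir_rns([1, 2, 3], [4, 5, 6, 7]))   # [4, 13, 28, 34]
```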

  6. Energy-Efficient FPGA-Based Parallel Quasi-Stochastic Computing

    Directory of Open Access Journals (Sweden)

    Ramu Seva

    2017-11-01

    The high performance of FPGA (Field Programmable Gate Array) in image processing applications is justified by its flexible reconfigurability, its inherent parallel nature and the availability of a large amount of internal memory. Lately, the Stochastic Computing (SC) paradigm has been found to be significantly advantageous in certain application domains, including image processing, because of its lower hardware complexity and power consumption. However, its viability is deemed to be limited due to its serial bitstream processing and excessive run-time requirement for convergence. To address these issues, a novel approach is proposed in this work where an energy-efficient implementation of SC is accomplished by introducing fast-converging Quasi-Stochastic Number Generators (QSNGs) and parallel stochastic bitstream processing, which are well suited to leverage FPGA’s reconfigurability and abundant internal memory resources. The proposed approach has been tested on the Virtex-4 FPGA, and results have been compared with the serial and parallel implementations of conventional stochastic computation using the well-known SC edge detection and multiplication circuits. Results show that by using this approach, execution time as well as power consumption are decreased by a factor of 3.5 and 4.5 for the edge detection circuit and multiplication circuit, respectively.
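Here is a minimal sketch of unipolar stochastic multiplication with quasi-random bitstreams. It assumes van der Corput sequences in bases 2 and 3 as the low-discrepancy sources; the paper's QSNG design may differ. A value p in [0, 1] is encoded as a stream whose bits are 1 when the quasi-random number is below p. ANDing two uncorrelated streams gives a stream whose density approximates the product.

```python
from typing import List


def van_der_corput(i: int, base: int) -> float:
    """i-th term of the van der Corput low-discrepancy sequence in the given base."""
    q, denom = 0.0, 1.0
    while i:
        denom *= base
        i, digit = divmod(i, base)
        q += digit / denom
    return q


def to_bitstream(p: float, length: int, base: int) -> List[int]:
    """Unipolar encoding: bit i is 1 when the i-th quasi-random number is below p."""
    return [1 if van_der_corput(i, base) < p else 0 for i in range(length)]


def sc_multiply(a: float, b: float, length: int = 1024) -> float:
    """Estimate a*b as the density of the bitwise AND of two encoded streams."""
    # Different bases (2 and 3) keep the two streams nearly uncorrelated.
    sa = to_bitstream(a, length, 2)
    sb = to_bitstream(b, length, 3)
    return sum(x & y for x, y in zip(sa, sb)) / length


print(sc_multiply(0.5, 0.5), sc_multiply(0.3, 0.8))   # close to 0.25 and 0.24
```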

  7. Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation

    Science.gov (United States)

    Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.

    1996-01-01

    We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.

  8. Cell verification of parallel burnup calculation program MCBMPI based on MPI

    International Nuclear Information System (INIS)

    Yang Wankui; Liu Yaoguang; Ma Jimin; Wang Guanbo; Yang Xin; She Ding

    2014-01-01

    The parallel burnup calculation program MCBMPI was developed. The program is modular. The parallel MCNP5 program MCNP5MPI was employed as the neutron transport calculation module, and a composite of three solution methods was used to solve the burnup equation, i.e. the matrix exponential technique, the TTA analytical solution, and Gauss-Seidel iteration. An MPI parallel zone decomposition strategy was included in the program. The program system consists only of MCNP5MPI and a burnup subroutine. The latter achieves three main functions, i.e. zone decomposition, nuclide transferring and decaying, and data exchange with MCNP5MPI. The program was also verified with the pressurized water reactor (PWR) cell burnup benchmark. The results show that the program is capable of burnup calculations for multiple zones, and that the computation efficiency could be significantly improved with the development of computer hardware. (authors)

  9. ITER Safety and Licensing

    International Nuclear Information System (INIS)

    Girard, J.-P.; Taylor, N.; Garin, P.; Uzan-Elbez, J.; Gulden, W.; Rodriguez-Rodrigo, L.

    2006-01-01

    The site for the construction of ITER was chosen in June 2005. The facility will be implemented in Europe, in the south of France close to Marseille. The generic safety scheme is now under revision to adapt the design to the host country's regulations. Even though ITER will be an international organization, it will have to comply with French requirements in the fields of public and occupational health and safety, nuclear safety, radiation protection, licensing, nuclear substances and environmental protection. The organization of the central team, together with its partners organized in domestic agencies for the in-kind procurement of components, is a key issue for the success of the experiment. ITER is the first facility that will achieve sustained nuclear fusion. It is important, both for the one-of-a-kind experimental device ITER itself and for the future of fusion power plants, to understand well the key safety issues of this potential new source of energy production. The main safety concern is confinement of the tritium, activated dust in the vacuum vessel and activated corrosion products in the coolant of the plasma-facing components. This is achieved in the design through multiple confinement barriers that implement the defence-in-depth approach. It will be demonstrated in documents submitted to the French regulator that these barriers maintain their function in all postulated incident and accident conditions. The licensing process started with an examination of the safety options. This step was performed by Europe during the candidature phase in 2002. In parallel to the final design, and taking into account the local regulations, the Preliminary Safety Report (RPrS) will be drafted with the support of the European partner and others in the framework of ITER Task Agreements. Together with the license application, the RPrS will be forwarded to the regulatory bodies, which will launch public hearings and a safety review. Both processes must succeed in order to

  10. Parallel processing of two-dimensional Sn transport calculations

    International Nuclear Information System (INIS)

    Uematsu, M.

    1997-01-01

    A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation

  11. An Efficient Parallel Multi-Scale Segmentation Method for Remote Sensing Imagery

    Directory of Open Access Journals (Sweden)

    Haiyan Gu

    2018-04-01

    Remote sensing (RS) image segmentation is an essential step in geographic object-based image analysis (GEOBIA) to ultimately derive “meaningful objects”. While many segmentation methods exist, most of them are not efficient for large data sets. The goal of this research is therefore to develop an efficient parallel multi-scale segmentation method for RS imagery by combining graph theory and the fractal net evolution approach (FNEA). Specifically, a minimum spanning tree (MST) algorithm from graph theory is combined with the minimum heterogeneity rule (MHR) algorithm used in FNEA: the MST algorithm performs the initial segmentation, and the MHR algorithm performs object merging. An efficient implementation of the segmentation strategy is presented using data partitioning and a “reverse searching-forward processing” chain based on message passing interface (MPI) parallel technology. Segmentation results were obtained with images from multiple sensors (airborne, SPECIM AISA EAGLE II, WorldView-2, RADARSAT-2) over different selected landscapes (residential/industrial, residential/agriculture) covering four test sites. These results demonstrated the method's efficiency in both accuracy and speed. We conclude that the proposed method is applicable and efficient for the segmentation of a variety of RS imagery (airborne optical, satellite optical, SAR, high-spectral), with accuracy comparable to that of the FNEA method.
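    The MST-based initial segmentation step can be illustrated with Kruskal's algorithm and a union-find forest. Below is a minimal serial Python sketch, not the authors' implementation; it leaves out the MHR merging stage, data partitioning and MPI entirely. It assumes a grayscale image on a 4-neighbour grid, edge weights equal to absolute intensity differences, and a hypothetical fixed `threshold` at which MST edges are cut.

```python
from typing import Dict, List, Tuple


def _find(parent: List[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def mst_segment(image: List[List[float]], threshold: float) -> List[List[int]]:
    """Label pixels by building Kruskal's MST of the 4-neighbour grid graph and
    cutting every MST edge heavier than `threshold` (hypothetical criterion)."""
    rows, cols = len(image), len(image[0])

    def idx(r: int, c: int) -> int:
        return r * cols + c

    # Edge weight = absolute intensity difference between neighbouring pixels.
    edges: List[Tuple[float, int, int]] = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append((abs(image[r][c] - image[r][c + 1]), idx(r, c), idx(r, c + 1)))
            if r + 1 < rows:
                edges.append((abs(image[r][c] - image[r + 1][c]), idx(r, c), idx(r + 1, c)))
    edges.sort()  # Kruskal: consider edges in order of increasing weight

    parent = list(range(rows * cols))
    for weight, a, b in edges:
        if weight > threshold:
            break  # every remaining MST edge would be cut, so stop merging
        ra, rb = _find(parent, a), _find(parent, b)
        if ra != rb:
            parent[ra] = rb  # this edge joins the MST and merges two regions

    # Map union-find roots to consecutive segment labels in row-major order.
    labels: Dict[int, int] = {}
    return [[labels.setdefault(_find(parent, idx(r, c)), len(labels)) for c in range(cols)]
            for r in range(rows)]


# Usage on a tiny synthetic image with three visually distinct regions.
segments = mst_segment([[10, 11, 50], [10, 12, 52], [90, 91, 51]], threshold=5)
```

Cutting the MST at a threshold gives the same regions as joining all neighbour pairs whose difference is at or below the threshold. That is why the loop can stop at the first heavier edge.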

  12. Parallel particle swarm optimization algorithm in nuclear problems

    International Nuclear Information System (INIS)

    Waintraub, Marcel; Pereira, Claudio M.N.A.; Schirru, Roberto

    2009-01-01

    Particle Swarm Optimization (PSO) is a population-based metaheuristic (PBM) in which solution candidates evolve through simulation of a simplified social adaptation model. Combining robustness, efficiency and simplicity, PSO has gained great popularity. Many successful applications of PSO have been reported in which it showed advantages over other well-established PBMs. However, computational cost is still a major constraint for PSO, as for all other PBMs, especially in optimization problems with time-consuming objective functions. Parallel computation has been used to overcome this difficulty. The primary advantage of parallel PSO (PPSO) is the reduction of computational time, and master-slave approaches, which exploit this characteristic, are the most widely investigated. However, much more should be expected: PSO is known to improve with more elaborate neighborhood topologies. Hence, in this work, we develop several PPSO algorithms that exploit enhanced neighborhood topologies implemented by communication strategies in multiprocessor architectures. The proposed PPSOs have been applied to two complex and time-consuming nuclear engineering problems: reactor core design and fuel reload optimization. Exhaustive experiments led to two conclusions. First, PPSO still improves solutions after many thousands of iterations, which makes the efficient use of serial (non-parallel) PSO prohibitive for this kind of real-world problem. Second, PPSO with more elaborate communication strategies is more efficient and robust than the master-slave model. The advantages and peculiarities of each model are carefully discussed in this work. (author)
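    The neighborhood-topology idea can be made concrete with a ring ("lbest") topology. In a ring, each particle is attracted to the best position found by itself and its two ring neighbours, not by the whole swarm. Below is a minimal serial Python sketch under stated assumptions: the inertia weight and acceleration coefficients are common textbook values, not the paper's, and the sphere function stands in for the nuclear objectives. In a parallel version, particles or sub-swarms would live on different processors, and the ring edges would become messages.

```python
import random
from typing import Callable, List, Tuple


def ring_pso(
    f: Callable[[List[float]], float],
    dim: int,
    n_particles: int = 20,
    iters: int = 200,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Serial PSO with a ring (lbest) neighborhood topology, minimizing f."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    for _ in range(iters):
        for i in range(n_particles):
            # Ring neighborhood: particle i only sees i-1, i and i+1 (wrapping around).
            ring = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            nb = min(ring, key=lambda j: pbest_val[j])
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (pbest[nb][d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
    best = min(range(n_particles), key=lambda j: pbest_val[j])
    return pbest[best], pbest_val[best]


# Usage on a toy objective (sphere function, minimum 0 at the origin).
x_best, f_best = ring_pso(lambda x: sum(v * v for v in x), dim=2)
```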

  13. Adaptive control in multi-threaded iterated integration

    International Nuclear Information System (INIS)

    Doncker, Elise de; Yuasa, Fukuko

    2013-01-01

    In recent years we have developed a technique for the direct computation of Feynman loop-integrals, which are notorious for the occurrence of integrand singularities. Especially for handling singularities in the interior of the domain, we approximate the iterated integral using an adaptive algorithm in the coordinate directions. We present a novel multi-core parallelization scheme for adaptive multivariate integration, by assigning threads to the rule evaluations in the outer dimensions of the iterated integral. The method ensures a large parallel granularity as each function evaluation by itself comprises an integral over the lower dimensions, while the application of the threads is governed by the adaptive control in the outer level. We give computational results for a test set of 3- to 6-dimensional integrals, where several problems exhibit a loop integral behavior.
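    The granularity argument can be sketched for the two-dimensional case. Each outer-dimension rule point requires a full inner integral, so every outer evaluation is a large, independent unit of work that can go to a thread. This minimal Python sketch rests on simplifying assumptions. The outer rule is a fixed composite Simpson rule, whereas in the paper the adaptive control itself lives at the outer level. Only the inner dimension uses adaptive Simpson. Python threads also illustrate the structure rather than deliver real speedup, because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple


def adaptive_simpson(g: Callable[[float], float], a: float, b: float,
                     tol: float = 1e-10, depth: int = 50) -> float:
    """Recursive adaptive Simpson rule for a 1-D integrand g on [a, b]."""
    def simpson(fa: float, fm: float, fb: float, lo: float, hi: float) -> float:
        return (hi - lo) / 6.0 * (fa + 4.0 * fm + fb)

    def recurse(lo, hi, fa, fm, fb, whole, tol, depth):
        mid = 0.5 * (lo + hi)
        flm, frm = g(0.5 * (lo + mid)), g(0.5 * (mid + hi))
        left = simpson(fa, flm, fm, lo, mid)
        right = simpson(fm, frm, fb, mid, hi)
        if depth <= 0 or abs(left + right - whole) <= 15.0 * tol:
            return left + right + (left + right - whole) / 15.0  # Richardson correction
        return (recurse(lo, mid, fa, flm, fm, left, tol / 2, depth - 1)
                + recurse(mid, hi, fm, frm, fb, right, tol / 2, depth - 1))

    fa, fm, fb = g(a), g(0.5 * (a + b)), g(b)
    return recurse(a, b, fa, fm, fb, simpson(fa, fm, fb, a, b), tol, depth)


def iterated_integral_2d(f: Callable[[float, float], float],
                         x_range: Tuple[float, float], y_range: Tuple[float, float],
                         n_outer: int = 16, workers: int = 4) -> float:
    """Compute the iterated integral of f(x, y) over y (inner) and x (outer);
    the outer rule points are evaluated on a thread pool."""
    (xa, xb), (ya, yb) = x_range, y_range
    if n_outer % 2:
        n_outer += 1  # composite Simpson needs an even number of subintervals
    h = (xb - xa) / n_outer
    xs = [xa + k * h for k in range(n_outer + 1)]

    def inner(x: float) -> float:
        # One outer "function evaluation" = a whole adaptive inner integral.
        return adaptive_simpson(lambda y: f(x, y), ya, yb)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(inner, xs))
    weights = [1 if k in (0, n_outer) else (4 if k % 2 else 2) for k in range(n_outer + 1)]
    return h / 3.0 * sum(wk * v for wk, v in zip(weights, values))


# Usage: the integral of x*y over the unit square is exactly 1/4.
result = iterated_integral_2d(lambda x, y: x * y, (0.0, 1.0), (0.0, 1.0))
```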

  14. Efficient parallel implementation of active appearance model fitting algorithm on GPU.

    Science.gov (United States)

    Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detection and tracking methods and has been widely used in various situations. However, its high-dimensional texture representation makes the computations very time-consuming, which makes the AAM difficult to apply in real-time systems. Modern graphics processing units (GPUs), which feature a many-core, fine-grained parallel architecture, provide new and promising ways to overcome this computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design is based on fine-grained parallelism: we distribute the texture data of the AAM, pixel by pixel, to thousands of parallel GPU threads, which makes the algorithm fit the GPU architecture better. We implement our algorithm using the Compute Unified Device Architecture (CUDA) on Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm for different data sizes, we built sixteen face AAM models with textures of different dimensions. The experimental results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even with very high-dimensional textures.

  15. DGDFT: A massively parallel method for large scale density functional theory calculations.

    Science.gov (United States)

    Hu, Wei; Lin, Lin; Yang, Chao

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method for efficient large-scale Kohn-Sham DFT-based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500–14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  16. DGDFT: A massively parallel method for large scale density functional theory calculations

    International Nuclear Information System (INIS)

    Hu, Wei; Yang, Chao; Lin, Lin

    2015-01-01

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method for efficient large-scale Kohn-Sham DFT-based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500–14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  17. DGDFT: A massively parallel method for large scale density functional theory calculations

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Wei, E-mail: whu@lbl.gov; Yang, Chao, E-mail: cyang@lbl.gov [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Lin, Lin, E-mail: linlin@math.berkeley.edu [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Department of Mathematics, University of California, Berkeley, California 94720 (United States)

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method for efficient large-scale Kohn-Sham DFT-based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500–14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  18. ITER shielding blanket

    Energy Technology Data Exchange (ETDEWEB)

    Strebkov, Yu [ENTEK, Moscow (Russian Federation); Avsjannikov, A [ENTEK, Moscow (Russian Federation); Baryshev, M [NIAT, Moscow (Russian Federation); Blinov, Yu [ENTEK, Moscow (Russian Federation); Shatalov, G [KIAE, Moscow (Russian Federation); Vasiliev, N [KIAE, Moscow (Russian Federation); Vinnikov, A [ENTEK, Moscow (Russian Federation); Chernjagin, A [DYNAMICA, Moscow (Russian Federation)

    1995-03-01

    A reference non-breeding blanket is now under development for the ITER Basic Performance Phase, with the aim of high reliability during the first stage of ITER operation. More severe operating modes are expected in this stage, with first wall (FW) local heat loads of up to 100–300 W cm⁻². Integrating the blanket design with protective and start limiters requires new solutions to achieve high reliability, and the possible use of beryllium as a protective material calls for new technologies. The rigid shielding blanket concept was developed in Russia to satisfy these requirements. The concept is based on a copper-alloy FW, an austenitic stainless steel blanket structure and water cooling, with beryllium protection integrated into the FW design. The fabrication technology and assembly procedure are described together with the equipment used. (orig.).

  19. Efficient Parallel Strategy Improvement for Parity Games

    OpenAIRE

    Fearnley, John

    2017-01-01

    We study strategy improvement algorithms for solving parity games. While these algorithms are known to solve parity games using a very small number of iterations, experimental studies have found that a high step complexity causes them to perform poorly in practice. In this paper we seek to address this situation. Every iteration of the algorithm must compute a best response, and while the standard way of doing this uses the Bellman-Ford algorithm, we give experimental results that show that o...
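    The abstract notes that each strategy-improvement iteration computes a best response, conventionally with the Bellman-Ford algorithm. Below is a minimal, generic Bellman-Ford sketch in Python for reference. How a parity-game strategy and its priorities are turned into the graph and edge weights used here is not shown and should be treated as an assumption. The early-exit check is a common optimization, not something the abstract describes.

```python
from typing import List, Optional, Tuple

INF = float("inf")


def bellman_ford(n: int, edges: List[Tuple[int, int, float]],
                 source: int) -> Optional[List[float]]:
    """Single-source shortest paths on n vertices; None if a reachable negative cycle exists."""
    dist = [INF] * n
    dist[source] = 0.0
    for _ in range(n - 1):               # at most n-1 rounds of relaxation
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:    # relax edge u -> v
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break                        # distances are already stable
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None                  # still improvable: negative cycle
    return dist


# Usage: a small graph with one negative (but non-cyclic) edge.
dist = bellman_ford(4, [(0, 1, 4.0), (0, 2, 1.0), (2, 1, -2.0), (1, 3, 1.0)], source=0)
```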

  20. Unified Lambert Tool for Massively Parallel Applications in Space Situational Awareness

    Science.gov (United States)

    Woollands, Robyn M.; Read, Julie; Hernandez, Kevin; Probe, Austin; Junkins, John L.

    2018-03-01

    This paper introduces a parallel-compiled tool that combines several of our recently developed methods for solving the perturbed Lambert problem using modified Chebyshev-Picard iteration. This tool (the unified Lambert tool) consists of four individual algorithms, each of which is unique and better suited to a particular type of orbit transfer. The first is a Keplerian Lambert solver, which is used to provide a good initial guess (warm start) for solving the perturbed problem. It is also used to determine which algorithm to call for the perturbed problem. The arc length or true anomaly angle spanned by the transfer trajectory is the parameter that governs the automated selection of the appropriate perturbed algorithm, based on the respective algorithms' convergence characteristics. The second algorithm solves the perturbed Lambert problem using the modified Chebyshev-Picard iteration two-point boundary value solver. This algorithm does not require a Newton-like shooting method and is the most efficient of the perturbed solvers presented herein. However, its domain of convergence is limited to about a third of an orbit and depends on eccentricity. The third algorithm extends the domain of convergence of the modified Chebyshev-Picard iteration two-point boundary value solver to about 90% of an orbit, through regularization with the Kustaanheimo-Stiefel transformation. This is the second most efficient of the perturbed set of algorithms. The fourth algorithm uses the method of particular solutions and the modified Chebyshev-Picard iteration initial value solver to solve multiple-revolution perturbed transfers. This method does require "shooting" but differs from Newton-like shooting methods in that it does not require propagation of a state transition matrix. The unified Lambert tool makes use of the General Mission Analysis Tool, and we use it to compute thousands of perturbed Lambert trajectories in parallel on the Space Situational
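    The fixed-point idea behind Picard iteration can be shown on a scalar ODE. The update is y_{k+1}(t) = y0 + ∫₀ᵗ f(s, y_k(s)) ds, applied to a whole trajectory at once. This minimal Python sketch is plain Picard iteration on a uniform grid with the cumulative trapezoidal rule. It is not the modified Chebyshev-Picard method of the paper, which uses Chebyshev polynomial approximations and orthogonal projection instead. The constant initial trajectory is where a warm start, such as a Keplerian solution, would be plugged in.

```python
import math
from typing import Callable, List, Tuple


def picard_iterate(f: Callable[[float, float], float], y0: float, t_end: float,
                   n: int = 200, iters: int = 30) -> Tuple[List[float], List[float]]:
    """Plain Picard iteration for y' = f(t, y), y(0) = y0, on a uniform grid.

    The integral is approximated with the cumulative trapezoidal rule; the initial
    guess is the constant trajectory y(t) = y0.
    """
    h = t_end / n
    ts = [i * h for i in range(n + 1)]
    y = [y0] * (n + 1)
    for _ in range(iters):
        g = [f(t, yi) for t, yi in zip(ts, y)]   # right-hand side along the old trajectory
        new = [y0]
        for i in range(1, n + 1):
            new.append(new[-1] + 0.5 * h * (g[i - 1] + g[i]))
        y = new                                   # every node is updated from the old path
    return ts, y


# Usage: y' = y, y(0) = 1 on [0, 1]; the exact value is y(1) = e.
ts, ys = picard_iterate(lambda t, y: y, 1.0, 1.0)
error = abs(ys[-1] - math.e)
```

Because each sweep evaluates the right-hand side at all nodes from the previous trajectory, those evaluations are independent of each other. This is what makes Picard-type solvers attractive for parallel hardware.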

  1. Improving the iterative Linear Interaction Energy approach using automated recognition of configurational transitions.

    Science.gov (United States)

    Vosmeer, C Ruben; Kooi, Derk P; Capoferri, Luigi; Terpstra, Margreet M; Vermeulen, Nico P E; Geerke, Daan P

    2016-01-01

    Recently an iterative method was proposed to enhance the accuracy and efficiency of ligand-protein binding affinity prediction through linear interaction energy (LIE) theory. For ligand binding to flexible Cytochrome P450s (CYPs), this method was shown to decrease the root-mean-square error and standard deviation of error prediction by combining interaction energies of simulations starting from different conformations. Thereby, different parts of protein-ligand conformational space are sampled in parallel simulations. The iterative LIE framework relies on the assumption that separate simulations explore different local parts of phase space, and do not show transitions to other parts of configurational space that are already covered in parallel simulations. In this work, a method is proposed to (automatically) detect such transitions during the simulations that are performed to construct LIE models and to predict binding affinities. Using noise-canceling techniques and splines to fit time series of the raw data for the interaction energies, transitions during simulation between different parts of phase space are identified. Boolean selection criteria are then applied to determine which parts of the interaction energy trajectories are to be used as input for the LIE calculations. Here we show that this filtering approach benefits the predictive quality of our previous CYP 2D6-aryloxypropanolamine LIE model. In addition, an analysis is performed of the gain in computational efficiency that can be obtained from monitoring simulations using the proposed filtering method and by prematurely terminating simulations accordingly.

  2. Totally parallel multilevel algorithms

    Science.gov (United States)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics that become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are the Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, the Robust Multigrid (RMG) of Hackbusch, the FFT-based spectral algorithm, and parallel cyclic reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is then recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
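    Parallel cyclic reduction, the last of the four algorithms, is compact enough to sketch. Below is a minimal Python version for a tridiagonal system, an assumed special case of the sparse systems the abstract mentions. In each sweep, every equation eliminates its neighbours at stride s independently of all the others, so the n updates within a sweep could run concurrently. After ⌈log₂ n⌉ sweeps every equation is decoupled, which is the "direct algorithm" behaviour noted above. The sketch assumes a system that needs no pivoting, such as a diagonally dominant one.

```python
from typing import List


def parallel_cyclic_reduction(a: List[float], b: List[float], c: List[float],
                              d: List[float]) -> List[float]:
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] (a[0] = c[-1] = 0) by PCR."""
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    s = 1
    while s < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):  # each i is independent: this loop is the parallel step
            lo_ok, hi_ok = i - s >= 0, i + s < n
            alpha = -a[i] / b[i - s] if lo_ok else 0.0   # eliminates x[i-s]
            gamma = -c[i] / b[i + s] if hi_ok else 0.0   # eliminates x[i+s]
            na[i] = alpha * a[i - s] if lo_ok else 0.0
            nc[i] = gamma * c[i + s] if hi_ok else 0.0
            nb[i] = (b[i] + (alpha * c[i - s] if lo_ok else 0.0)
                     + (gamma * a[i + s] if hi_ok else 0.0))
            nd[i] = (d[i] + (alpha * d[i - s] if lo_ok else 0.0)
                     + (gamma * d[i + s] if hi_ok else 0.0))
        a, b, c, d = na, nb, nc, nd
        s *= 2  # couplings now span twice the distance
    return [d[i] / b[i] for i in range(n)]  # all equations are decoupled


# Usage: a diagonally dominant system built so that the solution is [1, 2, 3, 4, 5].
x = parallel_cyclic_reduction([0, 1, 1, 1, 1], [4, 4, 4, 4, 4],
                              [1, 1, 1, 1, 0], [6, 12, 18, 24, 24])
```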

  3. Adaptive Iterative Soft-Input Soft-Output Parallel Decision-Feedback Detectors for Asynchronous Coded DS-CDMA Systems

    Directory of Open Access Journals (Sweden)

    Zhang Wei

    2005-01-01

    The optimum and many suboptimum iterative soft-input soft-output (SISO) multiuser detectors require a priori information about the multiuser system, such as the users' transmitted signature waveforms, relative delays, and the channel impulse response. In this paper, we employ adaptive algorithms in the SISO multiuser detector to avoid the need for this a priori information. First, we derive the optimum SISO parallel decision-feedback detector for asynchronous coded DS-CDMA systems. Then, we propose two adaptive versions of this SISO detector, based on the normalized least mean square (NLMS) and recursive least squares (RLS) algorithms. Our SISO adaptive detectors effectively exploit the a priori information of coded symbols, whose soft inputs are obtained from a bank of single-user decoders. Furthermore, we consider how to select practical finite feedforward and feedback filter lengths to obtain a good tradeoff between the performance and computational complexity of the receiver.
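    The NLMS update that underlies one of the two adaptive detectors can be sketched on its own. This minimal Python example uses NLMS to identify a hypothetical 3-tap FIR channel from noise-free data. It covers only the generic NLMS weight update; the decision-feedback structure, the soft inputs from the decoders and the RLS variant are not modelled. The step size `mu` and regularizer `eps` are illustrative choices.

```python
import random
from typing import List


def nlms_identify(x: List[float], d: List[float], taps: int,
                  mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Adapt FIR weights w with NLMS so that w · [x[n], x[n-1], ...] tracks d[n]."""
    w = [0.0] * taps
    for n in range(taps - 1, len(x)):
        u = [x[n - k] for k in range(taps)]        # regressor, newest sample first
        y = sum(wk * uk for wk, uk in zip(w, u))   # current filter output
        e = d[n] - y                               # a priori estimation error
        norm = eps + sum(uk * uk for uk in u)      # normalize by input power
        w = [wk + mu * e * uk / norm for wk, uk in zip(w, u)]
    return w


# Usage: identify a hypothetical 3-tap channel from noise-free data.
rng = random.Random(1)
h_true = [0.5, -0.3, 0.1]
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h_true[k] * x[n - k] for k in range(3) if n - k >= 0) for n in range(len(x))]
w_est = nlms_identify(x, d, taps=3)
```

Normalizing by the input power makes the effective step size insensitive to the signal level. This is the usual reason to prefer NLMS over plain LMS.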

  4. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    Science.gov (United States)

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    Solving fractional differential equations is very time-consuming. The computational complexity of the two-dimensional time fractional diffusion equation (2D-TFDE) with an iterative implicit finite difference method is O(M_x·M_y·N²). In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of it. A task distribution model and a data layout with virtual boundaries are designed for this parallel algorithm. The experimental results show that the parallel algorithm agrees well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16–4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed-memory cluster system. We believe that parallel computing will become a basic method for computationally intensive fractional applications in the near future.
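    A standard discretization of the Caputo time derivative shows where the N² factor comes from. The abstract does not name its scheme, so the following assumes the common L1 approximation for order 0 < α < 1 with time step τ. The derivative at t_n then depends on the whole solution history, and each of the M_x·M_y grid points sums n history terms at step n:

```latex
\begin{aligned}
\left.\frac{\partial^{\alpha} u}{\partial t^{\alpha}}\right|_{t_n}
  &\approx \frac{\tau^{-\alpha}}{\Gamma(2-\alpha)} \sum_{j=0}^{n-1} b_j \left(u^{n-j} - u^{n-j-1}\right),
  \qquad b_j = (j+1)^{1-\alpha} - j^{1-\alpha}, \\
\text{history work} &\propto M_x M_y \sum_{n=1}^{N} n
  = M_x M_y \, \frac{N(N+1)}{2} = O\!\left(M_x M_y N^2\right).
\end{aligned}
```

The history sum at each grid point uses only that point's own past values. This is one reason a spatial distribution of the data, such as the task distribution model described above, fits the problem well.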

  5. Parallelization of the preconditioned IDR solver for modern multicore computer systems

    Science.gov (United States)

    Bessonov, O. A.; Fedoseyev, A. I.

    2012-10-01

    This paper presents the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK on modern multicore microprocessors. CNSPACK is an advanced solver successfully used for the coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, and kinetic and quantum problems. It employs the iterative IDR algorithm with ILU preconditioning (with a user-chosen preconditioning order). CNSPACK has been used successfully during the last decade to solve problems in several application areas, including fluid dynamics and semiconductor device simulation. However, processor architectures and computer system organization have changed dramatically in recent years. Consequently, performance criteria and methods have been revisited, and the solver and preconditioner have been parallelized using the OpenMP environment. Results of the successful, efficient parallel implementation are presented for the most advanced computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).

  6. Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization

    Energy Technology Data Exchange (ETDEWEB)

    Clark, M. A. [NVIDIA Corp., Santa Clara]; Joó, Bálint [Jefferson Lab]; Strelchenko, Alexei [Fermilab]; Cheng, Michael [Boston U., Ctr. Comp. Sci.]; Gambhir, Arjun [William-Mary Coll.]; Brower, Richard [Boston U.]

    2016-12-22

    The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due to both significant progress in accelerating the iterative linear solvers using multi-grid algorithms, and due to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the lack of parallelism on the coarse grids, and as such, these advances have not proved multiplicative. Using the QUDA library, we demonstrate that by exposing all sources of parallelism that the underlying stencil problem possesses, and through appropriate mapping of this parallelism to the GPU architecture, we can achieve high efficiency even for the coarsest of grids. Results are presented for the Wilson-Clover discretization, where we demonstrate up to 10x speedup over present state-of-the-art GPU-accelerated methods on Titan. Finally, we look to the future, and consider the software implications of our findings.

  7. An efficient parallel stochastic simulation method for analysis of nonviral gene delivery systems

    KAUST Repository

    Kuwahara, Hiroyuki

    2011-01-01

    Gene therapy has great potential to become an effective treatment for a wide variety of diseases. One of the main challenges in making gene therapy practical in clinical settings is the development of efficient and safe mechanisms to deliver foreign DNA molecules into the nucleus of target cells. Several computational and experimental studies have shown that the design process of synthetic gene transfer vectors can be greatly enhanced by computational modeling and simulation. This paper proposes a novel, effective parallelization of the stochastic simulation algorithm (SSA) for pharmacokinetic models that characterize the rate-limiting, multi-step processes of intracellular gene delivery. Efficient parallelization of the SSA is still an open problem in the general setting. Even so, the proposed parallel simulation method substantially accelerates the next-reaction selection scheme and the reaction update scheme in the SSA by exploiting and decomposing the structure of stochastic gene delivery models. This makes computationally intensive analyses substantially more practical than they would otherwise be with the standard SSA. Examples include parameter optimization and gene dosage control for specific cell types, gene vectors, and transgene expression stability. Here, we translated the nonviral gene delivery model based on mass-action kinetics by Varga et al. [Molecular Therapy, 4(5), 2001] into a more realistic model that captures intracellular fluctuations based on stochastic chemical kinetics. As a case study, we applied our parallel simulation to this stochastic model. Our results show that our simulation method increases the efficiency of statistical analysis by at least 50% in various settings. © 2011 ACM.
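    The two SSA stages that the paper accelerates, next-reaction selection and reaction update, are easiest to see in Gillespie's serial direct method. Below is a minimal Python sketch of that serial baseline, not the paper's parallel decomposition. The two-reaction model in the usage example is hypothetical and much simpler than the Varga et al. model: cytosolic plasmids either enter the nucleus or are degraded.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def gillespie_direct(
    x0: Sequence[int],
    stoich: List[Sequence[int]],
    propensities: Callable[[List[int]], List[float]],
    t_end: float,
    seed: int = 0,
) -> List[Tuple[float, Tuple[int, ...]]]:
    """Gillespie's direct-method SSA; returns the (time, state) trajectory."""
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    trajectory = [(t, tuple(x))]
    while True:
        a = propensities(x)
        a0 = sum(a)
        if a0 <= 0.0:
            break                                   # no reaction can fire any more
        tau = -math.log(1.0 - rng.random()) / a0    # exponential waiting time
        if t + tau > t_end:
            break
        t += tau
        # Next-reaction selection: choose reaction r with probability a[r] / a0.
        threshold, acc = rng.random() * a0, 0.0
        for r, ar in enumerate(a):
            acc += ar
            if acc > threshold:
                break
        else:
            r = max(i for i, ai in enumerate(a) if ai > 0.0)  # guard against round-off
        # Reaction update: apply the stoichiometric change of reaction r.
        x = [xi + si for xi, si in zip(x, stoich[r])]
        trajectory.append((t, tuple(x)))
    return trajectory


# Usage: hypothetical model with state [cytosolic, nuclear]:
# cytosolic -> nuclear (rate 0.1 per copy), cytosolic -> degraded (rate 0.05 per copy).
traj = gillespie_direct(
    x0=[100, 0],
    stoich=[(-1, +1), (-1, 0)],
    propensities=lambda x: [0.1 * x[0], 0.05 * x[0]],
    t_end=1000.0,
)
```

Both the linear scan in the selection step and the full propensity recomputation cost time proportional to the number of reactions. The paper exploits model structure to speed up exactly these two parts.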

  8. Parallel efficient rate control methods for JPEG 2000

    Science.gov (United States)

    Martínez-del-Amor, Miguel Á.; Bruns, Volker; Sparenberg, Heiko

    2017-09-01

    Since the introduction of JPEG 2000, several rate control methods have been proposed. Among them, post-compression rate-distortion optimization (PCRD-Opt) is the most widely used and the one recommended by the standard. This method first compresses the entire image, split into code blocks, and then optimally truncates the set of generated bit streams according to the maximum target bit rate constraint. The literature proposes various strategies for estimating ahead of time where a block will be truncated, so that execution can be stopped early to save time. However, none of them was designed with a parallel implementation in mind. Today, multi-core and many-core architectures are becoming popular for JPEG 2000 codec implementations. Therefore, in this paper, we analyze how some techniques for efficient rate control can be deployed on GPUs. To do so, the design of our GPU-based codec is extended to allow stopping the process at a given point. This extension also harnesses a higher level of parallelism on the GPU, leading to up to 40% speedup with 4K test material on a Titan X. In a second step, three selected rate control methods are adapted and implemented in our parallel encoder. A comparison is then carried out and used to select the best candidate to be deployed in a GPU encoder, which gave an extra 40% speedup in the situations where it was actually employed.

  9. Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU

    Directory of Open Access Journals (Sweden)

    Jinwei Wang

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detection and tracking methods and has been widely used in various situations. However, its high-dimensional texture representation makes the computations very time-consuming, which makes the AAM difficult to apply in real-time systems. Modern graphics processing units (GPUs), which feature a many-core, fine-grained parallel architecture, provide new and promising ways to overcome this computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design is based on fine-grained parallelism: we distribute the texture data of the AAM, pixel by pixel, to thousands of parallel GPU threads, which makes the algorithm fit the GPU architecture better. We implement our algorithm using the Compute Unified Device Architecture (CUDA) on Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm for different data sizes, we built sixteen face AAM models with textures of different dimensions. The experimental results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even with very high-dimensional textures.

  10. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  11. Efficient sequential and parallel algorithms for record linkage.

    Science.gov (United States)

    Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar

    2014-01-01

    Integrating data from multiple sources is a crucial and challenging problem. Even though numerous algorithms exist for record linkage or deduplication, they suffer from either long running times or restrictions on the number of datasets they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find its connected components. Our sequential and parallel algorithms have been tested on a real dataset of 1,083,878 records and on synthetic datasets ranging in size from 50,000 to 9,000,000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm.
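    Two of the ideas above can be illustrated together: eliminating identical records before any pairwise work, and clustering via the connected components of a "similar records" graph. This minimal Python sketch uses a plain comparison sort on a set of strings in place of the authors' attribute-wise radix sort. It also compares all pairs (quadratic) and uses a hypothetical edit-distance threshold `max_dist`; the hierarchical clustering and parallel parts are not shown.

```python
from typing import Dict, List


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance using a single rolling row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def link_records(records: List[str], max_dist: int = 1) -> List[List[str]]:
    """Cluster records: drop exact duplicates, then take connected components of
    the graph whose edges join records within `max_dist` edits (hypothetical rule)."""
    unique = sorted(set(records))        # identical records removed up front
    parent = list(range(len(unique)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            if edit_distance(unique[i], unique[j]) <= max_dist:
                parent[find(i)] = find(j)  # similarity edge merges two components

    groups: Dict[int, List[str]] = {}
    for i, rec in enumerate(unique):
        groups.setdefault(find(i), []).append(rec)
    return sorted(groups.values())


# Usage on a few toy name records.
clusters = link_records(["smith john", "smith jon", "smith john",
                         "doe jane", "doe jan", "roe jane"])
```

Connected components are transitive. In the usage example, "doe jan" and "roe jane" land in the same cluster through "doe jane" even though they are two edits apart.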

  12. A parallel algorithm for the non-symmetric eigenvalue problem

    International Nuclear Information System (INIS)

    Sidani, M.M.

    1991-01-01

    An algorithm is presented for the solution of the non-symmetric eigenvalue problem. The algorithm is based on a divide-and-conquer procedure that provides initial approximations to the eigenpairs, which are then refined using Newton iterations. Since the smaller subproblems can be solved independently, and since Newton iterations with different initial guesses can be started simultaneously, the algorithm - unlike the standard QR method - is ideal for parallel computers. The author also reports on his investigation of deflation methods designed to obtain further eigenpairs if needed. Numerical results from implementations on a host of parallel machines (distributed and shared-memory) are presented

  13. Comparison of Non-overlapping and Overlapping Local/Global Iteration Schemes for Whole-Core Deterministic Transport Calculation

    International Nuclear Information System (INIS)

    Yuk, Seung Su; Cho, Bumhee; Cho, Nam Zin

    2013-01-01

    For the deterministic transport model, a fixed-k problem formulation is necessary, and an overlapping local domain is chosen. However, as reported previously, the partial current-based Coarse Mesh Finite Difference (p-CMFD) procedure also enables non-overlapping local/global (NLG) iteration. In this paper, NLG iteration is combined with p-CMFD and with CMFD (augmented with a concept of p-CMFD), respectively, and compared to overlapping local/global (OLG) iteration on a 2-D test problem. Non-overlapping local/global iteration with p-CMFD and CMFD global calculation is introduced and tested on a 2-D deterministic transport problem. The modified C5G7 problem is analyzed with both NLG and OLG methods, and the solutions converge to the reference solution except for some cases of NLG with CMFD. NLG with CMFD gives the best performance when the solution converges, but if the fission-source iteration in the local calculation is insufficient, it is prone to diverge. The p-CMFD global solver gives unconditional convergence (for both OLG and NLG). A study of a switching scheme is in progress, in which NLG/p-CMFD is used as a 'starter' and then switched to NLG/CMFD to make the whole-core transport calculation more efficient and robust. Parallel computation is another obvious direction for future work.

  14. A highly efficient parallel algorithm for solving the neutron diffusion nodal equations on shared-memory computers

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1990-01-01

    Modern parallel computer architectures offer an enormous potential for reducing CPU and wall-clock execution times of large-scale computations commonly performed in various applications in science and engineering. Recently, several authors have reported their efforts in developing and implementing parallel algorithms for solving the neutron diffusion equation on a variety of shared- and distributed-memory parallel computers. Testing of these algorithms for a variety of two- and three-dimensional meshes showed significant speedup of the computation. Even for very large problems (i.e., three-dimensional fine meshes) executed concurrently on a few nodes in serial (nonvector) mode, however, the measured computational efficiency is very low (40 to 86%). In this paper, the authors present a highly efficient (∼85 to 99.9%) algorithm for solving the two-dimensional nodal diffusion equations on the Sequent Balance 8000 parallel computer. Also presented is a model for the performance, represented by the efficiency, as a function of problem size and the number of participating processors. The model is validated through several tests and then extrapolated to larger problems and more processors to predict the performance of the algorithm in more computationally demanding situations.
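The efficiency percentages quoted above follow the usual definition: speedup S = T₁/T_P and efficiency E = S/P. A minimal sketch of that arithmetic, assuming hypothetical timings; it does not reproduce the authors' performance model, whose form is not given in the abstract.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of serial to parallel wall-clock time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Parallel efficiency E = S / P; 1.0 means perfect (linear) speedup."""
    return speedup(t_serial, t_parallel) / processors


# Hypothetical timings: 80 s serial, 12.5 s on 8 processors.
print(f"{efficiency(80.0, 12.5, 8):.0%}")  # prints 80%
```

A speedup of 6.4 on 8 processors therefore corresponds to 80% efficiency, which would sit inside the 40 to 86% range reported for earlier algorithms.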

  15. Coarse-grain parallel solution of few-group neutron diffusion equations

    International Nuclear Information System (INIS)

    Sarsour, H.N.; Turinsky, P.J.

    1991-01-01

    The authors present a parallel numerical algorithm for the solution of the finite difference representation of the few-group neutron diffusion equations. The targeted architectures are multiprocessor computers with shared memory, such as the Cray Y-MP and the IBM 3090/VF, where coarse granularity is important for minimizing overhead. Most of the past work that attempts to exploit concurrency has concentrated on the inner iterations of the standard outer-inner iterative strategy, which produces very fine granularity. To coarsen granularity, the authors introduce parallelism at the nested outer-inner level. The problem's spatial domain is partitioned into contiguous subregions, and each subregion is assigned to a processor that solves it independently of all other subregions, and hence of all other processors; i.e., each subregion is treated as a reactor core with imposed boundary conditions. Since these boundary conditions on interior surfaces, referred to as internal boundary conditions (IBCs), are not known, a third iterative level, the recomposition iterations, is introduced to communicate results between subregions.

  16. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    Science.gov (United States)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
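To make the role of the sparse matrix-vector multiply concrete, here is a minimal serial sketch of CG with the matrix stored in CSR (compressed sparse row) form. The CSR layout, the tridiagonal test matrix and the tolerance are our assumptions; the ordering, partitioning and threading strategies studied in the paper are not shown. Each iteration performs exactly one SPMV, which is why its data layout dominates performance.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A x for A in CSR form: row i's nonzeros are data[indptr[i]:indptr[i+1]]."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y[i] = s
    return y


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for x = 0
    p = list(r)          # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = csr_matvec(indptr, indices, data, p)  # the one SPMV per iteration
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def laplacian_1d_csr(n):
    """CSR arrays for the SPD test matrix tridiag(-1, 2, -1) of size n."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


A = laplacian_1d_csr(5)
b = csr_matvec(*A, [1.0, 2.0, 3.0, 4.0, 5.0])  # b = [0, 0, 0, 0, 6]
x = conjugate_gradient(*A, b)                 # x is close to [1, 2, 3, 4, 5]
```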

  17. Implementation of a parallel algorithm for spherical SN calculations on the IBM 3090

    International Nuclear Information System (INIS)

    Haghighat, A.; Lawrence, R.D.

    1989-01-01

    Parallel SN algorithms based on domain decomposition in angle are straightforward to develop in Cartesian geometry because the computation of the angular fluxes for a specific discrete ordinate can be performed independently of all other angles. This is not the case for curvilinear geometries, where the angular redistribution component of the discretized streaming operator results in coupling between angular fluxes along adjacent discrete ordinates. Previously, the authors developed a parallel algorithm for SN calculations in spherical geometry and examined its iterative convergence for criticality and detector problems with differing scattering/absorption ratios. In this paper, the authors describe the implementation of the algorithm on an IBM 3090 Model 400 (four processors) and present computational results illustrating the efficiency of the algorithm relative to serial execution.

  18. Software abstractions and computational issues in parallel structure adaptive mesh methods for electronic structure calculations

    Energy Technology Data Exchange (ETDEWEB)

    Kohn, S.; Weare, J.; Ong, E.; Baden, S.

    1997-05-01

    We have applied structured adaptive mesh refinement techniques to the solution of the LDA equations for electronic structure calculations. Local spatial refinement concentrates memory resources and numerical effort where it is most needed, near the atomic centers and in regions of rapidly varying charge density. The structured grid representation enables us to employ efficient iterative solver techniques such as conjugate gradient with FAC multigrid preconditioning. We have parallelized our solver using an object-oriented adaptive mesh refinement framework.

  19. A discrete ordinate response matrix method for massively parallel computers

    International Nuclear Information System (INIS)

    Hanebutte, U.R.; Lewis, E.E.

    1991-01-01

    A discrete ordinate response matrix method is formulated for the solution of neutron transport problems on massively parallel computers. The response matrix formulation eliminates iteration on the scattering source. The nodal matrices that result from the diamond-differenced equations are utilized in a factored form, which minimizes memory requirements and significantly reduces the required number of … The algorithm utilizes massive parallelism by assigning each spatial node to a processor. The algorithm is accelerated effectively by a synthetic method in which the low-order diffusion equations are also solved by massively parallel red/black iterations. The method has been implemented on a 16k Connection Machine-2, and S8 and S16 solutions have been obtained for fixed-source benchmark problems in X-Y geometry.
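Red/black ordering is what makes such relaxation sweeps massively parallel: on a 5-point stencil, every "red" point (i + j even) depends only on "black" neighbours and vice versa, so all points of one colour can be updated simultaneously. A minimal serial sketch for the model problem −∇²u = f on a square grid with Dirichlet boundaries; the model problem, grid and sweep count are our assumptions, not the low-order diffusion operator of the paper.

```python
def red_black_gauss_seidel(u, f, h, sweeps):
    """In-place red/black Gauss-Seidel for -laplace(u) = f on an n x n grid.

    Boundary values of u are held fixed (Dirichlet). Within one colour,
    every update is independent, so each inner double loop could run in parallel.
    """
    n = len(u)
    for _ in range(sweeps):
        for colour in (0, 1):  # 0 = red (i + j even), 1 = black (i + j odd)
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == colour:
                        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                          + u[i][j - 1] + u[i][j + 1]
                                          + h * h * f[i][j])
    return u


# Usage: the linear boundary data u = i + j is discretely harmonic,
# so with f = 0 the interior should converge to i + j.
n = 9
u = [[float(i + j) if i in (0, n - 1) or j in (0, n - 1) else 0.0
      for j in range(n)] for i in range(n)]
f = [[0.0] * n for _ in range(n)]
red_black_gauss_seidel(u, f, h=1.0 / (n - 1), sweeps=500)
```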

  20. BitPAl: a bit-parallel, general integer-scoring sequence alignment algorithm.

    Science.gov (United States)

    Loving, Joshua; Hernandez, Yozen; Benson, Gary

    2014-11-15

    Mapping of high-throughput sequencing data and other bulk sequence comparison applications have motivated a search for high-efficiency sequence alignment algorithms. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations composed of AND, OR, XOR, complement, shift and addition. Bit-parallelism has been successfully applied to the longest common subsequence (LCS) and edit-distance problems, producing fast algorithms in practice. We have developed BitPAl, a bit-parallel algorithm for general, integer-scoring global alignment. Integer-scoring schemes assign integer weights for match, mismatch and insertion/deletion. The BitPAl method uses structural properties in the relationship between adjacent scores in the scoring matrix to construct classes of efficient algorithms, each designed for a particular set of weights. In timed tests, we show that BitPAl runs 7-25 times faster than a standard iterative algorithm. Source code is freely available for download at http://lobstah.bu.edu/BitPAl/BitPAl.html. BitPAl is implemented in C and runs on all major operating systems. Contact: jloving@bu.edu, yhernand@bu.edu or gbenson@bu.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
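The abstract cites LCS as the classic success of bit-parallelism. As an illustration of that simpler special case (not BitPAl's general integer-scoring method), here is a sketch of the well-known bit-parallel LCS-length recurrence V' = (V + U) | (V − U) with U = V & PM[c], where one machine word (here an arbitrary-precision Python int) holds a whole column of the DP matrix. A plain DP version is included for comparison.

```python
def lcs_length_bitparallel(a: str, b: str) -> int:
    """LCS length of a and b, processing a whole DP column per character of b."""
    m = len(a)
    if m == 0:
        return 0
    mask = (1 << m) - 1
    # Match masks: bit i of match[ch] is set iff a[i] == ch.
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)
    v = mask  # all ones: no matches recorded yet
    for ch in b:
        u = v & match.get(ch, 0)
        v = ((v + u) | (v - u)) & mask  # add/subtract propagate carries along the column
    # Each zero bit in v corresponds to one unit of LCS length.
    return m - bin(v).count("1")


def lcs_length_dp(a: str, b: str) -> int:
    """Reference O(len(a) * len(b)) dynamic program."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

Both functions return 4 for `("AGGTAB", "GXTXAYB")`; the bit-parallel version replaces the inner loop over `a` with a handful of word operations per character of `b`.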

  1. A Parallel Particle Swarm Optimization Algorithm Accelerated by Asynchronous Evaluations

    Science.gov (United States)

    Venter, Gerhard; Sobieszczanski-Sobieski, Jaroslaw

    2005-01-01

    A parallel Particle Swarm Optimization (PSO) algorithm is presented. Particle swarm optimization is a fairly recent addition to the family of non-gradient-based, probabilistic search algorithms; it is based on a simplified social model and is closely tied to swarming theory. Although PSO algorithms present several attractive properties to the designer, they are plagued by high computational cost as measured by elapsed time. One approach to reduce the elapsed time is to use coarse-grained parallelization to evaluate the design points. Previous parallel PSO algorithms were mostly implemented in a synchronous manner, where all design points within a design iteration are evaluated before the next iteration is started. This approach leads to poor parallel speedup in cases where a heterogeneous parallel environment is used and/or where the analysis time depends on the design point being analyzed. This paper introduces an asynchronous parallel PSO algorithm that greatly improves the parallel efficiency. The asynchronous algorithm is benchmarked on a cluster assembled from Apple Macintosh G5 desktop computers, using the multidisciplinary optimization of a typical transport aircraft wing as an example.
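The asynchronous idea can be sketched with a thread pool: instead of waiting for the whole swarm, each particle is moved and resubmitted as soon as its own evaluation finishes, using whatever global best is known at that moment. This is a minimal sketch under our own assumptions (standard inertia-weight PSO update, constriction-style coefficients, box bounds, a thread pool standing in for the paper's cluster); it is not the authors' implementation.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def async_pso(objective, dim, n_particles=10, max_evals=2000, bounds=(-5.0, 5.0),
              w=0.729, c1=1.49445, c2=1.49445, workers=4, seed=0):
    """Asynchronous PSO: update a particle as soon as its evaluation returns."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [float("inf")] * n_particles
    gbest, gbest_f = pos[0][:], float("inf")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(objective, pos[i][:]): i for i in range(n_particles)}
        evals = n_particles
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                i = pending.pop(fut)
                f = fut.result()
                # pos[i] is unchanged while its evaluation is in flight.
                if f < pbest_f[i]:
                    pbest_f[i], pbest[i] = f, pos[i][:]
                if f < gbest_f:
                    gbest_f, gbest = f, pos[i][:]
                if evals < max_evals:
                    # Move only this particle, using the current global best.
                    for d in range(dim):
                        r1, r2 = rng.random(), rng.random()
                        vel[i][d] = (w * vel[i][d]
                                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                                     + c2 * r2 * (gbest[d] - pos[i][d]))
                        pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
                    pending[pool.submit(objective, pos[i][:])] = i
                    evals += 1
    return gbest, gbest_f
```

For instance, `async_pso(lambda x: sum(v * v for v in x), dim=2)` should drive the sphere function close to zero. No particle ever waits for the slowest evaluation in the swarm, which is the property that helps on heterogeneous clusters.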

  2. Automatic Loop Parallelization via Compiler Guided Refactoring

    DEFF Research Database (Denmark)

    Larsen, Per; Ladelsky, Razya; Lidman, Jacob

    For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities...... for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler’s ability to generate loop-parallel code. We use this compilation system to modify two sequential...... benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should...

  3. Efficient sequential and parallel algorithms for finding edit distance based motifs.

    Science.gov (United States)

    Pal, Soumitra; Xiao, Peng; Rajasekaran, Sanguthevar

    2016-08-18

    Motif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable, and there is a pressing need to develop efficient exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l, d and n biological strings, find all strings of length l that appear in each input string with at most d errors of the types substitution, insertion and deletion. One popular technique to solve the problem is to explore, for each input string, the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those that are common to all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in the neighborhood that are at a distance of exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie-based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared-memory setting uses arrays for storage and a novel modification of radix sort for sorting the candidate motifs. Algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and solves instances up to (16,3) in an estimated 3 days. Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3) and (13,4), our algorithms are much faster. Our parallel algorithm achieves more than 600% scaling performance while using 16 threads. Our algorithms have pushed up the state of the art of EMS solvers, and we believe that the techniques introduced in
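To make the problem definition concrete, here is a brute-force baseline (explicitly not the authors' neighborhood-exploration method): enumerate every l-mer over the alphabet and keep it if each input string contains some substring within edit distance d. The per-string test uses Sellers' approximate-matching DP, in which a zero first row lets a match start anywhere and the minimum over the last row lets it end anywhere. The DNA alphabet is an assumption; the 4^l enumeration is exactly the cost that the paper's techniques avoid.

```python
from itertools import product


def min_substring_edit_distance(motif: str, s: str) -> int:
    """Smallest edit distance between motif and any substring of s (Sellers)."""
    prev = [0] * (len(s) + 1)  # row 0: a match may start at any position of s
    for i, cm in enumerate(motif, 1):
        cur = [i]
        for j, cs in enumerate(s, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cm != cs)))
        prev = cur
    return min(prev)  # a match may end at any position of s


def edit_motifs_bruteforce(strings, l, d, alphabet="ACGT"):
    """All l-mers that occur in every string with at most d edit errors."""
    result = []
    for cand in product(alphabet, repeat=l):
        motif = "".join(cand)
        if all(min_substring_edit_distance(motif, s) <= d for s in strings):
            result.append(motif)
    return result
```

With `d = 0` this reduces to exact common substrings: `edit_motifs_bruteforce(["ACGTT", "TACGA", "GACGC"], 3, 0)` returns `["ACG"]`.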

  4. Eliminating graphs by means of parallel knock-out schemes

    NARCIS (Netherlands)

    Broersma, H.J.; Fomin, F.V.; Královic, R.; Woeginger, G.J.

    2007-01-01

    In 1997 Lampert and Slater introduced parallel knock-out schemes, an iterative process on graphs that goes through several rounds. In each round of this process, every vertex eliminates exactly one of its neighbors. The parallel knock-out number of a graph is the minimum number of rounds after which

  5. Eliminating graphs by means of parallel knock-out schemes

    NARCIS (Netherlands)

    Broersma, Haitze J.; Fomin, F.V.; Královič, R.; Woeginger, Gerhard

    In 1997 Lampert and Slater introduced parallel knock-out schemes, an iterative process on graphs that goes through several rounds. In each round of this process, every vertex eliminates exactly one of its neighbors. The parallel knock-out number of a graph is the minimum number of rounds after which

  6. Efficient sequential and parallel algorithms for planted motif search.

    Science.gov (United States)

    Nicolae, Marius; Rajasekaran, Sanguthevar

    2014-01-31

    Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.
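The (l,d) problem definition can be illustrated with the naive neighborhood-intersection approach: for every sequence, collect the Hamming d-neighborhoods of all of its l-mers, then intersect across sequences. This sketch only shows what "appears with at most d mismatches" means; it is not PMS8, whose contribution is generating common neighborhoods far more efficiently. The DNA alphabet is an assumption.

```python
from itertools import combinations, product


def hamming_neighborhood(kmer: str, d: int, alphabet: str = "ACGT") -> set:
    """All strings within Hamming distance d of kmer."""
    result = {kmer}
    for k in range(1, d + 1):
        for positions in combinations(range(len(kmer)), k):
            choices = [[c for c in alphabet if c != kmer[p]] for p in positions]
            for subs in product(*choices):
                s = list(kmer)
                for p, c in zip(positions, subs):
                    s[p] = c
                result.add("".join(s))
    return result


def planted_motifs(sequences, l, d, alphabet="ACGT"):
    """All l-mers occurring in every sequence with at most d mismatches."""
    common = None
    for seq in sequences:
        neigh = set()
        for i in range(len(seq) - l + 1):
            neigh |= hamming_neighborhood(seq[i:i + l], d, alphabet)
        common = neigh if common is None else common & neigh
    return sorted(common or ())
```

A 3-mer over four letters has 1 + 3·3 = 10 neighbors at d = 1, and the neighborhood size grows combinatorially with l and d, which is why instances such as (25,10) are hard.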

  7. Multi-petascale highly efficient parallel supercomputer

    Science.gov (United States)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100-petaOPS-scale computing is described, with decreased cost, power and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The supercomputer exploits technological advances in VLSI that enable a computing model in which many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with the various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
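In a five-dimensional torus, each node has two links per dimension (one in each direction, with wraparound), and the minimal hop count between two nodes is the sum over dimensions of the shorter way around each ring. A minimal sketch of that generic torus addressing; the coordinate scheme and the example dimensions are hypothetical, not taken from the patent.

```python
def torus_neighbors(coord, dims):
    """Nearest neighbours of a node in a torus of the given per-dimension sizes."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wraparound link
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims):
    """Minimal hop count between nodes a and b (shorter way round each ring)."""
    return sum(min(abs(x - y), s - abs(x - y)) for x, y, s in zip(a, b, dims))


dims = (4, 4, 4, 4, 4)  # hypothetical 5-D torus of 4^5 = 1024 nodes
print(len(torus_neighbors((0, 0, 0, 0, 0), dims)))  # prints 10
```

Wraparound halves the worst-case distance along each dimension compared with an open mesh, which is part of why a torus keeps latency low as node counts grow.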

  8. Iterative Splitting Methods for Differential Equations

    CERN Document Server

    Geiser, Juergen

    2011-01-01

    Iterative Splitting Methods for Differential Equations explains how to solve evolution equations via novel iterative-based splitting methods that efficiently use computational and memory resources. It focuses on systems of parabolic and hyperbolic equations, including convection-diffusion-reaction equations, heat equations, and wave equations. In the theoretical part of the book, the author discusses the main theorems and results of the stability and consistency analysis for ordinary differential equations. He then presents extensions of the iterative splitting methods to partial differential

  9. HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization

    Energy Technology Data Exchange (ETDEWEB)

    2016-08-22

    NMF is a useful tool for many applications in different domains, such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementations, our algorithm is also flexible: it performs well for both dense and sparse matrices, and allows the user to choose any one of multiple algorithms for solving the updates to the low-rank factors W and H within the alternating iterations.
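The alternating-NLS structure is easiest to see for rank k = 1, where each NLS subproblem splits into independent scalar problems with the closed-form solution max(0, ·). A minimal serial sketch for that special case only; general rank k needs a real NLS solver, and the distributed MPI data layout of the paper is not reproduced.

```python
def rank1_anls(A, iters=50):
    """Rank-1 NMF A ~ w h^T by alternating non-negative least squares.

    For fixed w, each h_j = argmin_{h >= 0} ||A[:, j] - w h||^2 = max(0, w.A[:, j] / w.w),
    and symmetrically for w with h fixed.
    """
    m, n = len(A), len(A[0])
    w = [1.0] * m
    h = [0.0] * n
    for _ in range(iters):
        ww = sum(v * v for v in w)
        if ww == 0.0:
            break
        h = [max(0.0, sum(w[i] * A[i][j] for i in range(m)) / ww) for j in range(n)]
        hh = sum(v * v for v in h)
        if hh == 0.0:
            break
        w = [max(0.0, sum(A[i][j] * h[j] for j in range(n)) / hh) for i in range(m)]
    return w, h
```

On the exactly rank-1 matrix `[[4, 5], [8, 10], [12, 15]]` the product of the returned factors reproduces the input.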

  10. Fuel cycle design for ITER and its extrapolation to DEMO

    International Nuclear Information System (INIS)

    Konishi, Satoshi; Glugla, Manfred; Hayashi, Takumi

    2008-01-01

    ITER is the first fusion device that continuously processes DT plasma exhaust and supplies recycled fuel in a closed loop. All the tritium and deuterium in the exhaust are recovered, purified and returned to the tokamak with minimal delay, so that extended burn can be sustained with limited inventory. To maintain the safety of the entire facility, plant-scale detritiation systems will also run continuously to remove tritium from the effluents at maximum efficiency. Throughout this tritium plant system, an extremely high decontamination factor, that is, the ratio of the processing flow rate to the tritium loss, is required for fuel economy and minimized tritium emissions, and a system design based on state-of-the-art technology is expected to satisfy all the requirements without significant technical challenges. A considerable part of the fusion tritium system will be verified with ITER and its decades of operating experience. Toward the DEMO plant, which will actually generate energy and operate its closed fuel cycle, the breeding blanket and the power train that carries high-temperature, high-pressure media from the fusion device to the generation system will be the major additions. For tritium confinement, safety and environmental emissions, the blanket, its coolant, and generation systems such as the heat exchanger, steam generator and turbine will be the critical systems, because tritium permeation from the breeder and the handling of large amounts of high-temperature, high-pressure coolant will be even more difficult than what is required for ITER. Detritiation of solid waste such as used blankets and divertors will be another issue for both tritium economy and safety. Unlike ITER, which is regarded as an experimental facility, DEMO will be expected to demonstrate safety, reliability and social acceptance, even if economic aspects are excluded. The fuel and environmental issues to be tested in DEMO will determine the viability of fusion as a

  11. Fuel cycle design for ITER and its extrapolation to DEMO

    Energy Technology Data Exchange (ETDEWEB)

    Konishi, Satoshi [Institute of Advanced Energy, Kyoto University, Kyoto 611-0011 (Japan)], E-mail: s-konishi@iae.kyoto-u.ac.jp; Glugla, Manfred [Forschungszentrum Karlsruhe, P.O. Box 3640, D 76021 Karlsruhe (Germany)]; Hayashi, Takumi [Japan Atomic Energy Agency, Tokai, Ibaraki 319-0015 (Japan)]

    2008-12-15

    ITER is the first fusion device that continuously processes DT plasma exhaust and supplies recycled fuel in a closed loop. All the tritium and deuterium in the exhaust are recovered, purified and returned to the tokamak with minimal delay, so that extended burn can be sustained with limited inventory. To maintain the safety of the entire facility, plant-scale detritiation systems will also run continuously to remove tritium from the effluents at maximum efficiency. Throughout this tritium plant system, an extremely high decontamination factor, that is, the ratio of the processing flow rate to the tritium loss, is required for fuel economy and minimized tritium emissions, and a system design based on state-of-the-art technology is expected to satisfy all the requirements without significant technical challenges. A considerable part of the fusion tritium system will be verified with ITER and its decades of operating experience. Toward the DEMO plant, which will actually generate energy and operate its closed fuel cycle, the breeding blanket and the power train that carries high-temperature, high-pressure media from the fusion device to the generation system will be the major additions. For tritium confinement, safety and environmental emissions, the blanket, its coolant, and generation systems such as the heat exchanger, steam generator and turbine will be the critical systems, because tritium permeation from the breeder and the handling of large amounts of high-temperature, high-pressure coolant will be even more difficult than what is required for ITER. Detritiation of solid waste such as used blankets and divertors will be another issue for both tritium economy and safety. Unlike ITER, which is regarded as an experimental facility, DEMO will be expected to demonstrate safety, reliability and social acceptance, even if economic aspects are excluded. The fuel and environmental issues to be tested in DEMO will determine the viability of fusion as a

  12. Communications oriented programming of parallel iterative solutions of sparse linear systems

    Science.gov (United States)

    Patrick, M. L.; Pratt, T. W.

    1986-01-01

    Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.

  13. ITER Central Solenoid Module Fabrication

    Energy Technology Data Exchange (ETDEWEB)

    Smith, John [General Atomics, San Diego, CA (United States)]

    2016-09-23

    The fabrication of the modules for the ITER Central Solenoid (CS) has started in a dedicated production facility located in Poway, California, USA. The necessary tools have been designed, built, installed, and tested in the facility to enable the start of production. The current schedule has fabrication of the first module completed in 2017, followed by testing and subsequent shipment to ITER. The Central Solenoid is a key component of the ITER tokamak, providing the inductive voltage to initiate and sustain the plasma current and to position and shape the plasma. The design of the CS has been a collaborative effort between the US ITER Project Office (US ITER), the international ITER Organization (IO) and General Atomics (GA). GA’s responsibility includes completing the fabrication design, developing and qualifying the fabrication processes and tools, and then completing the fabrication of the seven 110-tonne CS modules. The modules will be shipped separately to the ITER site, and then stacked and aligned in the Assembly Hall prior to insertion in the core of the ITER tokamak. A dedicated facility in Poway, California, USA has been established by GA to complete the fabrication of the seven modules. Infrastructure improvements included thick reinforced concrete floors, a diesel generator for backup power, and cranes for moving the tooling within the facility. The fabrication process for a single module requires approximately 22 months, followed by five months of testing, which includes preliminary electrical testing followed by high-current (48.5 kA) tests at 4.7 K. The seven modules are produced in parallel through ten process stations. The process stations have been designed and built, with most stations having completed testing and qualification for carrying out the required fabrication processes. The final qualification step for each process station is achieved by the successful production of a prototype coil. Fabrication of the first

  14. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems.

    Science.gov (United States)

    Barreiros, Willian; Teodoro, George; Kurc, Tahsin; Kong, Jun; Melo, Alba C M A; Saltz, Joel

    2017-09-01

    We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be computationally very demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual-socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies.
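Computation reuse across parameter runs can be pictured as memoizing each workflow stage on the parameters it actually depends on, so runs that differ only in a downstream parameter share the upstream result. A minimal sketch using `functools.lru_cache`; the two stages, their parameters and the string "outputs" are hypothetical placeholders, not the paper's segmentation workflow or its multi-level reuse scheme.

```python
from functools import lru_cache

CALLS = {"normalize": 0, "segment": 0}  # counts real (non-cached) stage executions


@lru_cache(maxsize=None)
def normalize(image_id: str, gamma: float) -> str:
    """Hypothetical upstream stage that depends only on gamma."""
    CALLS["normalize"] += 1
    return f"norm({image_id},{gamma})"


@lru_cache(maxsize=None)
def segment(image_id: str, gamma: float, threshold: float) -> str:
    """Hypothetical downstream stage that reuses the cached upstream output."""
    CALLS["segment"] += 1
    return f"seg({normalize(image_id, gamma)},{threshold})"


def sensitivity_sweep(image_ids, gammas, thresholds):
    """Run every parameter combination; upstream work is shared across thresholds."""
    return {(img, g, t): segment(img, g, t)
            for img in image_ids for g in gammas for t in thresholds}
```

Sweeping 2 images × 2 gamma values × 3 thresholds runs the downstream stage 12 times but the upstream stage only 4 times instead of 12.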

  15. Parallel computation of fluid-structural interactions using high resolution upwind schemes

    Science.gov (United States)

    Hu, Zongjun

    An efficient and accurate solver is developed to simulate the non-linear fluid-structural interactions in turbomachinery flutter flows. A new low-diffusion E-CUSP scheme, the Zha CUSP scheme, is developed to improve the efficiency and accuracy of the inviscid flux computation. The 3D unsteady Navier-Stokes equations with the Baldwin-Lomax turbulence model are solved using the finite volume method with the dual-time stepping scheme. The linearized equations are solved with Gauss-Seidel line iterations. The parallel computation is implemented using the MPI protocol. The solver is validated with 2D cases for its turbulence modeling, parallel computation and unsteady calculation. The Zha CUSP scheme is validated with 2D cases, including a supersonic flat plate boundary layer, a transonic converging-diverging nozzle and a transonic inlet diffuser. The Zha CUSP2 scheme is tested with 3D cases, including a circular-to-rectangular nozzle, a subsonic compressor cascade and a transonic channel. The Zha CUSP schemes are shown to be accurate, robust and efficient in these tests. The steady and unsteady separation flows in a 3D stationary cascade under high incidence and three inlet Mach numbers are calculated to study the steady-state separation flow patterns and their unsteady oscillation characteristics. Leading-edge vortex shedding is the mechanism behind the unsteady characteristics of the high-incidence separated flows. The separation flow characteristics are affected by the inlet Mach number. The blade aeroelasticity of a linear cascade with forced oscillating blades is studied using parallel computation. A simplified two-passage cascade with periodic boundary conditions is first calculated under a medium frequency and a low incidence. The full-scale cascade with 9 blades and two end walls is then studied more extensively under three oscillation frequencies and two incidence angles. The end wall influence and the blade stability are studied and compared under different
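Gauss-Seidel line iteration solves all unknowns along one grid line at once with a tridiagonal (Thomas) solve, using the latest values from neighbouring lines. A minimal sketch on the model problem −∇²u = f with Dirichlet boundaries; the model problem is our substitution for the paper's linearized Navier-Stokes system, and grid size and sweep count are assumptions.

```python
def thomas(a, b, c, d):
    """Solve a[k] x[k-1] + b[k] x[k] + c[k] x[k+1] = d[k] (a[0], c[-1] unused)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for k in range(1, n):
        denom = b[k] - a[k] * cp[k - 1]
        cp[k] = c[k] / denom if k < n - 1 else 0.0
        dp[k] = (d[k] - a[k] * dp[k - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for k in range(n - 2, -1, -1):
        x[k] = dp[k] - cp[k] * x[k + 1]
    return x


def line_gauss_seidel(u, f, h, sweeps):
    """Row-by-row line Gauss-Seidel for -laplace(u) = f on an n x n grid (in place)."""
    n = len(u)
    m = n - 2  # interior unknowns per row
    for _ in range(sweeps):
        for i in range(1, n - 1):
            # Row i: -u[i][j-1] + 4 u[i][j] - u[i][j+1] = h^2 f + u[i-1][j] + u[i+1][j]
            d = [h * h * f[i][j] + u[i - 1][j] + u[i + 1][j] for j in range(1, n - 1)]
            d[0] += u[i][0]       # left boundary value
            d[-1] += u[i][n - 1]  # right boundary value
            u[i][1:n - 1] = thomas([-1.0] * m, [4.0] * m, [-1.0] * m, d)
    return u
```

Because each row is solved exactly, information travels along the whole line in one sweep, so line iterations typically converge in fewer sweeps than point Gauss-Seidel on the same grid.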

  16. Design and development of the ITER vacuum vessel

    Energy Technology Data Exchange (ETDEWEB)

    Koizumi, K.; Nakahira, M.; Itou, Y.; Tada, E. [Japan Atomic Energy Research Inst., Naka, Ibaraki (Japan); Johnson, G.; Ioki, K.; Elio, F.; Iizuka, T.; Sannazzaro, G.; Takahashi, K.; Utin, Y.; Onozuka, M. [ITER Joint Central Team (JCT), Garching (Germany); Nelson, B. [US Home Team, Oak Ridge National Laboratory (United States); Vallone, C. [EU Home Team, NET Team, Garching (Germany); Kuzmin, E. [RF Home Team, Efremov Institute, City (Russian Federation)

    1998-09-01

    In ITER, the vacuum vessel (VV) is designed to be a water-cooled, double-walled toroidal structure made of 316LN stainless steel with a D-shaped cross section approximately 9 m wide and 15 m high. The design work, which began at the start of the ITER-EDA, is nearing completion as the remaining technical issues are resolved. In parallel with the design activities, an R and D program, the full-scale VV sector model project, was initiated in 1995 to resolve the design and fabrication issues. The full-scale sector model corresponds to an 18° sector (two 9° sub-sectors) and is being fabricated on schedule. To date, 60% of the fabrication has been completed. The fabrication of the full-scale model, including the sector-to-sector connection, will be completed by the end of 1997, and performance tests are scheduled until the end of the ITER-EDA. This paper describes the latest status of the ITER VV design and the full-scale sector model project. (orig.) 3 refs.

  17. Study of wall conditioning in tokamaks with application to ITER

    International Nuclear Information System (INIS)

    Kogut, Dmitri

    2014-01-01

    This thesis is devoted to studies of the performance and efficiency of wall conditioning techniques in fusion reactors such as ITER. Conditioning is necessary to control the surface state of plasma-facing components to ensure plasma initiation and performance. Conditioning and operation of the JET tokamak with the ITER-relevant material mix are studied extensively. A 2D model of glow discharges used for conditioning is developed and validated; it predicts reasonably uniform discharges in ITER. In the nuclear phase of ITER operation, conditioning will be needed to control the tritium inventory. It is shown here that isotopic exchange is an efficient means of removing tritium from the walls by replacing it with deuterium. Extrapolated tritium removal is comparable with the retention expected per nominal plasma pulse in ITER. A 1D model of hydrogen isotopic exchange in beryllium is developed and validated. It shows that the fluence and the surface temperature influence the efficiency of the isotopic exchange. (author)

  18. Copper Mountain conference on iterative methods: Proceedings: Volume 2

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1996-10-01

    This volume (the second of two) contains information presented during the last two days of the Copper Mountain Conference on Iterative Methods held April 9-13, 1996 at Copper Mountain, Colorado. Topics of the sessions held these two days include domain decomposition, Krylov methods, computational fluid dynamics, Markov chains, sparse and parallel basic linear algebra subprograms, multigrid methods, applications of iterative methods, equation systems with multiple right-hand sides, projection methods, and the Helmholtz equation. Selected papers indexed separately for the Energy Science and Technology Database.

  19. Automatic Parallelization: An Overview of Fundamental Compiler Techniques

    CERN Document Server

    Midkiff, Samuel P

    2012-01-01

    Compiling for parallelism is a longstanding topic of compiler research. This book describes the fundamental principles of compiling "regular" numerical programs for parallelism. We begin with an explanation of analyses that allow a compiler to understand the interaction of data reads and writes in different statements and loop iterations during program execution. These analyses include dependence analysis, use-def analysis and pointer analysis. Next, we describe how the results of these analyses are used to enable transformations that make loops more amenable to parallelization, and
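
    The dependence analysis mentioned above can be illustrated with the classical GCD test, one of the simplest dependence tests for affine array subscripts. The sketch below is not taken from the book. It assumes a single loop and subscripts of the form a*i + b, and it ignores loop bounds, so it can only prove independence, never dependence. The function name `gcd_test` is hypothetical.

```python
from math import gcd


def gcd_test(a: int, b: int, c: int, d: int) -> bool:
    """Conservative GCD dependence test for X[a*i + b] (write) vs X[c*j + d] (read).

    Both references touch the same element only if a*i - c*j = d - b has an
    integer solution, which requires gcd(a, c) to divide d - b. Loop bounds
    are ignored, so True means "dependence not ruled out" and False means
    the two references are provably independent.
    """
    g = gcd(a, c)
    if g == 0:
        # Both subscripts are constant: they conflict only if they are equal.
        return b == d
    return (d - b) % g == 0
```

    For example, a loop that writes `X[2*i]` and reads `X[2*i + 1]` touches only even and only odd elements respectively. `gcd_test(2, 0, 2, 1)` therefore returns `False`, and a compiler could treat those two references as independent when parallelizing the loop.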

  20. Iterative methods for 3D implicit finite-difference migration using the complex Padé approximation

    International Nuclear Information System (INIS)

    Costa, Carlos A N; Campos, Itamara S; Costa, Jessé C; Neto, Francisco A; Schleicher, Jörg; Novais, Amélia

    2013-01-01

    Conventional implementations of 3D finite-difference (FD) migration use splitting techniques to accelerate performance and save computational cost. However, such techniques are plagued with numerical anisotropy that jeopardises the correct positioning of dipping reflectors in the directions not used for the operator splitting. We implement 3D downward continuation FD migration without splitting using a complex Padé approximation. In this way, the numerical anisotropy is eliminated at the expense of a computationally more intensive solution of a large-band linear system. We compare the performance of the iterative stabilized biconjugate gradient (BICGSTAB) and that of the multifrontal massively parallel direct solver (MUMPS). It turns out that the use of the complex Padé approximation not only stabilizes the solution, but also acts as an effective preconditioner for the BICGSTAB algorithm, reducing the number of iterations as compared to the implementation using the real Padé expansion. As a consequence, the iterative BICGSTAB method is more efficient than the direct MUMPS method when solving a single term in the Padé expansion. The results of both algorithms, here evaluated by computing the migration impulse response in the SEG/EAGE salt model, are of comparable quality. (paper)
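
    The sketch below shows what the iterative BICGSTAB solver mentioned above does at each step. It is a minimal, unpreconditioned version in pure Python for small dense complex systems, using the Hermitian inner product. It is not the authors' implementation: there is no Padé-based preconditioning, no banded storage and no breakdown handling. The helper names (`dot`, `matvec`, `norm`, `bicgstab`) are hypothetical.

```python
from typing import List, Tuple

Vector = List[complex]
Matrix = List[List[complex]]


def dot(u: Vector, v: Vector) -> complex:
    """Hermitian inner product <u, v> = sum(conj(u_k) * v_k)."""
    return sum((a.conjugate() * b for a, b in zip(u, v)), 0j)


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum((aij * xj for aij, xj in zip(row, x)), 0j) for row in A]


def norm(u: Vector) -> float:
    return abs(dot(u, u)) ** 0.5


def bicgstab(A: Matrix, b: Vector, tol: float = 1e-10,
             max_iter: int = 200) -> Tuple[Vector, int]:
    """Unpreconditioned BiCGSTAB for A x = b, starting from x0 = 0.

    Returns the approximate solution and the number of iterations used.
    Breakdown (division by zero) is not handled in this sketch.
    """
    n = len(b)
    x: Vector = [0j] * n
    r: Vector = [complex(bi) for bi in b]   # r0 = b - A x0 with x0 = 0
    r_hat = r[:]                            # fixed "shadow" residual
    rho_old = alpha = omega = 1 + 0j
    v: Vector = [0j] * n
    p: Vector = [0j] * n
    stop = tol * (norm(r) or 1.0)

    for it in range(1, max_iter + 1):
        rho = dot(r_hat, r)
        beta = (rho / rho_old) * (alpha / omega)
        p = [ri + beta * (pk - omega * vi) for ri, pk, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]    # BiCG half-step residual
        x = [xi + alpha * pk for xi, pk in zip(x, p)]
        if norm(s) <= stop:
            return x, it
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)                    # minimizes ||s - omega t||
        x = [xi + omega * si for xi, si in zip(x, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if norm(r) <= stop:
            return x, it
        rho_old = rho
    return x, max_iter
```

    Each iteration costs two matrix-vector products. In a large-scale setting those products, together with an optional preconditioner application, dominate the run time.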

  1. Reactor structure and superconducting magnet system of ITER

    International Nuclear Information System (INIS)

    Tada, Eisuke; Yoshida, Kiyoshi; Shibanuma, Kiyoshi; Okuno, Kiyoshi; Tsuji, Hiroshi; Shimamoto, Susumu

    1993-01-01

    Fusion experimental reactors are one of the major steps toward the realization of fusion energy; their key objective is to demonstrate scientific and technological feasibility prior to the Demo Fusion Reactor. ITER (International Thermonuclear Experimental Reactor) is one such experimental reactor, and its conceptual design has been completed through the united efforts of the USA, the USSR, the EC and Japan. In parallel with the conceptual design, key technology development has been conducted in various areas. This paper describes the overall design concepts and the latest technological achievements of the ITER reactor structure and superconducting magnet system. (author)

  2. Recent Progress on ECH Technology for ITER

    Science.gov (United States)

    Sirigiri, Jagadishwar

    2005-10-01

    The Electron Cyclotron Heating and Current Drive (ECH&CD) system for ITER is a critical system that must be available for use on Day 1 of the ITER experimental program. Its applications include plasma start-up, plasma heating and suppression of Neoclassical Tearing Modes (NTMs). These applications are accomplished using 27 one-megawatt continuous-wave gyrotrons: 24 at a frequency of 170 GHz and 3 at a frequency of 120 GHz. The system also includes DC power supplies for the gyrotrons, a transmission line system, one launcher at the equatorial plane and three upper port launchers. The US will play a major role in delivering parts of the ECH&CD system to ITER. The present state of the art includes major advances in all areas of ECH technology. In the US, a major effort is underway to supply gyrotrons of up to 1.5 MW at 110 GHz to General Atomics for heating the DIII-D tokamak. This presentation will include a brief review of the worldwide state of the art in ECH technology. The requirements for the ITER ECH&CD system will then be reviewed. ITER calls for gyrotrons capable of operating from a 50 kV power supply, after potential depression, with a minimum overall efficiency of 50%. This is a very significant challenge, and some approaches to meeting this goal will be presented. Recent experimental results at MIT showing improved efficiency of high-frequency, 1.5 MW gyrotrons will be described. These results will be incorporated into the planned development of gyrotrons for ITER. The ITER ECH&CD system will also challenge the transmission lines, which must operate at high average power for up to 1000 seconds with high efficiency. The technology challenges and the efforts in the US and by other ITER parties to solve these problems will be reviewed. *In collaboration with E. Choi, C. Marchewka, I. Mastovosky, M. A. Shapiro and R. J. Temkin. This work is supported by the Office of Fusion Energy Sciences of the U. S. Department of Energy.

  3. An efficient parallel algorithm for the solution of a tridiagonal linear system of equations

    Science.gov (United States)

    Stone, H. S.

    1971-01-01

    Tridiagonal linear systems of equations are solved on conventional serial machines in a time proportional to N, where N is the number of equations. The conventional algorithms do not lend themselves directly to parallel computations on computers of the ILLIAC IV class, in the sense that they appear to be inherently serial. An efficient parallel algorithm is presented in which computation time grows as log₂ N. The algorithm is based on recursive doubling solutions of linear recurrence relations and can be used to solve recurrence relations of all orders.
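
    The recursive doubling idea can be sketched for the simplest case, a first-order linear recurrence x_i = a_i x_{i-1} + b_i. Each step is treated as an affine map, and the associativity of map composition is exploited. This is a serial simulation of the parallel rounds under that assumption. It does not show the further reformulation needed to apply the method to a full tridiagonal system, and the function name is hypothetical.

```python
from typing import List, Tuple


def recursive_doubling(a: List[float], b: List[float],
                       x0: float) -> Tuple[List[float], int]:
    """Solve x_i = a[i] * x_{i-1} + b[i] (i = 0..N-1, with x_{-1} = x0).

    Each step is the affine map x -> a[i] * x + b[i]. Composing maps is
    associative, so all prefix compositions can be formed in
    ceil(log2 N) rounds of pairwise combination (a Hillis-Steele scan).
    Returns the solution and the number of rounds used.
    """
    n = len(a)
    # Invariant: (A[i], B[i]) composes maps max(0, i-d+1)..i.
    A, B = list(a), list(b)
    d, rounds = 1, 0
    while d < n:
        newA, newB = A[:], B[:]
        # Every update in this round reads only values from the previous
        # round, so on a parallel machine all i could be processed at once.
        for i in range(d, n):
            # Apply map (A[i-d], B[i-d]) first, then (A[i], B[i]).
            newA[i] = A[i] * A[i - d]
            newB[i] = A[i] * B[i - d] + B[i]
        A, B = newA, newB
        d *= 2
        rounds += 1
    # After the last round, (A[i], B[i]) composes all maps 0..i.
    return [A[i] * x0 + B[i] for i in range(n)], rounds
```

    With N = 7 the loop runs ceil(log₂ 7) = 3 rounds, instead of the 7 dependent steps of the serial recurrence. The price is extra arithmetic per round, which is the usual trade-off of recursive doubling.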

  4. A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems

    Energy Technology Data Exchange (ETDEWEB)

    Ha, Woo Seok; Kim, Soo Mee; Park, Min Jae; Lee, Dong Soo; Lee, Jae Sung [Seoul National University, Seoul (Korea, Republic of)]

    2009-10-15

    Maximum likelihood-expectation maximization (ML-EM) is a statistical reconstruction algorithm derived from a probabilistic model of the emission and detection processes. Although ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterative processing on a CPU (central processing unit). In this study, we developed a parallel computing technique on a GPU (graphics processing unit) for the ML-EM algorithm. Using a GeForce 9800 GTX+ graphics card and CUDA (compute unified device architecture), the projection and backprojection steps of the ML-EM algorithm were parallelized with NVIDIA's technology. The computation times for projection, for the error between measured and estimated data, and for backprojection within one iteration were measured. The total time included the latency of data transfer between RAM and GPU memory. The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 s, respectively; in this case, the GPU improved the computing speed by a factor of about 15. When the number of iterations was increased to 1024, the CPU- and GPU-based computations took 18 min and 8 s in total, respectively. This improvement of about 135 times was caused by a slowdown of the CPU-based computation after a certain number of iterations. In contrast, the GPU-based computation showed very little variation in time per iteration owing to its use of shared memory. The GPU-based parallel computation for ML-EM significantly improved computing speed and stability. The developed GPU-based ML-EM algorithm could easily be modified for other imaging geometries.
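
    The abstract times three stages per ML-EM iteration: projection, the ratio of measured to estimated data, and backprojection. A minimal CPU sketch of the standard multiplicative ML-EM update, λ_j ← (λ_j / s_j) Σ_i A_ij y_i / (Σ_l A_il λ_l) with sensitivity s_j = Σ_i A_ij, makes these stages explicit. It assumes a small dense system matrix `A`, which is hypothetical; real scanners use sparse or on-the-fly projectors. On a GPU, each of the three comprehensions would typically become a kernel with one thread per detector bin or pixel. That mapping is an assumption, not a description of the authors' CUDA code.

```python
from typing import List


def mlem(A: List[List[float]], y: List[float], n_iter: int = 32,
         eps: float = 1e-12) -> List[float]:
    """Basic ML-EM reconstruction.

    A[i][j]: probability that an emission from pixel j is detected in bin i
    (hypothetical dense system matrix); y[i]: measured counts in bin i.
    """
    n_bins, n_pix = len(A), len(A[0])
    # Sensitivity image: total detection probability of each pixel.
    sens = [sum(A[i][j] for i in range(n_bins)) for j in range(n_pix)]
    lam = [1.0] * n_pix                      # uniform, strictly positive start
    for _ in range(n_iter):
        # 1) Forward projection of the current estimate.
        proj = [sum(A[i][j] * lam[j] for j in range(n_pix)) for i in range(n_bins)]
        # 2) Ratio of measured to estimated data.
        ratio = [y[i] / max(proj[i], eps) for i in range(n_bins)]
        # 3) Backprojection of the ratio, then the multiplicative update.
        back = [sum(A[i][j] * ratio[i] for i in range(n_bins)) for j in range(n_pix)]
        lam = [lam[j] * back[j] / max(sens[j], eps) for j in range(n_pix)]
    return lam
```

    A useful property to check: after every iteration, Σ_j s_j λ_j equals the total measured counts Σ_i y_i. In other words, ML-EM preserves counts.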

  5. A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems

    International Nuclear Information System (INIS)

    Ha, Woo Seok; Kim, Soo Mee; Park, Min Jae; Lee, Dong Soo; Lee, Jae Sung

    2009-01-01

    Maximum likelihood-expectation maximization (ML-EM) is a statistical reconstruction algorithm derived from a probabilistic model of the emission and detection processes. Although ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterative processing on a CPU (central processing unit). In this study, we developed a parallel computing technique on a GPU (graphics processing unit) for the ML-EM algorithm. Using a GeForce 9800 GTX+ graphics card and CUDA (compute unified device architecture), the projection and backprojection steps of the ML-EM algorithm were parallelized with NVIDIA's technology. The computation times for projection, for the error between measured and estimated data, and for backprojection within one iteration were measured. The total time included the latency of data transfer between RAM and GPU memory. The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 s, respectively; in this case, the GPU improved the computing speed by a factor of about 15. When the number of iterations was increased to 1024, the CPU- and GPU-based computations took 18 min and 8 s in total, respectively. This improvement of about 135 times was caused by a slowdown of the CPU-based computation after a certain number of iterations. In contrast, the GPU-based computation showed very little variation in time per iteration owing to its use of shared memory. The GPU-based parallel computation for ML-EM significantly improved computing speed and stability. The developed GPU-based ML-EM algorithm could easily be modified for other imaging geometries.

  6. OS and Runtime Support for Efficiently Managing Cores in Parallel Applications

    OpenAIRE

    Klues, Kevin Alan

    2015-01-01

    Parallel applications can benefit from the ability to explicitly control their thread scheduling policies in user-space. However, modern operating systems lack the interfaces necessary to make this type of “user-level” scheduling efficient. The key component missing is the ability for applications to gain direct access to cores and keep control of those cores even when making I/O operations that traditionally block in the kernel. A number of former systems provided limited support for these c...

  7. Performance Analysis of Fission and Surface Source Iteration Method for Domain Decomposed Monte Carlo Whole-Core Calculation

    International Nuclear Information System (INIS)

    Jo, Yu Gwon; Oh, Yoo Min; Park, Hyang Kyu; Park, Kang Soon; Cho, Nam Zin

    2016-01-01

    In this paper, two issues in the fission and surface source (FSS) iteration method are investigated for the domain-decomposed, 3-D continuous-energy whole-core calculation: the waiting time for surface source data and the variance biases in local tallies. The fission sources are provided as usual, while the surface sources are provided by banking MC particles that cross local domain boundaries. The surface sources serve as boundary conditions for nonoverlapping local problems, so that each local problem can be solved independently. The first issue is quantifying the time processors wait to receive surface source data. By using nonblocking communication, the 'time penalty' of waiting for surface source data to arrive is reduced. The second important issue is the underestimation of the sample variance of the tally caused by additional inter-iteration correlations in the surface sources. For these purposes, three cases are tested on a 3-D whole-core test problem: Case 1 (1 local domain), Case 2 (4 local domains) and Case 3 (16 local domains). The numerical results show that the time penalty is negligible in the FSS iteration method and that the real variances of both pin powers and assembly powers are estimated by the HB method. For both Cases 2 and 3, the time penalties for waiting are negligible compared to the source-tracking times. However, for finer divisions of local domains, the loss of parallel efficiency caused by the different numbers of sources in local domains at symmetric locations becomes larger because of stochastic errors in the source distributions. For all test cases, the HB method estimates the real variances of local tallies very well. However, the variances estimated by the HB method are slightly smaller than the real variances obtained from 30 independent batch runs, and the deviations become larger for finer divisions of local domains. The batch size used for the HB

  8. Parallelizing More Loops with Compiler Guided Refactoring

    DEFF Research Database (Denmark)

    Larsen, Per; Ladelsky, Razya; Lidman, Jacob

    2012-01-01

    an interactive compilation feedback system that guides programmers in iteratively modifying their application source code. This helps leverage the compiler’s ability to generate loop-parallel code. We employ our system to modify two sequential benchmarks dealing with image processing and edge detection...

  9. A non overlapping parallel domain decomposition method applied to the simplified transport equations

    International Nuclear Information System (INIS)

    Lathuiliere, B.; Barrault, M.; Ramet, P.; Roman, J.

    2009-01-01

    A reactivity computation requires computing the largest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine models are difficult for our sequential solver, based on the simplified transport equations, to handle in terms of memory consumption and computational time. We therefore propose a non-overlapping domain decomposition method for the approximate solution of the linear system to be solved at each inverse power iteration. Our method requires little development effort, as the inner multigroup solver can be reused without modification, and it allows the numerical resolution (mesh, finite element order) to be adapted locally. Numerical results are obtained with a parallel implementation of the method on two different cases with a pin-by-pin discretization. These results are analyzed in terms of memory consumption and parallel efficiency. (authors)
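
    One common way to write the reactivity problem is M φ = (1/k) F φ. Here M is the loss (transport or diffusion) operator, F is the fission operator, and the largest eigenvalue k is the effective multiplication factor. The sketch below assumes this form with small dense matrices. Its dense Gaussian elimination `solve` stands in for the linear solve at each inverse power iteration, which the authors approximate with domain decomposition. All names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(A: Matrix, x: List[float]) -> List[float]:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def solve(M: Matrix, rhs: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (stand-in inner solver)."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def inverse_power(M: Matrix, F: Matrix, tol: float = 1e-10,
                  max_iter: int = 500) -> Tuple[float, List[float]]:
    """Estimate the largest k in M phi = (1/k) F phi by inverse power iteration.

    Each outer iteration solves M phi_new = F phi / k and rescales k by the
    ratio of total fission sources (assumed positive).
    """
    phi = [1.0] * len(M)
    k = 1.0
    src = matvec(F, phi)
    for _ in range(max_iter):
        phi_new = solve(M, [s / k for s in src])   # the linear solve per iteration
        src_new = matvec(F, phi_new)
        k_new = k * sum(src_new) / sum(src)
        if abs(k_new - k) < tol:
            return k_new, phi_new
        phi, src, k = phi_new, src_new, k_new
    return k, phi
```

    Because only the linear solve changes between a serial and a domain-decomposed version, the outer iteration can stay untouched. This matches the abstract's point that the inner multigroup solver is reused without modification.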

  10. Efficient graph-based dynamic load-balancing for parallel large-scale agent-based traffic simulation

    NARCIS (Netherlands)

    Xu, Y.; Cai, W.; Aydt, H.; Lees, M.; Tolk, A.; Diallo, S.Y.; Ryzhov, I.O.; Yilmaz, L.; Buckley, S.; Miller, J.A.

    2014-01-01

    One of the issues of parallelizing large-scale agent-based traffic simulations is partitioning and load-balancing. Traffic simulations are dynamic applications where the distribution of workload in the spatial domain constantly changes. Dynamic load-balancing at run-time has shown better efficiency

  11. Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs

    Directory of Open Access Journals (Sweden)

    Vaughn Matthew

    2010-11-01

    Abstract. Background: Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories based on the data structures they employ: the first class uses an overlap/string graph and the second uses a de Bruijn graph. With recent advances in short-read sequencing technology, however, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In earlier work, an O(n/p) time parallel algorithm was given for this problem, where n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges that can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). Results: In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity equal to that of parallel sorting and not sensitive to Σ. The generality of our algorithm makes it easy to extend to the out-of-core model, in which case it has an optimal I/O complexity of Θ(n log(n/B) / (B log(M/B))) (M being the main memory size and B the size of a disk block). We demonstrate the scalability of our parallel algorithm on an SGI/Altix computer. A comparison with previous approaches reveals that our algorithm is faster, both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. Conclusions: The bi

  12. Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.

    Science.gov (United States)

    Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal

    2010-11-15

    Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories based on the data structures they employ: the first class uses an overlap/string graph and the second uses a de Bruijn graph. With recent advances in short-read sequencing technology, however, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In earlier work, an O(n/p) time parallel algorithm was given for this problem, where n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges that can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity equal to that of parallel sorting and not sensitive to Σ. The generality of our algorithm makes it easy to extend to the out-of-core model, in which case it has an optimal I/O complexity of Θ(n log(n/B) / (B log(M/B))) (M being the main memory size and B the size of a disk block). We demonstrate the scalability of our parallel algorithm on an SGI/Altix computer. A comparison with previous approaches reveals that our algorithm is faster, both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for
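
    To make the data structure concrete, here is a small serial sketch of a bi-directed de Bruijn graph built from canonical k-mers. A canonical k-mer is the lexicographically smaller of a k-mer and its reverse complement. This is not the authors' parallel or out-of-core algorithm. It only sorts the distinct (k+1)-mer edges to hint at the sort-based grouping whose cost the abstract compares with parallel sorting. It assumes reads over the alphabet A, C, G, T only, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

Edge = Tuple[str, bool, str, bool]


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string over A, C, G, T."""
    return s.translate(_COMPLEMENT)[::-1]


def canonical(s: str) -> str:
    """Lexicographically smaller of a string and its reverse complement."""
    return min(s, revcomp(s))


def build_bidirected_dbg(reads: List[str], k: int) -> Tuple[Set[str], List[Edge]]:
    """Nodes are canonical k-mers; each distinct canonical (k+1)-mer is an edge.

    An edge is (u, u_fwd, v, v_fwd): it joins the k-length prefix u and
    suffix v of the (k+1)-mer, and each flag records whether that end
    appears in its canonical (forward) orientation or reverse-complemented.
    """
    edge_kmers = set()
    for read in reads:
        for i in range(len(read) - k):
            # A (k+1)-mer and its reverse complement describe the same edge.
            edge_kmers.add(canonical(read[i:i + k + 1]))
    edges: List[Edge] = []
    for e in sorted(edge_kmers):  # sorted order stands in for a sort-based grouping step
        u, v = e[:-1], e[1:]
        edges.append((canonical(u), u == canonical(u), canonical(v), v == canonical(v)))
    nodes = {end for (cu, _, cv, _) in edges for end in (cu, cv)}
    return nodes, edges
```

    Because every k-mer is stored in canonical form, a read and its reverse complement produce exactly the same graph. That is the defining property of the bi-directed representation.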

  13. Iterative solution of the Helmholtz equation

    Energy Technology Data Exchange (ETDEWEB)

    Larsson, E.; Otto, K. [Uppsala Univ. (Sweden)]

    1996-12-31

    We have shown that the numerical solution of the two-dimensional Helmholtz equation can be obtained very efficiently using a preconditioned iterative method. We discretize the equation with second-order accurate finite difference operators and take special care to obtain non-reflecting boundary conditions. We solve the resulting large, sparse system of equations with the preconditioned restarted GMRES iteration. The preconditioner is of “fast Poisson type” and is derived as a direct solver for a modified PDE problem. The arithmetic complexity of the preconditioner is O(n log₂ n), where n is the number of grid points. As a test problem we use the propagation of sound waves in water in a duct with a curved bottom. Numerical experiments show that the preconditioned iterative method is very efficient for this type of problem. The convergence rate does not decrease dramatically when the frequency increases. Compared to banded Gaussian elimination, a standard solution method for this type of problem, the iterative method shows significant gains in both storage requirements and arithmetic complexity. Furthermore, the relative gain increases with frequency.

  14. Comparison of multihardware parallel implementations for a phase unwrapping algorithm

    Science.gov (United States)

    Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo

    2018-04-01

    Phase unwrapping is an important problem in optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger; in particular, the availability of and need to process SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of MRI scanners in clinical diagnosis. It is therefore important to develop faster and more accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm for a phase unwrapping method named accumulation of residual maps. That method builds on a serial algorithm that minimizes a cost function by means of a serial Gauss-Seidel-type algorithm. Our algorithm optimizes the same cost function but, unlike the original work, it belongs to the parallel Jacobi class with alternating minimizations. This strategy is known as the chessboard type: red pixels are independent of each other and can therefore be updated in parallel in the same iteration, and black pixels can likewise be updated in parallel in the alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architectures, namely multicore CPUs, the Xeon Phi coprocessor, and Nvidia graphics processing units. In all cases, our parallel algorithm outperforms the original serial version. In addition, we present a detailed performance comparison of the developed parallel versions.
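
    The chessboard (red-black) ordering can be illustrated on a simpler problem than the authors' cost function: Gauss-Seidel relaxation of the 5-point discrete Poisson equation. The model problem is an assumption chosen for brevity; only the colouring idea carries over to the phase unwrapping method. The function name is hypothetical.

```python
from typing import List


def red_black_sweep(u: List[List[float]], f: List[List[float]], h: float) -> None:
    """One red-black Gauss-Seidel sweep for the 5-point discretization of
    -Laplace(u) = f on a square grid with spacing h.

    Boundary rows and columns of u are held fixed; u is updated in place.
    """
    n = len(u)
    for color in (0, 1):                      # 0: "red" (i + j even), 1: "black" (i + j odd)
        # A red point's four neighbours are all black (and vice versa), so every
        # update in this half-sweep is independent and could run in parallel.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == color:
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                                      + u[i][j + 1] + h * h * f[i][j])
```

    For example, with f = 0 and boundary values u = i + j (a discretely harmonic function), repeated sweeps drive the interior to i + j. Each half-sweep uses only values of the opposite colour, so it behaves like a Jacobi step restricted to one colour. This explains why the approach fits the parallel Jacobi class described above.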

  15. Re-starting an Arnoldi iteration

    Energy Technology Data Exchange (ETDEWEB)

    Lehoucq, R.B. [Argonne National Lab., IL (United States)]

    1996-12-31

    The Arnoldi iteration is an efficient procedure for approximating a subset of the eigensystem of a large sparse n × n matrix A. The iteration produces a partial orthogonal reduction of A to an upper Hessenberg matrix H_m of order m. The eigenvalues of this small matrix H_m are used to approximate a subset of the eigenvalues of the large matrix A. The eigenvalues of H_m improve as estimates of those of A as m increases; unfortunately, so do the cost and storage of the reduction. The idea of restarting the Arnoldi iteration is motivated by the prohibitive cost of building a large factorization.
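
    A minimal sketch of m Arnoldi steps with modified Gram-Schmidt shows how the Hessenberg matrix H_m is built. It works on a small dense real matrix in pure Python. Restarting, the subject of the abstract, is not shown. The eigenvalues of the leading m × m block of `H` (the Ritz values) would be computed with any dense eigensolver. The helper names are hypothetical.

```python
from math import sqrt
from typing import List, Tuple

Matrix = List[List[float]]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, x: List[float]) -> List[float]:
    return [dot(row, x) for row in A]


def arnoldi(A: Matrix, v0: List[float], m: int) -> Tuple[Matrix, Matrix]:
    """m steps of Arnoldi with modified Gram-Schmidt.

    Returns Q (m + 1 orthonormal vectors) and H ((m + 1) x m, upper Hessenberg)
    satisfying A q_j = sum_{i <= j+1} H[i][j] q_i for j = 0..m-1.
    """
    beta = sqrt(dot(v0, v0))
    Q = [[x / beta for x in v0]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, Q[j])
        for i in range(j + 1):          # orthogonalize against earlier basis vectors
            H[i][j] = dot(Q[i], w)
            w = [wk - H[i][j] * qk for wk, qk in zip(w, Q[i])]
        H[j + 1][j] = sqrt(dot(w, w))
        if H[j + 1][j] < 1e-12:
            # "Lucky" breakdown: the Krylov subspace is invariant under A.
            raise ValueError(f"invariant subspace found after {j + 1} steps")
        Q.append([wk / H[j + 1][j] for wk in w])
    return Q, H
```

    Storage grows by one length-n vector per step and work grows quadratically in m because of the orthogonalization. This is exactly the cost that motivates restarting.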

  16. Parallel SN algorithms in shared- and distributed-memory environments

    International Nuclear Information System (INIS)

    Haghighat, Alireza; Hunter, Melissa A.; Mattis, Ronald E.

    1995-01-01

    Several 2-D spatial-domain-partitioning Sn transport algorithms have been developed on the basis of the Block-Jacobi iterative scheme. These algorithms have been incorporated into TWOTRAN-II and tested on a shared-memory CRAY Y-MP C90 and a distributed-memory IBM SP1. For a series of fixed-source r-z geometry homogeneous problems, parallel efficiencies in the range of 50-90% are achieved on the C90 with 6 processors, and lower values (20-60%) are obtained on the SP1. It is demonstrated that better performance is attainable if one addresses issues such as convergence rate, load balancing and granularity for both architectures, as well as message passing (network bandwidth and latency) for the SP1. (author). 17 refs, 4 figs
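
    The Block-Jacobi idea behind spatial domain partitioning can be sketched on a much simpler model than Sn transport: a 1D Poisson (diffusion) problem split into contiguous subdomains. Each subdomain is solved exactly, here with the Thomas algorithm, using its neighbours' interface values from the previous iteration. All subdomain solves within one iteration are therefore independent. The model problem and the function names are assumptions for illustration only.

```python
from typing import List


def thomas(lower: List[float], diag: List[float], upper: List[float],
           rhs: List[float]) -> List[float]:
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):                        # forward elimination
        denom = diag[i] - (lower[i] * c[i - 1] if i > 0 else 0.0)
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - (lower[i] * d[i - 1] if i > 0 else 0.0)) / denom
    x = [0.0] * n
    for i in range(n - 1, -1, -1):            # back substitution
        x[i] = d[i] - (c[i] * x[i + 1] if i < n - 1 else 0.0)
    return x


def block_jacobi(f: List[float], h: float, n_blocks: int, n_iter: int) -> List[float]:
    """Block-Jacobi iterations for -u'' = f on (0, 1) with u(0) = u(1) = 0,
    discretized as 2u_i - u_{i-1} - u_{i+1} = h^2 f_i on N interior points."""
    N = len(f)
    size = -(-N // n_blocks)                  # ceiling division
    blocks = [(s, min(s + size, N)) for s in range(0, N, size)]
    u = [0.0] * N
    for _ in range(n_iter):
        new_u = u[:]
        for s, e in blocks:                   # independent: one block per processor
            m = e - s
            rhs = [h * h * f[i] for i in range(s, e)]
            if s > 0:
                rhs[0] += u[s - 1]            # left interface value (old iterate)
            if e < N:
                rhs[-1] += u[e]               # right interface value (old iterate)
            new_u[s:e] = thomas([-1.0] * m, [2.0] * m, [-1.0] * m, rhs)
        u = new_u
    return u
```

    In a distributed-memory setting, only the interface values `u[s - 1]` and `u[e]` would be exchanged between processors each iteration. That exchange is why network bandwidth and latency matter on the SP1. The convergence rate drops as the number of blocks grows, which links to the convergence-rate issue the abstract raises.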

  17. A massively parallel discrete ordinates response matrix method for neutron transport

    International Nuclear Information System (INIS)

    Hanebutte, U.R.; Lewis, E.E.

    1992-01-01

    In this paper a discrete ordinates response matrix method is formulated with anisotropic scattering for the solution of neutron transport problems on massively parallel computers. The response matrix formulation eliminates iteration on the scattering source. The nodal matrices that result from the diamond-differenced equations are utilized in a factored form that minimizes memory requirements and significantly reduces the number of arithmetic operations required per node. The red-black solution algorithm utilizes massive parallelism by assigning each spatial node to one or more processors. The algorithm is accelerated by a synthetic method in which the low-order diffusion equations are also solved by massively parallel red-black iterations. The method is implemented on a 16K Connection Machine-2, and S8 and S16 solutions are obtained for fixed-source benchmark problems in x-y geometry.

  18. CONTRIBUTION OF QUADRATIC RESIDUE DIFFUSERS TO EFFICIENCY OF TILTED PROFILE PARALLEL HIGHWAY NOISE BARRIERS

    Directory of Open Access Journals (Sweden)

    M. R. Monazzam, P. Nassiri

    2009-10-01

    This paper presents the results of an investigation of the acoustic performance of tilted-profile parallel barriers with quadratic residue diffuser (QRD) tops and faces. A 2D boundary element method (BEM) is used to predict the barrier insertion loss. Results for rigid barriers and for barriers with absorptive coverage are also calculated for comparison. Using a QRD on the top surface and faces of all the tilted-profile parallel barrier models introduced here is found to improve barrier efficiency compared with the equivalent rigid parallel barrier at the examined receiver positions. Applying a QRD with a design frequency of 400 Hz to a 5-degree tilted parallel barrier improves the overall performance of its equivalent rigid barrier by 1.8 dB(A). Increasing the area treated with reactive elements shifts the effective performance toward lower frequencies. Tilting the barriers from 0 to 10 degrees in a parallel setup is found to reduce the degradation effects in parallel barriers, but it also significantly reduces the absorption effect of fibrous materials and the diffusivity of the quadratic residue diffuser. In this case, all the designed barriers perform better with 10-degree tilting in a parallel setup. The most economical parallel traffic-noise barrier with significantly high performance is obtained by covering only the top surface of the barrier nearer to the receiver with a QRD with a design frequency of 400 Hz and tilting it by 10 degrees. The average A-weighted insertion loss of this barrier is predicted to be 16.3 dB(A).

  19. Experimental test campaign on an ITER divertor mock-up

    Energy Technology Data Exchange (ETDEWEB)

    Dell'Orco, G. E-mail: giovanni.dellorco@brasimone.enea.it; Malavasi, A.; Merola, M.; Polazzi, G.; Simoncini, M.; Zito, D.

    2002-11-01

    In 1998, within the framework of the European R and D on ITER high heat flux components, the fabrication of a full-scale ITER Divertor Outboard mock-up was launched. It comprised a Cassette Body (CB), designed with some mechanical and hydraulic simplifications with respect to the reference body, and its actively cooled Dummy Armour Prototype (DAP). The DAP consists of a Vertical Target (VT), a Wing (WI) and a Dump Target (DT), manufactured by European industries, which are integrated with the Gas Box Liner (GBL) supplied by the Russian Federation ITER Home Team. In 1999, in parallel with the manufacturing activity, the ITER European Home Team assigned ENEA a task to check the component integration and to perform the thermal-hydraulic and thermo-mechanical testing of the DAP and CB. In 1999-2000, ENEA performed the experimental campaign at the Brasimone Labs. This work presents the experimental results of the component integration and of the thermal-hydraulic and thermo-mechanical fatigue tests.

  20. Experimental test campaign on an ITER divertor mock-up

    International Nuclear Information System (INIS)

    Dell'Orco, G.; Malavasi, A.; Merola, M.; Polazzi, G.; Simoncini, M.; Zito, D.

    2002-01-01

    In 1998, within the European R and D program on ITER high heat flux components, the fabrication of a full-scale ITER Divertor Outboard mock-up was launched. It comprised a Cassette Body (CB), designed with some mechanical and hydraulic simplifications with respect to the reference body, and its actively cooled Dummy Armour Prototype (DAP). The DAP consists of a Vertical Target (VT), a Wing (WI) and a Dump Target (DT), manufactured by European industries and integrated with the Gas Box Liner (GBL) supplied by the Russian Federation ITER Home Team. In 1999, in parallel with the manufacturing activity, the ITER European Home Team decided to assign ENEA a task to check the component integration and to perform the thermal-hydraulic and thermo-mechanical testing of the DAP and CB. In 1999-2000, ENEA carried out the experimental campaign at the Brasimone Labs. This work presents the experimental results of the component integration and of the thermal-hydraulic and thermo-mechanical fatigue tests.

  1. Further comments on the geometrical efficiency of a parallel-disk source and detector system

    International Nuclear Information System (INIS)

    Ruby, L.

    1994-01-01

    A derivation is presented for a previously published formula, which determines the geometrical efficiency of a parallel-disk source and detector system. The formula involves an integral over a product of two Bessel functions. An algebraic approximation to the integral is also discussed. (orig.)
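
    A commonly quoted form of the parallel-disk efficiency integral is written out below for reference; that it matches the exact notation of the record is an assumption, since the abstract does not print it. The assumed notation is a uniform disk source of radius R_s, a coaxial disk detector of radius R_d, and separation d. The second part is a worked check: in the point-source limit the Bessel-function integral reduces to the familiar on-axis solid-angle fraction Ω/4π of a disk, using the Laplace transform of J_1.

```latex
\varepsilon = \frac{R_d}{R_s}\int_0^{\infty} e^{-dk}\, J_1(kR_s)\, J_1(kR_d)\,\frac{dk}{k}

% Point-source check: as R_s \to 0, J_1(kR_s) \approx kR_s/2, so
\varepsilon \to \frac{R_d}{2}\int_0^{\infty} e^{-dk}\, J_1(kR_d)\,dk
            = \frac{1}{2}\left(1 - \frac{d}{\sqrt{d^2 + R_d^2}}\right)

% using \int_0^{\infty} e^{-dk} J_1(ak)\,dk = \frac{1}{a}\left(1 - \frac{d}{\sqrt{d^2 + a^2}}\right)
```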

  2. The role of crowding in parallel search: Peripheral pooling is not responsible for logarithmic efficiency in parallel search.

    Science.gov (United States)

    Madison, Anna; Lleras, Alejandro; Buetti, Simona

    2018-02-01

    Recent results from our laboratory showed that, in fixed-target parallel search tasks, reaction times increase in a logarithmic fashion with set size, and the slope of this logarithmic function is modulated by lure-target similarity. These results were interpreted as being consistent with a processing architecture in which early vision (stage one) processes elements in the display in exhaustive fashion with unlimited capacity and with a limitation in resolution. Here, we evaluate the contribution of crowding to our recent logarithmic search slope findings, considering the possibility that peripheral pooling of features (as observed in crowding) may be responsible for logarithmic efficiency. Factors known to affect the strength of crowding were varied, specifically item spacing and similarity. The results from three experiments converge on the same pattern: reaction times increased logarithmically with set size and were modulated by lure-target similarity even when crowding was minimized within displays through an inter-item spacing manipulation. Furthermore, we found that logarithmic search efficiencies were overall improved in displays where crowding was minimized compared to displays where crowding was possible. The findings from these three experiments suggest that logarithmic efficiency in efficient search is not the result of peripheral pooling of features. That said, the presence of crowding does tend to reduce search efficiency, even in "pop-out" search situations.

  3. Numeric algorithms for parallel processors computer architectures with applications to the few-groups neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, S.K.

    1987-01-01

    A numeric algorithm and an associated computer code were developed for the rapid solution of the finite-difference representation of the few-group neutron-diffusion equations on parallel computers. Applications of the numeric algorithm on both SIMD (vector pipeline) and MIMD/SIMD (multi-CPU/vector pipeline) architectures were explored. The algorithm was successfully implemented in the two-group, 3-D neutron diffusion computer code named DIFPAR3D (DIFfusion PARallel 3-Dimension). Numerical solution techniques used in the code include Chebyshev polynomial acceleration in conjunction with the power method for the outer iterations. For the inner iterations, the code incorporates a parallel form of red-black (cyclic) line SOR, with automated determination of the group-dependent relaxation factors and of the number of iterations required to achieve a specified inner-iteration error tolerance. The code employs a macroscopic depletion model with trace capability for selected fission products' transients and critical boron. In addition, moderator and fuel temperature feedback models are incorporated into DIFPAR3D for realistic simulation of power reactor cores. The physics models used were proven acceptable in separate benchmarking studies.
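
    The red-black inner-iteration idea can be illustrated with a minimal sketch. For brevity it assumes point red-black SOR on a one-group, 2-D model problem (the 5-point Poisson equation with zero boundary values) rather than the line SOR and two-group 3-D equations of DIFPAR3D, and it takes the relaxation factor as an input instead of determining it automatically. Because every red point depends only on black neighbours and vice versa, each half-sweep is one array-wide update, which is what makes the ordering parallel.

```python
import numpy as np


def red_black_sor(f, h, omega, tol=1e-10, maxiter=10_000):
    """Point red-black SOR for -(u_xx + u_yy) = f on a square grid, u = 0 on the boundary.

    Illustrative model problem only (not DIFPAR3D's line SOR). Returns (u, sweeps).
    """
    n = f.shape[0]
    u = np.zeros((n + 2, n + 2))                 # one layer of boundary values around the grid
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    red = (i + j) % 2 == 0
    for sweep in range(1, maxiter + 1):
        max_change = 0.0
        for mask in (red, ~red):                 # red half-sweep, then black half-sweep
            # Gauss-Seidel value from the current neighbours (5-point stencil)
            gs = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:] + h * h * f)
            interior = u[1:-1, 1:-1]             # a view, so the update below modifies u
            delta = omega * (gs - interior)
            interior[mask] += delta[mask]        # all points of one colour at once
            max_change = max(max_change, float(np.abs(delta[mask]).max()))
        if max_change < tol:
            return u[1:-1, 1:-1], sweep
    return u[1:-1, 1:-1], maxiter
```

    For this model problem, omega = 2 / (1 + sin(πh)) is the textbook optimum, e.g. red_black_sor(np.ones((15, 15)), 1/16, 2 / (1 + np.sin(np.pi / 16))); the group-dependent factors in the record are determined automatically, which the sketch does not attempt.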

  4. ITER...ation

    International Nuclear Information System (INIS)

    Troyon, F.

    1997-01-01

    Recurrent attacks against ITER, the new generation of tokamak, mix political and scientific arguments. This short article gives a historical review of the European fusion program, which has made it possible to build and operate several installations in order to obtain the experimental results needed to move the program forward. ITER will bring together a fusion reactor core with technologies such as materials, superconducting coils, heating devices and instrumentation in order to validate them and delimit the operating range. ITER will be a logical and decisive step toward the use of controlled fusion. (A.C.)

  5. RF modeling of the ITER-relevant lower hybrid antenna

    International Nuclear Information System (INIS)

    Hillairet, J.; Ceccuzzi, S.; Belo, J.; Marfisi, L.; Artaud, J.F.; Bae, Y.S.; Berger-By, G.; Bernard, J.M.; Cara, Ph.; Cardinali, A.; Castaldo, C.; Cesario, R.; Decker, J.; Delpech, L.; Ekedahl, A.; Garcia, J.; Garibaldi, P.; Goniche, M.; Guilhem, D.; Hoang, G.T.

    2011-01-01

    Within the EFDA task HCD-08-03-01, a 5 GHz Lower Hybrid system that should be able to deliver 20 MW CW on ITER and sustain the expected high heat fluxes has been reviewed. The design and overall dimensions of the key RF elements of the launcher and its subsystems have been updated from the 2001 design in collaboration with the ITER Organization. Modeling of LH wave propagation and absorption in the plasma shows that the optimal parallel refractive index n∥ must be chosen between 1.9 and 2.0 for the ITER steady-state scenario. The present study was made with n∥ = 2.0 but can be adapted to n∥ = 1.9. Individual components have been studied separately, giving confidence in the global RF design of the whole antenna.

  6. Efficient numerical methods for fluid- and electrodynamics on massively parallel systems

    Energy Technology Data Exchange (ETDEWEB)

    Zudrop, Jens

    2016-07-01

    In the last decade, computer technology has evolved rapidly. Modern high-performance computing systems offer a tremendous amount of computing power, in the range of a few petaflops (10^15 floating-point operations per second). In contrast, numerical software develops much more slowly, and most existing simulation codes cannot exploit the full computing power of these systems. This is partly due to the numerical methods themselves and partly due to bottlenecks in the parallelization concept and its data structures. The goal of the thesis is to develop numerical algorithms and corresponding data structures that remedy both kinds of parallelization bottleneck. The approach is based on a co-design of the numerical schemes (including numerical analysis) and their realization in algorithms and software. Various applications, from multicomponent flows (Lattice Boltzmann Method) to electrodynamics (Discontinuous Galerkin Method) to embedded geometries (Octree), are considered, and the efficiency of the developed approaches is demonstrated for large-scale simulations.

  7. Linear multifrequency-grey acceleration recast for preconditioned Krylov iterations

    International Nuclear Information System (INIS)

    Morel, Jim E.; Brian Yang, T.-Y.; Warsa, James S.

    2007-01-01

    The linear multifrequency-grey acceleration (LMFGA) technique is used to accelerate the iterative convergence of multigroup thermal radiation diffusion calculations in high energy density simulations. Although it is effective and efficient in one-dimensional calculations, the LMFGA method has recently been observed to degrade significantly under certain conditions in multidimensional calculations with large discontinuities in material properties. To address this deficiency, we recast the LMFGA method in terms of a preconditioned system that is solved with a Krylov method (LMFGK). Results are presented demonstrating that the new LMFGK method always requires fewer iterations than the original LMFGA method. The reduction in iteration count increases with both the size of the time step and the inhomogeneity of the problem. However, for reasons explained later in the paper, the LMFGK method can cost more per iteration than the LMFGA method, resulting in lower but comparable efficiency in problems with small time steps and weak inhomogeneities. In problems with large time steps and strong inhomogeneities, the LMFGK method is significantly more efficient than the LMFGA method.
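
    To make "a preconditioned system solved with a Krylov method" concrete, here is a minimal sketch of preconditioned conjugate gradients. It is not LMFGK: the record does not say which Krylov method is used, and a simple diagonal (Jacobi) preconditioner stands in for the grey-acceleration preconditioner. The hypothetical helper diffusion_matrix builds a 1-D diffusion-absorption matrix with a coefficient jump, mimicking the "large discontinuities in material properties" case.

```python
import numpy as np


def pcg(A, b, m_inv_diag, tol=1e-12, maxiter=1000):
    """Preconditioned conjugate gradients for SPD A with a diagonal preconditioner.

    m_inv_diag holds the diagonal of M^-1. Stops when ||r|| <= tol * ||b||.
    Returns (x, iterations).
    """
    x = np.zeros_like(b)
    r = b.copy()                          # residual for the zero initial guess
    z = m_inv_diag * r                    # preconditioned residual
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = m_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p         # new A-conjugate search direction
        rz = rz_new
    return x, maxiter


def diffusion_matrix(kappa, sigma):
    """Hypothetical 1-D cell-centred diffusion-absorption matrix.

    kappa holds the n + 1 face coefficients; sigma > 0 is an absorption term.
    """
    A = np.diag(kappa[:-1] + kappa[1:] + sigma)
    A -= np.diag(kappa[1:-1], 1) + np.diag(kappa[1:-1], -1)
    return A
```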

  8. News from ITER controls - a status report

    International Nuclear Information System (INIS)

    Wallander, A.; Abadie, L.; Di Maio, F.; Evrard, B.; Fourneron, J.M.; Gulati, H.; Hansalia, C.; Journeaux, J.Y.; Kim, C.; Klotz, W.D.; Mahajan, K.; Makijarvi, P; Matsumoto, Y.; Pande, S.; Simrock, S.; Stepanov, D.; Utzel, N.; Vergara, A.; Winter, A.; Yonekawa, I.

    2012-01-01

    Construction of ITER has started at the Cadarache site in southern France. The first buildings are taking shape, and more than 60% of the in-kind procurement has been committed by the seven ITER member states (China, Europe, India, Japan, Korea, Russia and the United States). The design and manufacturing of the main components of the machine are now underway all over the world. Each of these components comes with a local control system, which must be integrated into the central control system. The control group at ITER has developed two products to facilitate this: the plant control design handbook (PCDH) and the control, data access and communication (CODAC) core system. PCDH is a document that prescribes the technologies and methods to be used in developing local control systems and sets the rules applicable to the in-kind procurements. CODAC core system is a software package, distributed to all in-kind procurement developers, that implements the PCDH and facilitates compliance of the local control systems. In parallel, the ITER control group is proceeding with the design of the central control system to allow fully integrated and automated operation of ITER. In this paper we report on the progress of the design and technology choices and discuss the justifications for those choices. We also report on the results of some pilot projects aimed at validating the design and technologies. (authors)

  9. Contribution of diffuser surfaces to efficiency of tilted T shape parallel highway noise barriers

    Directory of Open Access Journals (Sweden)

    N. Javid Rouzi

    2009-04-01

    Background and aims: The paper presents the results of an investigation of the acoustic performance of tilted-profile parallel barriers with quadratic residue diffuser (QRD) tops and faces. Methods: A 2D boundary element method (BEM) is used to predict the barrier insertion loss. Results for rigid barriers and for barriers with absorptive coverage are also calculated for comparison. Using QRDs on the top surface and faces of all the tilted-profile parallel barrier models introduced here is found to improve barrier efficiency, compared with the equivalent rigid parallel barrier, at the examined receiver positions. Results: Applying a QRD with a design frequency of 400 Hz to a parallel barrier tilted by 5 degrees improves the overall performance of its equivalent rigid barrier by 1.8 dB(A). Increasing the surface area treated with reactive elements shifts the effective performance toward lower frequencies. Tilting the barriers from 0 to 10 degrees in the parallel setup is found to reduce the degradation effects in parallel barriers, but it also significantly reduces the absorption effect of fibrous materials and the diffusivity of the quadratic residue diffuser. In this case, all the designed barriers perform better with 10 degrees of tilt in the parallel setup. Conclusion: The most economical traffic-noise parallel barrier that still delivers significantly high performance is obtained by covering only the top surface of the barrier closer to the receiver with a QRD with a design frequency of 400 Hz and tilting it by 10 degrees. The average A-weighted insertion loss of this barrier is predicted to be 16.3 dB(A).
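
    The 400 Hz design frequency fixes the well depths of a quadratic residue diffuser. The sketch below uses the standard QRD design relations, sequence s_n = n^2 mod N and depth d_n = s_n * λ0 / (2N) with λ0 = c / f0. The number of wells N = 7 and the speed of sound c = 343 m/s are assumptions for illustration; the record does not give the diffuser geometry.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def qrd_well_depths(N, design_freq_hz, c=SPEED_OF_SOUND):
    """Well depths (m) of a quadratic residue diffuser with N wells (N prime).

    Standard QRD relations: s_n = n^2 mod N and d_n = s_n * lambda0 / (2 N),
    where lambda0 = c / f0 is the design wavelength.
    """
    lam0 = c / design_freq_hz
    return [(n * n % N) * lam0 / (2 * N) for n in range(N)]


# Hypothetical 7-well diffuser at the record's 400 Hz design frequency
depths = qrd_well_depths(7, 400.0)
```

    With these assumptions the sequence is 0, 1, 4, 2, 2, 4, 1 and the deepest well is about 0.245 m; a lower design frequency would need proportionally deeper wells.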

  10. ITER EDA newsletter. V. 10, no. 1

    International Nuclear Information System (INIS)

    2001-01-01

    This article summarizes the results of the ITER Physics Committee Meeting, held on 14 October 2000 at the ITER Garching Joint Work Site, Germany. The ITER Physics Committee is the body responsible for overseeing, through the seven specialized Expert Groups, the R and D activities contributed voluntarily by the ITER Parties. The Parties' Physics Designated Persons, the Chairs and Co-Chairs of the ITER Physics Expert Groups and the JCT members involved attended the meeting. As usual, the meeting was chaired by the ITER Director, Dr. R. Aymar, who reported on the status of the ITER EDA. Dr. Aymar described the steps being taken to prepare the ITER-FEAT Final Design Report (FDR), and stated that the Report would be available in time to benefit the Negotiations on the ITER Joint Implementation, expected to start around May 2001. All Parties recognize that the ITER Physics Expert Group structure has been useful in focusing tokamak physics activity on ITER-relevant issues and provides an efficient worldwide collaboration on confirming innovative solutions. The concept of an international workshop, organized as a pre-meeting of each Expert Group meeting in order to involve U.S. scientists in the discussion of generic tokamak physics issues, was introduced in 2000 with some success, and its goal should be pursued.

  11. Design iteration in construction projects – Review and directions

    Directory of Open Access Journals (Sweden)

    Purva Mujumdar

    2018-03-01

    The design phase of any construction project involves several designers who exchange information with each other, most often in an unstructured manner, throughout the phase. When these information exchanges occur in cycles or loops, the process is termed design iteration. Iteration is an inherent and unavoidable aspect of any design phase and requires proper planning. To date, very few researchers have explored the complexity of design iteration in the construction sector. Hence, the objective of this paper was to document and review the complexities of iteration during the design phase of construction projects to enable efficient design planning. To achieve this objective, an exhaustive literature review on design iteration was carried out for four sectors: construction, manufacturing, aerospace, and software development. In addition, semi-structured interviews and discussions were held with a few design experts to verify the different dimensions of iteration. Finally, a design iteration framework that facilitates successful planning is presented. Keywords: Design iteration, Types of iteration, Causes and impact of iteration, Models of iteration, Execution strategies of iteration

  12. Estimation of POL-iteration methods in fast running DNBR code

    Energy Technology Data Exchange (ETDEWEB)

    Kwon, Hyuk; Kim, S. J.; Seo, K. W.; Hwang, D. H. [KAERI, Daejeon (Korea, Republic of)]

    2016-05-15

    In this study, various root-finding methods are applied to the POL-iteration module in SCOMS, and their POL-iteration efficiency is compared with that of the reference method. On the basis of these results, the optimum algorithm for POL iteration is selected. The POL calculation requires iterating until the present local power reaches the limiting power, so searching for the limiting power is equivalent to finding the root of a nonlinear equation. The POL iteration process in the online monitoring system used a variant of the bisection method, the most robust algorithm for finding the root of a nonlinear equation. This method, which includes an interval-accelerating factor and a routine for escaping ill-posed conditions, ensured the robustness of the SCOMS system. The POL iteration module in SCOMS must also satisfy a minimum-calculation-time requirement; to meet it, a non-iterative algorithm, a few-channel model and a simple steam table are implemented in SCOMS. Evaluating MDNBR at a given operating condition requires a DNBR calculation at all axial locations, so increasing the number of POL iterations significantly increases the computational load of SCOMS. Therefore, the calculation efficiency of SCOMS depends strongly on the number of POL iterations. In the case study, the alternative methods show superlinear convergence in finding the limiting power, and the Brent method shows quadratic convergence speed. These methods are effective and better than the reference bisection algorithm.
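
    The trade-off described above, robust bisection versus faster bracketing methods, can be sketched as follows. Brent's method combines bisection, secant and inverse quadratic interpolation; as a simpler stand-in, the sketch uses the Illinois variant of regula falsi, which keeps a bracket like bisection but converges superlinearly. The margin function g(P) = 2.6 * P^-0.8 - 1.3 (a made-up DNBR curve minus a made-up limit) is hypothetical and is not the SCOMS model.

```python
def bisection(f, a, b, tol=1e-10, maxiter=200):
    """Bisection on [a, b] with f(a), f(b) of opposite sign. Returns (root, iterations)."""
    fa = f(a)
    for k in range(1, maxiter + 1):
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0.0 or b - a < tol:
            return m, k
        if (fm > 0) == (fa > 0):          # same sign as f(a): root lies in [m, b]
            a, fa = m, fm
        else:                             # root lies in [a, m]
            b = m
    return 0.5 * (a + b), maxiter


def illinois(f, a, b, tol=1e-10, maxiter=200):
    """Illinois (modified regula falsi): always bracketed, superlinear convergence."""
    fa, fb = f(a), f(b)
    side = 0
    for k in range(1, maxiter + 1):
        c = (a * fb - b * fa) / (fb - fa)  # secant point inside the bracket
        fc = f(c)
        if abs(fc) < tol or b - a < tol:
            return c, k
        if fc * fb > 0:                    # c replaces b
            b, fb = c, fc
            if side == -1:
                fa *= 0.5                  # same endpoint kept twice: halve its value
            side = -1
        else:                              # c replaces a
            a, fa = c, fc
            if side == +1:
                fb *= 0.5
            side = +1
    return c, maxiter


def margin(p):
    """Hypothetical margin to the limit as a function of relative power p."""
    return 2.6 * p ** -0.8 - 1.3
```

    Both calls, bisection(margin, 0.5, 5.0) and illinois(margin, 0.5, 5.0), find the limiting power 2^1.25; bisection needs about 36 halvings of the bracket, while the Illinois iteration needs far fewer function evaluations, which is the kind of saving the record reports for the superlinear methods.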

  13. IHadoop: Asynchronous iterations for MapReduce

    KAUST Repository

    Elnikety, Eslam Mohamed Ibrahim

    2011-11-01

    MapReduce is a distributed programming framework designed to ease the development of scalable data-intensive applications for large clusters of commodity machines. Most machine learning and data mining applications involve iterative computations over large datasets, such as the Web hyperlink structures and social network graphs. Yet, the MapReduce model does not efficiently support this important class of applications. The architecture of MapReduce, most critically its dataflow techniques and task scheduling, is completely unaware of the nature of iterative applications; tasks are scheduled according to a policy that optimizes the execution for a single iteration, which wastes bandwidth, I/O, and CPU cycles when compared with an optimal execution for a consecutive set of iterations. This work presents iHadoop, a modified MapReduce model, and an associated implementation, optimized for iterative computations. The iHadoop model schedules iterations asynchronously. It connects the output of one iteration to the next, allowing both to process their data concurrently. iHadoop's task scheduler exploits inter-iteration data locality by scheduling tasks that exhibit a producer/consumer relation on the same physical machine, allowing fast local data transfer. For those iterative applications that require satisfying certain criteria before termination, iHadoop runs the check concurrently during the execution of the subsequent iteration to further reduce the application's latency. This paper also describes our implementation of the iHadoop model and evaluates its performance against Hadoop, the widely used open-source implementation of MapReduce. Experiments using different data analysis applications over real-world and synthetic datasets show that iHadoop performs better than Hadoop for iterative algorithms, reducing execution time of iterative applications by 25% on average. Furthermore, integrating iHadoop with HaLoop, a variant Hadoop implementation that caches
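
    The idea of running the termination check concurrently with the next iteration can be sketched on a single machine. This is not iHadoop and involves no MapReduce: it is a hypothetical PageRank power iteration in which a background thread checks whether iteration k has converged while iteration k+1 is already being computed; if the check succeeds, the speculative iterate is discarded. In CPython the actual overlap depends on NumPy releasing the GIL, so the code shows the scheduling pattern rather than a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def pagerank_step(M, r, d=0.85):
    """One power-iteration step of PageRank (M column-stochastic, damping d)."""
    return (1.0 - d) / len(r) + d * (M @ r)


def has_converged(r_old, r_new, tol):
    """Termination criterion: L1 change between consecutive iterates."""
    return float(np.abs(r_new - r_old).sum()) < tol


def pagerank_overlapped(M, tol=1e-10, maxiter=1000):
    """Check convergence of iteration k while iteration k+1 runs. Returns (rank, k)."""
    n = M.shape[0]
    prev = np.full(n, 1.0 / n)
    cur = pagerank_step(M, prev)
    with ThreadPoolExecutor(max_workers=1) as pool:
        for k in range(1, maxiter + 1):
            check = pool.submit(has_converged, prev, cur, tol)  # check runs in the background
            nxt = pagerank_step(M, cur)                         # next iteration starts meanwhile
            if check.result():
                return cur, k                                   # speculative nxt is discarded
            prev, cur = cur, nxt
    return cur, maxiter


# Hypothetical 4-node graph given as (source, target) links
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
out_deg = np.bincount([s for s, _ in edges], minlength=4)
M = np.zeros((4, 4))
for s, t in edges:
    M[t, s] = 1.0 / out_deg[s]
```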

  14. IHadoop: Asynchronous iterations for MapReduce

    KAUST Repository

    Elnikety, Eslam Mohamed Ibrahim; El Sayed, Tamer S.; Ramadan, Hany E.

    2011-01-01

    MapReduce is a distributed programming framework designed to ease the development of scalable data-intensive applications for large clusters of commodity machines. Most machine learning and data mining applications involve iterative computations over large datasets, such as the Web hyperlink structures and social network graphs. Yet, the MapReduce model does not efficiently support this important class of applications. The architecture of MapReduce, most critically its dataflow techniques and task scheduling, is completely unaware of the nature of iterative applications; tasks are scheduled according to a policy that optimizes the execution for a single iteration, which wastes bandwidth, I/O, and CPU cycles when compared with an optimal execution for a consecutive set of iterations. This work presents iHadoop, a modified MapReduce model, and an associated implementation, optimized for iterative computations. The iHadoop model schedules iterations asynchronously. It connects the output of one iteration to the next, allowing both to process their data concurrently. iHadoop's task scheduler exploits inter-iteration data locality by scheduling tasks that exhibit a producer/consumer relation on the same physical machine, allowing fast local data transfer. For those iterative applications that require satisfying certain criteria before termination, iHadoop runs the check concurrently during the execution of the subsequent iteration to further reduce the application's latency. This paper also describes our implementation of the iHadoop model and evaluates its performance against Hadoop, the widely used open-source implementation of MapReduce. Experiments using different data analysis applications over real-world and synthetic datasets show that iHadoop performs better than Hadoop for iterative algorithms, reducing execution time of iterative applications by 25% on average. Furthermore, integrating iHadoop with HaLoop, a variant Hadoop implementation that caches

  15. Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

    Science.gov (United States)

    Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.

    2018-01-01

    We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of the special structure of the nuclear configuration interaction problem, which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation and to take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos-based algorithm for problems of moderate size on a Cray XC30 system.
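
    A preconditioned block eigensolver of the kind contrasted with Lanczos above can be sketched in a few lines. This is a LOBPCG-style iteration written from scratch for illustration; whether it matches the authors' solver is an assumption, and the Jacobi (diagonal) preconditioner and random test matrix are hypothetical. Each step performs Rayleigh-Ritz on the span of the current block X, the preconditioned residuals, and the previous search direction P, so the work is dominated by block (matrix times several vectors) operations.

```python
import numpy as np


def block_precond_eigensolver(A, X0, precond, tol=1e-8, maxiter=500):
    """Lowest k eigenpairs of symmetric A by a LOBPCG-style block iteration (sketch).

    X0 is an n-by-k starting block; precond maps a block of residuals to
    preconditioned residuals. Returns (eigenvalues, eigenvectors, iterations).
    """
    X, _ = np.linalg.qr(X0)
    k = X.shape[1]
    P = np.zeros((A.shape[0], 0))                     # no search direction yet
    for it in range(1, maxiter + 1):
        AX = A @ X
        theta = np.einsum("ij,ij->j", X, AX)          # Rayleigh quotients per column
        R = AX - X * theta                            # block residual
        if np.linalg.norm(R, axis=0).max() < tol:
            return theta, X, it
        S, _ = np.linalg.qr(np.hstack([X, precond(R), P]))  # orthonormal trial basis
        _, V = np.linalg.eigh(S.T @ A @ S)            # Rayleigh-Ritz, ascending order
        X_new = S @ V[:, :k]
        P = X_new - X @ (X.T @ X_new)                 # component of the update outside span(X)
        X = X_new
    return theta, X, maxiter
```

    Usage, with a Jacobi preconditioner on a diagonally dominant test matrix: pass lambda R: R / np.diag(A)[:, None] as precond and a random n-by-k block as X0.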

  16. Plane-wave electronic structure calculations on a parallel supercomputer

    International Nuclear Information System (INIS)

    Nelson, J.S.; Plimpton, S.J.; Sears, M.P.

    1993-01-01

    The development of iterative solutions of Schrödinger's equation in a plane-wave (PW) basis over the last several years has coincided with great advances in the computational power available for performing the calculations. These dual developments have enabled many new and interesting condensed matter phenomena to be studied from first principles. The authors present a detailed description of the implementation, on a parallel supercomputer (hypercube), of the first-order equation-of-motion solution to Schrödinger's equation, using plane-wave basis functions and ab initio separable pseudopotentials. By distributing the plane waves across the processors of the hypercube, many of the computations can be performed in parallel, reducing the overall computation time relative to conventional vector supercomputers. This partitioning also provides ample memory for large Fast Fourier Transform (FFT) meshes and for storing plane-wave coefficients for many hundreds of energy bands. The usefulness of the parallel techniques is demonstrated by benchmark timings for both the FFTs and the iterations of the self-consistent solution of Schrödinger's equation for Si unit cells of different sizes, up to 512 atoms.
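
    The role of the FFT in plane-wave calculations can be shown with a minimal 1-D sketch: the kinetic energy is diagonal in reciprocal space and a local potential is diagonal in real space, so applying the Hamiltonian costs two FFTs instead of a dense matrix product. Assumptions: atomic units, a periodic cell of length L, and a local potential only (the separable nonlocal pseudopotentials of the record are omitted).

```python
import numpy as np


def apply_hamiltonian(psi, V, L):
    """Return H psi = -1/2 psi'' + V psi on a periodic 1-D cell of length L.

    psi and V are sampled on N equispaced real-space points. The kinetic term
    |G|^2 / 2 is applied in reciprocal space, the local potential in real space.
    """
    N = psi.size
    G = 2 * np.pi * np.fft.fftfreq(N, d=L / N)   # reciprocal-space wavevectors
    kinetic = np.fft.ifft(0.5 * G**2 * np.fft.fft(psi))
    return kinetic + V * psi
```

    As a check, a single plane wave exp(ikx) in a constant potential V0 is an eigenfunction with eigenvalue k^2/2 + V0, which the function reproduces to machine precision.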

  17. Biomedical applications on the GRID efficient management of parallel jobs

    CERN Document Server

    Moscicki, Jakub T; Lee Hurng Chun; Lin, S C; Pia, Maria Grazia

    2004-01-01

    Distributed computing based on the Master-Worker and PULL interaction model is applicable to a number of applications in high energy physics, medical physics and bioinformatics. We demonstrate a realistic medical physics use case of a dosimetric system for brachytherapy using distributed Grid resources. We present efficient techniques for running parallel jobs in the case of BLAST, a sequence-alignment application, as well as for Monte Carlo simulation based on Geant4. We present a strategy for improving the runtime performance and robustness of the jobs, and for minimizing the development time needed to migrate the applications to a distributed environment.

  18. Parallel numerical modeling of hybrid-dimensional compositional non-isothermal Darcy flows in fractured porous media

    Science.gov (United States)

    Xing, F.; Masson, R.; Lopez, S.

    2017-09-01

    This paper introduces a new discrete fracture model accounting for non-isothermal compositional multiphase Darcy flows and complex networks of fractures with intersecting, immersed and non-immersed fractures. The so-called hybrid-dimensional model, which couples a 2D model in the fractures with a 3D model in the matrix, is first derived rigorously from the equi-dimensional matrix-fracture model. It is then discretized using fully implicit time integration combined with the Vertex Approximate Gradient (VAG) finite volume scheme, which is adapted to polyhedral meshes and anisotropic heterogeneous media. The fully coupled systems are assembled and solved in parallel using the Single Program Multiple Data (SPMD) paradigm with one layer of ghost cells. This strategy allows for local assembly of the discrete systems. An efficient preconditioner is implemented to solve the linear systems at each time step and at each Newton-type iteration of the simulation. The numerical efficiency of our approach is assessed on different meshes, fracture networks, and physical settings in terms of parallel scalability, nonlinear convergence and linear convergence.
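
    Why one layer of ghost cells suffices for a nearest-neighbour stencil can be illustrated with a serial emulation of SPMD ranks. This is a hypothetical 1-D sketch, not the VAG scheme: each "rank" owns a slice of the unknowns, receives one value from each neighbour into its ghost cells, and then updates its own cells purely locally. The result matches the undecomposed computation exactly.

```python
import numpy as np


def jacobi_step(u):
    """One Jacobi smoothing step for u'' = 0 on interior points; endpoints stay fixed."""
    new = u.copy()
    new[1:-1] = 0.5 * (u[:-2] + u[2:])
    return new


def decomposed_jacobi(u_global, nranks, steps):
    """Emulate SPMD ranks, each owning a slice of u plus one ghost cell per side."""
    owned = np.array_split(u_global, nranks)
    for _ in range(steps):
        # 1. Halo exchange: copy one neighbouring value into each ghost cell.
        padded = []
        for r, block in enumerate(owned):
            left = owned[r - 1][-1:] if r > 0 else block[:0]
            right = owned[r + 1][:1] if r < nranks - 1 else block[:0]
            padded.append(np.concatenate([left, block, right]))
        # 2. Local update; ghost cells are dropped afterwards. Global endpoints
        #    have no ghost cell, so jacobi_step keeps them fixed.
        new_owned = []
        for r, p in enumerate(padded):
            q = jacobi_step(p)
            lo = 1 if r > 0 else 0
            hi = len(p) - 1 if r < nranks - 1 else len(p)
            new_owned.append(q[lo:hi])
        owned = new_owned
    return np.concatenate(owned)
```

    In a real SPMD code the halo exchange would be point-to-point messages (for example with MPI) rather than list indexing, but the ownership and ghost-cell bookkeeping are the same.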

  19. The simplified spherical harmonics (SPL) methodology with space and moment decomposition in parallel environments

    International Nuclear Information System (INIS)

    Longoni, Gianluca; Haghighat, Alireza

    2003-01-01

    In recent years, the SP_L (simplified spherical harmonics) equations have received renewed interest for the simulation of nuclear systems. We have derived the SP_L equations starting from the even-parity form of the S_N equations. The SP_L equations form a system of (L+1)/2 second-order partial differential equations that can be solved with standard iterative techniques such as the Conjugate Gradient (CG) method. We discretized the SP_L equations with the finite-volume approach in a 3-D Cartesian space. We developed a new general 3-D code, Pensp_L (Parallel Environment Neutral-particle SP_L). Pensp_L solves both fixed-source and criticality eigenvalue problems. To optimize memory management, we implemented Compressed Diagonal Storage (CDS) for the SP_L matrices. Pensp_L includes parallel algorithms for space and moment domain decomposition. The computational load is distributed over different processors using a mapping function, which maps the 3-D Cartesian space and moments onto processors. The code is written in Fortran 90 and uses the Message Passing Interface (MPI) libraries for the parallel implementation of the algorithm. The code has been tested on the Pcpen cluster, and the parallel performance has been assessed in terms of speed-up and parallel efficiency. (author)
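
    Compressed Diagonal Storage suits finite-volume matrices on structured grids, whose nonzeros lie on a few fixed diagonals. The sketch below shows one common CDS layout and the matrix-vector product an iterative solver such as CG needs; the exact layout used in the record's code is not given, so this convention (row d holds diagonal offsets[d], indexed by row i) is an assumption.

```python
import numpy as np


def to_cds(A, offsets):
    """Pack the listed diagonals of square A into a (len(offsets), n) array.

    Row d holds diagonal offsets[d]: entry i is A[i, i + offset], zero-padded
    where i + offset falls outside the matrix.
    """
    n = A.shape[0]
    data = np.zeros((len(offsets), n))
    for d, off in enumerate(offsets):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                data[d, i] = A[i, j]
    return data


def cds_matvec(data, offsets, x):
    """y = A @ x using only the stored diagonals."""
    n = x.size
    y = np.zeros(n)
    for d, off in enumerate(offsets):
        lo, hi = max(0, -off), min(n, n - off)   # rows whose column i + off exists
        y[lo:hi] += data[d, lo:hi] * x[lo + off:hi + off]
    return y
```

    For a 2-D 5-point stencil on an m-by-m grid the offsets are [-m, -1, 0, 1, m]; a 3-D 7-point stencil adds ±m^2, so storage stays at a handful of length-n arrays regardless of grid size.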

  20. SPARSE ELECTROMAGNETIC IMAGING USING NONLINEAR LANDWEBER ITERATIONS

    KAUST Repository

    Desmal, Abdulla

    2015-07-29

    A scheme for efficiently solving the nonlinear electromagnetic inverse scattering problem on sparse investigation domains is described. The proposed scheme reconstructs the (complex) dielectric permittivity of an investigation domain from fields measured away from the domain itself. Least-squares data misfit between the computed scattered fields, which are expressed as a nonlinear function of the permittivity, and the measured fields is constrained by the L0/L1-norm of the solution. The resulting minimization problem is solved using nonlinear Landweber iterations, where at each iteration a thresholding function is applied to enforce the sparseness-promoting L0/L1-norm constraint. The thresholded nonlinear Landweber iterations are applied to several two-dimensional problems, where the "measured" fields are synthetically generated or obtained from actual experiments. These numerical experiments demonstrate the accuracy, efficiency, and applicability of the proposed scheme in reconstructing sparse profiles with high permittivity values.
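
    The structure of a thresholded Landweber iteration can be sketched for the simpler linear case: a gradient (Landweber) step on the least-squares misfit followed by a thresholding function. This is an assumption-laden simplification: the record's scheme uses a nonlinear scattering operator that is relinearized at every iteration, which is not reproduced here. With the L1 (soft) threshold and a step no larger than 1/||A||^2, this linear version is the iterative shrinkage-thresholding algorithm, whose objective never increases.

```python
import numpy as np


def soft_threshold(v, tau):
    """Soft thresholding: the proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def thresholded_landweber(A, y, lam, iters=500):
    """Minimise 0.5 * ||A x - y||^2 + lam * ||x||_1 with thresholded Landweber steps.

    Linear forward operator A only. Returns (x, objective history).
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # step <= 1 / ||A||^2 keeps iterations stable
    x = np.zeros(A.shape[1])
    history = []
    for _ in range(iters):
        landweber = x + step * A.T @ (y - A @ x)  # gradient step on the data misfit
        x = soft_threshold(landweber, step * lam) # sparsity-promoting thresholding
        history.append(0.5 * np.sum((A @ x - y) ** 2) + lam * np.abs(x).sum())
    return x, history
```

    Replacing soft_threshold with a rule that keeps only the largest entries would give the L0 (hard-threshold) variant the record also mentions.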

  1. Advances in iterative methods for nonlinear equations

    CERN Document Server

    Busquier, Sonia

    2016-01-01

    This book focuses on the approximation of nonlinear equations using iterative methods. Nine contributions are presented on the construction and analysis of these methods, the coverage encompassing convergence, efficiency, robustness, dynamics, and applications. Many problems are stated in the form of nonlinear equations, using mathematical modeling. In particular, a wide range of problems in Applied Mathematics and in Engineering can be solved by finding the solutions to these equations. The book reveals the importance of studying convergence aspects in iterative methods and shows that selection of the most efficient and robust iterative method for a given problem is crucial to guaranteeing a good approximation. A number of sample criteria for selecting the optimal method are presented, including those regarding the order of convergence, the computational cost, and the stability, including the dynamics. This book will appeal to researchers whose field of interest is related to nonlinear problems and equations...

  2. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O-efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking, which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....
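
    List ranking, the problem named above, asks for each node's distance to the end of a linked list. The sketch below uses the classic pointer-jumping technique (Wyllie's algorithm), where every round updates all nodes at once from the previous round's arrays and the number of rounds is logarithmic in the list length. It illustrates the problem, not the PEM algorithm of the paper, which is designed around private caches and I/O complexity.

```python
import numpy as np


def list_rank(succ):
    """Distance of every node to the tail of a linked list, by pointer jumping.

    succ[i] is the successor of node i; the tail points to itself.
    Returns (rank, rounds).
    """
    succ = np.asarray(succ).copy()
    n = len(succ)
    rank = np.where(succ == np.arange(n), 0, 1)   # invariant: rank[i] = distance from i to succ[i]
    rounds = 0
    while True:
        nxt = succ[succ]
        if np.array_equal(nxt, succ):             # every node now points at the tail
            return rank, rounds
        rank = rank + rank[succ]                  # one synchronous round over all nodes
        succ = nxt
        rounds += 1
```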

  3. On the Convergence of Iterative Receiver Algorithms Utilizing Hard Decisions

    Directory of Open Access Journals (Sweden)

    Jürgen F. Rößler

    2009-01-01

    The convergence of receivers performing iterative hard decision interference cancellation (IHDIC) is analyzed in a general framework for ASK, PSK, and QAM constellations. We first give an overview of IHDIC algorithms known from the literature, applied to linear modulation and DS-CDMA-based transmission systems, and show their relation to Hopfield neural network theory. It is proven analytically that IHDIC with a serial update scheme always converges to a stable state in the estimated values over the course of the iterations, and that IHDIC with a parallel update scheme converges to cycles of length 2. Additionally, we visualize the convergence behavior with the aid of convergence charts. In doing so, we give insight into possible errors occurring in IHDIC, which turn out to be caused by locked error situations. The derived results can be applied directly to those iterative soft decision interference cancellation (ISDIC) receivers whose soft decision functions approach hard decision functions over the course of the iterations.
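
    The serial-versus-parallel contrast can be reproduced with a small simulation. Assumptions: synchronous DS-CDMA with BPSK, random unit-norm spreading codes, and matched-filter outputs y = R b + noise, where R is the code cross-correlation matrix; this setup is illustrative and not taken from the paper. Because R is symmetric, hard-decision cancellation behaves like a Hopfield network: the serial update settles at a fixed point, while the parallel update ends in a cycle of period at most 2 (a fixed point counts as the degenerate case).

```python
import numpy as np


def hard(v):
    """Hard BPSK decision; exact ties (probability zero here) go to +1."""
    return np.where(v >= 0, 1, -1)


def ihdic_serial(R, y, max_sweeps=1000):
    """Serial IHDIC: users are updated one at a time using the latest decisions."""
    b = hard(y)                                   # start from matched-filter decisions
    for sweep in range(1, max_sweeps + 1):
        changed = False
        for k in range(len(y)):
            interference = R[k] @ b - R[k, k] * b[k]
            new = 1 if y[k] - interference >= 0 else -1
            if new != b[k]:
                b[k], changed = new, True
        if not changed:                           # a full sweep changed nothing: stable state
            return b, sweep
    return b, max_sweeps


def ihdic_parallel(R, y, steps):
    """Parallel IHDIC: all users are updated at once from the previous decisions."""
    off = R - np.diag(np.diag(R))                 # cross-correlations only
    b = hard(y)
    trajectory = [b]
    for _ in range(steps):
        b = hard(y - off @ b)
        trajectory.append(b)
    return trajectory
```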

  4. Chatter suppression methods of a robot machine for ITER vacuum vessel assembly and maintenance

    International Nuclear Information System (INIS)

    Wu, Huapeng; Wang, Yongbo; Li, Ming; Al-Saedi, Mazin; Handroos, Heikki

    2014-01-01

    Highlights: •A redundant 10-DOF serial-parallel hybrid robot for ITER assembly and maintenance is presented. •A dynamic model of the robot is developed. •A feedback and feedforward controller is presented to suppress machining vibration of the robot. -- Abstract: During the assembly and maintenance of the ITER vacuum vessel (ITER VV), various machining tasks, including threading, milling, cutting of welding defects, and flexible hose boring, must be performed from inside the ITER VV by on-site machining tools. A robot machine is a promising option for these tasks, but severe chatter (machine vibration) can occur during machining. Chatter vibration degrades the robot's accuracy and the surface quality, and can even damage the end-effector tools and the robot structure itself. This paper introduces two vibration control methods, one passive and one active. For passive vibration control, a parallel mechanism is presented to increase the stiffness of the robot machine; for active vibration control, a hybrid control method combining a feedforward controller and a nonlinear feedback controller is introduced for chatter suppression. A dynamic model of the hybrid robot and its chatter vibration phenomena are demonstrated. Simulation results are given based on the proposed hybrid robot machine, which is developed for ITER VV assembly and maintenance.

  5. Chatter suppression methods of a robot machine for ITER vacuum vessel assembly and maintenance

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Huapeng; Wang, Yongbo, E-mail: yongbo.wang@lut.fi; Li, Ming; Al-Saedi, Mazin; Handroos, Heikki

    2014-10-15

    Highlights: •A redundant 10-DOF serial-parallel hybrid robot for ITER assembly and maintenance is presented. •A dynamic model of the robot is developed. •A feedback and feedforward controller is presented to suppress machining vibration of the robot. -- Abstract: During the assembly and maintenance of the ITER vacuum vessel (ITER VV), various machining tasks, including threading, milling, cutting of welding defects, and flexible hose boring, must be performed from inside the ITER VV by on-site machining tools. A robot machine is a promising option for these tasks, but severe chatter (machine vibration) can occur during machining. Chatter vibration degrades the robot's accuracy and the surface quality, and can even damage the end-effector tools and the robot structure itself. This paper introduces two vibration control methods, one passive and one active. For passive vibration control, a parallel mechanism is presented to increase the stiffness of the robot machine; for active vibration control, a hybrid control method combining a feedforward controller and a nonlinear feedback controller is introduced for chatter suppression. A dynamic model of the hybrid robot and its chatter vibration phenomena are demonstrated. Simulation results are given based on the proposed hybrid robot machine, which is developed for ITER VV assembly and maintenance.

  6. Cluster Optimization and Parallelization of Simulations with Dynamically Adaptive Grids

    KAUST Repository

    Schreiber, Martin; Weinzierl, Tobias; Bungartz, Hans-Joachim

    2013-01-01

    The present paper studies solvers for partial differential equations that work on dynamically adaptive grids stemming from spacetrees. Due to the underlying tree formalism, such grids can be efficiently decomposed into connected grid regions (clusters) on the fly. A graph on those clusters, classified according to their grid invariancy, workload, multi-core affinity, and further metadata, represents the inter-cluster communication. While stationary clusters can already be handled more efficiently than their dynamic counterparts, we propose to treat them as atomic grid entities and introduce a skip mechanism that allows the grid traversal to omit those regions completely. The communication graph ensures that the cluster data are nevertheless kept consistent, and several shared memory parallelization strategies are feasible. A hyperbolic benchmark that has to remesh selected mesh regions iteratively to preserve conforming tessellations serves as the test case for the present work. We discuss runtime improvements resulting from the skip mechanism and the implications for shared memory performance and load balancing. © 2013 Springer-Verlag.

  7. Design and fabrication of the 'ITER-like' SINGAP D⁻ acceleration system

    International Nuclear Information System (INIS)

    Massmann, P.; Esch, H.P.L. de; Hemsworth, R.S.; Svensson, L.

    2005-01-01

    To demonstrate beam optics relevant to the ITER NBI (1 MV, 40 A) in the Cadarache 1 MV, 100 mA test bed, a new D⁻ beam source system has been put into operation. The system retains as many of the key ITER SINGAP parameters as possible; e.g., the perveance-matched D⁻ current density at 1 MeV is 20 mA/cm². The accelerator parameters are identical to the ITER SINGAP design, aiming at a near-parallel 1 MeV beam of 5 mrad divergence. The design also aims to demonstrate SINGAP 'on to off-axis' beam steering by a simple transverse displacement of the post-acceleration electrode. First beams of up to 850 keV have been obtained after only 4 weeks of commissioning.

  8. An efficient parallel algorithm for matrix-vector multiplication

    Energy Technology Data Exchange (ETDEWEB)

    Hendrickson, B.; Leland, R.; Plimpton, S.

    1993-03-01

    The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log p), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
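
    The n/√p term is typical of two-dimensional decompositions. The minimal sketch below uses a generic 2D block layout, not necessarily the paper's hypercube algorithm. It simulates p processors arranged in a √p x √p grid. Each processor owns one block of the matrix, so it needs only n/√p entries of x and produces n/√p partial sums for y. The function name and the square-grid requirement are assumptions of the sketch.

```python
import math

def block_matvec(A, x, p):
    """Simulate y = A x on a sqrt(p) x sqrt(p) grid of processors.

    Processor (r, c) owns block A[r][c] and the matching slice of x,
    computes a partial product, and the partial results are summed
    along each processor row. The 'processors' here are loop iterations.
    """
    q = math.isqrt(p)
    n = len(A)
    assert q * q == p and n % q == 0, "need a square grid whose side divides n"
    b = n // q  # block size
    y = [0.0] * n
    for r in range(q):          # processor row
        for c in range(q):      # processor column
            # Local work on processor (r, c): its b x b block times its slice of x;
            # accumulating into y[i] plays the role of the row-wise reduction.
            for i in range(r * b, (r + 1) * b):
                y[i] += sum(A[i][j] * x[j] for j in range(c * b, (c + 1) * b))
    return y

A = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 3.0, 1.0, 0.0],
     [2.0, 0.0, 1.0, 4.0],
     [1.0, 1.0, 1.0, 1.0]]
print(block_matvec(A, [1.0, 2.0, 3.0, 4.0], 4))
```

    On a real machine, the row-wise sums and the distribution of x slices are collective operations. Tree-based collectives are where a log p term typically comes from.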

  9. Development and application of efficient strategies for parallel magnetic resonance imaging

    Energy Technology Data Exchange (ETDEWEB)

    Breuer, F.

    2006-07-01

    ... artifacts. Unfortunately, parallel imaging is associated with a loss in signal-to-noise ratio (SNR) and is therefore limited to applications that do not already operate at the SNR limit. An additional limitation is that the coil array must provide sufficient sensitivity variations throughout the object under investigation in order to offer enough spatial encoding capacity. This doctoral thesis presents an overview of my research on efficient parallel imaging strategies. Based on existing parallel acquisition and reconstruction strategies, such as SENSE and GRAPPA, new concepts have been developed and transferred to potential clinical applications. (orig.)

  10. Development and application of efficient strategies for parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Breuer, F.

    2006-01-01

    ... Unfortunately, parallel imaging is associated with a loss in signal-to-noise ratio (SNR) and is therefore limited to applications that do not already operate at the SNR limit. An additional limitation is that the coil array must provide sufficient sensitivity variations throughout the object under investigation in order to offer enough spatial encoding capacity. This doctoral thesis presents an overview of my research on efficient parallel imaging strategies. Based on existing parallel acquisition and reconstruction strategies, such as SENSE and GRAPPA, new concepts have been developed and transferred to potential clinical applications. (orig.)
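
    The two records above name SENSE as a starting point. A minimal sketch of textbook SENSE unfolding may help show why the coil sensitivities must vary enough across the object. The sketch covers a reduction factor of 2 and ignores the noise covariance. It is not the thesis's new method, and the sensitivity values are made up.

```python
def sense_unfold_r2(signals, sens_a, sens_b):
    """Unfold one aliased pixel for reduction factor R = 2.

    signals[c] is coil c's aliased value; sens_a[c] and sens_b[c] are
    coil c's sensitivities at the two superimposed true pixel locations.
    Solves min ||s - C rho|| with C = [sens_a | sens_b] through the
    2 x 2 normal equations (C^H C) rho = C^H s.
    """
    aa = sum(abs(a) ** 2 for a in sens_a)
    bb = sum(abs(b) ** 2 for b in sens_b)
    ab = sum(a.conjugate() * b for a, b in zip(sens_a, sens_b))
    ra = sum(a.conjugate() * s for a, s in zip(sens_a, signals))
    rb = sum(b.conjugate() * s for b, s in zip(sens_b, signals))
    det = aa * bb - ab * ab.conjugate()
    if abs(det) < 1e-12:
        # Nearly identical sensitivity profiles: the pixels cannot be separated.
        raise ValueError("coil sensitivities cannot separate these pixels")
    rho_a = (bb * ra - ab * rb) / det
    rho_b = (aa * rb - ab.conjugate() * ra) / det
    return rho_a, rho_b

# Hypothetical 4-coil example with two superimposed pixels.
sens_a = [1.0, 0.5 + 0.5j, 0.2, 0.8j]
sens_b = [0.3, 1.0, 0.6 - 0.2j, 0.1]
true_a, true_b = 2 + 1j, -0.5 + 0.3j
signals = [a * true_a + b * true_b for a, b in zip(sens_a, sens_b)]
print(sense_unfold_r2(signals, sens_a, sens_b))
```

    If the two sensitivity vectors are nearly parallel, the determinant approaches zero and noise is strongly amplified. This is the encoding-capacity limitation mentioned in the abstract.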

  11. Functional efficiency comparison between split- and parallel-hybrid using advanced energy flow analysis methods

    Energy Technology Data Exchange (ETDEWEB)

    Guttenberg, Philipp; Lin, Mengyan [Romax Technology, Nottingham (United Kingdom)]

    2009-07-01

    The following paper presents a comparative efficiency analysis of the Toyota Prius versus the Honda Insight using advanced Energy Flow Analysis methods. The sample study shows that even very different hybrid concepts, such as a split hybrid and a parallel hybrid, can be compared in a high level of detail, and illustrates the benefit of the approach with exemplary results. (orig.)

  12. Adapting high-level language programs for parallel processing using data flow

    Science.gov (United States)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
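
    As a rough illustration of how a data flow graph can be deduced from a sequence of subprogram calls, the sketch below links each call to the calls that produced its inputs. It then groups the calls into "waves" that could run concurrently. The (name, inputs, outputs) representation is a hypothetical stand-in; it is not EASY-FLOW syntax.

```python
def dataflow_waves(calls):
    """Group subprogram calls into waves that could run concurrently.

    calls: list of (name, inputs, outputs) in program order; names are
    unique. Only read-after-write dependencies are tracked, which assumes
    each variable is assigned once (single-assignment style).
    """
    producer = {}   # variable -> call that produced it
    level = {}      # call -> wave index
    for name, inputs, outputs in calls:
        deps = {producer[v] for v in inputs if v in producer}
        level[name] = 1 + max((level[d] for d in deps), default=-1)
        for v in outputs:
            producer[v] = name
    waves = {}
    for name, _, _ in calls:
        waves.setdefault(level[name], []).append(name)
    return [waves[k] for k in sorted(waves)]

program = [("read", [], ["a"]),
           ("f", ["a"], ["b"]),
           ("g", ["a"], ["c"]),
           ("merge", ["b", "c"], ["d"])]
print(dataflow_waves(program))  # [['read'], ['f', 'g'], ['merge']]
```

    Because each wave contains whole subprogram calls, the parallelism found this way is large-grained, as the abstract describes.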

  13. Implementing O(N) N-Body Algorithms Efficiently in Data-Parallel Languages

    Directory of Open Access Journals (Sweden)

    Yu Hu

    1996-01-01

    The optimization techniques for hierarchical O(N) N-body algorithms described here focus on managing the data distribution and the data references, both between the memories of different nodes and within the memory hierarchy of each node. We show how the techniques can be expressed in data-parallel languages, such as High Performance Fortran (HPF) and Connection Machine Fortran (CMF). The effectiveness of our techniques is demonstrated on an implementation of Anderson's hierarchical O(N) N-body method for the Connection Machine system CM-5/5E. Communication accounts for about 10–20% of the total execution time, the average efficiency of arithmetic operations is about 40%, and the total efficiency (including communication) is about 35%. For the CM-5E, a performance in excess of 60 Mflop/s per node (peak 160 Mflop/s per node) has been measured.

  14. Measurement and control system for the ITER remote handling mock-up test

    International Nuclear Information System (INIS)

    Oka, K.; Kakudate, S.; Takiguchi, Y.; Ako, K.; Taguchi, K.; Tada, E.; Ozaki, F.; Shibanuma, K.

    1998-01-01

    The mock-up test platforms, composed of full-scale remote handling (RH) equipment, were developed to demonstrate remote replacement of the ITER blanket and divertor. In parallel, the measurement and control system for operating this RH equipment was constructed on the basis of an open architecture with object-oriented features, aiming to realize the fully remote automatic operation required for ITER. This paper describes the design concept of the measurement and control system for the ITER remote handling equipment, and outlines the measured performance of the fabricated measurement system for the remote handling mock-up tests, which includes the Data Acquisition System (DAS), Visual Monitoring System (VMS) and Virtual Reality System (VRS). (authors)

  15. Various Newton-type iterative methods for solving nonlinear equations

    Directory of Open Access Journals (Sweden)

    Manoj Kumar

    2013-10-01

    The aim of the present paper is to introduce and investigate new ninth- and seventh-order convergent Newton-type iterative methods for solving nonlinear equations. The ninth-order method is made derivative-free to obtain the seventh-order method. The new methods, with and without derivatives, have efficiency indices of 1.5518 and 1.6266, respectively. The error equations are used to establish the order of convergence of the proposed iterative methods. Finally, various numerical comparisons are carried out in MATLAB to demonstrate the performance of the developed methods.
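
    The reported indices match the standard (Ostrowski) efficiency index E = p^(1/d), where p is the order and d is the number of function evaluations per step. This holds if the ninth-order method uses 5 evaluations per step and the seventh-order derivative-free method uses 4. These evaluation counts are inferred from the numbers and are not stated in the abstract. A one-line check:

```python
def efficiency_index(order, evals_per_step):
    """Ostrowski efficiency index E = p ** (1 / d)."""
    return order ** (1.0 / evals_per_step)

print(round(efficiency_index(9, 5), 4))  # 1.5518
print(round(efficiency_index(7, 4), 4))  # 1.6266
print(round(efficiency_index(2, 2), 4))  # 1.4142, classical Newton (f and f')
```

    This explains why dropping derivatives can raise the index despite the lower order. Each step becomes cheaper by one evaluation, and the index depends on order per evaluation rather than on order alone.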

  16. Fabrication progress of the ITER vacuum vessel sector in Korea

    Energy Technology Data Exchange (ETDEWEB)

    Kim, B.C., E-mail: bckim@nfri.re.kr [National Fusion Research Institute, Gwahangno 113, Yuseong-gu, Daejeon (Korea, Republic of); Lee, Y.J.; Hong, K.H.; Sa, J.W.; Kim, H.S.; Park, C.K.; Ahn, H.J.; Bak, J.S.; Jung, K.J. [National Fusion Research Institute, Gwahangno 113, Yuseong-gu, Daejeon (Korea, Republic of); Park, K.H.; Roh, B.R.; Kim, T.S.; Lee, J.S.; Jung, Y.H.; Sung, H.J.; Choi, S.Y.; Kim, H.G.; Kwon, I.K.; Kwon, T.H. [Hyundai Heavy Industries Co. Ltd., Dong-gu, Ulsan (Korea, Republic of)

    2013-10-15

    Highlights: ► Fabrication of a full-scale mock-up of the ITER vacuum vessel sector to develop fabrication procedures. ► Welding and nondestructive examination techniques conforming to RCC-MR. ► Preparation for the actual manufacturing of the ITER vacuum vessel sector. -- Abstract: As a participant in the ITER project, ITER Korea has to supply two (Sector no. 6 and no. 1) of the nine ITER vacuum vessel (VV) sectors. After the procurement arrangement with the ITER Organization (IO), ITER Korea contracted Hyundai Heavy Industries (HHI) to fabricate the two sectors, and manufacturing design began in January 2010. HHI made three real-scale R&D mock-ups to verify critical fabrication feasibility issues: electron beam welding, 3D forming, welding distortion, and achievable tolerances. In parallel, the documentation required by the IO and by French nuclear safety regulations was prepared, and the welding and nondestructive examination procedures were qualified in conformance with RCC-MR 2007. Mass production of raw material began after the ANB (agreed notified body) had verified the products/parts and the shop qualification. The manufacturing drawings and the manufacturing and inspection plan of the VV sector, with supporting fabrication procedures, were also verified by the ANB; accordingly, the first cutting and forming of plates for VV sector fabrication started in February 2012. This paper reports the latest fabrication progress of ITER vacuum vessel Sector no. 6, which will be assembled as the first sector in the ITER pit. The overall fabrication route, the R&D mock-up fabrication results with forming and welding distortion analysis, and the qualification status of welding and nondestructive examination (NDE) are also presented.

  17. Fast parallel algorithm for CT image reconstruction.

    Science.gov (United States)

    Flores, Liubov A; Vidal, Vicent; Mayo, Patricia; Rodenas, Francisco; Verdú, Gumersindo

    2012-01-01

    In X-ray computed tomography (CT), X-rays are used to obtain the projection data needed to generate an image of the inside of an object. The image can be generated with different techniques. Iterative methods are better suited to reconstructing images with high contrast and precision under noisy conditions and from a small number of projections. Their use may be important in portable scanners because of their functionality in emergency situations. In practice, however, these methods are not widely used because of the high computational cost of their implementation. In this work we analyze iterative parallel image reconstruction with the Portable, Extensible Toolkit for Scientific Computation (PETSc).
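
    To make "iterative reconstruction" concrete, here is a minimal sketch of the algebraic reconstruction technique (ART, the Kaczmarz row-action method). It runs on a toy 3 x 3 system that stands in for a CT projection matrix. It is a generic example; the PETSc-based solvers analyzed in the paper are not shown, and the matrix and data are made up.

```python
def kaczmarz(A, b, sweeps=200, relax=1.0):
    """Algebraic reconstruction technique (Kaczmarz row-action method).

    Each step projects the current image estimate x onto the hyperplane
    of one projection equation a_i . x = b_i (scaled by `relax`).
    """
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, bi in zip(A, b):
            norm2 = sum(a * a for a in row)
            if norm2 == 0.0:
                continue  # a ray that misses every pixel carries no information
            residual = bi - sum(a * xj for a, xj in zip(row, x))
            step = relax * residual / norm2
            x = [xj + step * a for xj, a in zip(x, row)]
    return x

# Toy "projection matrix": each row sums two of three pixel values.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0]]
b = [3.0, 5.0, 4.0]  # projections of the image [1, 2, 3]
print(kaczmarz(A, b))
```

    In a real scanner, A is a huge sparse matrix. The row sweeps and the matrix-vector products in related methods are the costly steps that parallel toolkits distribute across processors.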

  18. An efficient spectral crystal plasticity solver for GPU architectures

    Science.gov (United States)

    Malahe, Michael

    2018-03-01

    We present a spectral crystal plasticity (CP) solver for graphics processing unit (GPU) architectures that achieves a tenfold increase in efficiency over prior GPU solvers. The approach makes use of a database containing a spectral decomposition of CP simulations performed using a conventional iterative solver over a parameter space of crystal orientations and applied velocity gradients. The key improvements in efficiency come from reducing global memory transactions, exposing more instruction-level parallelism, reducing integer instructions and performing fast range reductions on trigonometric arguments. The scheme also makes more efficient use of memory than prior work, allowing for larger problems to be solved on a single GPU. We illustrate these improvements with a simulation of 390 million crystal grains on a consumer-grade GPU, which executes at a rate of 2.72 s per strain step.

  19. Performance and capacity analysis of Poisson photon-counting based Iter-PIC OCDMA systems.

    Science.gov (United States)

    Li, Lingbin; Zhou, Xiaolin; Zhang, Rong; Zhang, Dingchen; Hanzo, Lajos

    2013-11-04

    In this paper, an iterative parallel interference cancellation (Iter-PIC) technique is developed for optical code-division multiple-access (OCDMA) systems relying on shot-noise-limited Poisson photon-counting reception. The novel semi-analytical tool of extrinsic information transfer (EXIT) charts is used to analyse both the bit error rate (BER) performance and the channel capacity of these systems, and the results are verified by Monte Carlo simulations. The proposed Iter-PIC OCDMA system achieves a BER improvement of two orders of magnitude and a capacity improvement of 0.1 nats over conventional chip-level OCDMA systems at a coding rate of 1/10.

  20. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)

  1. Efficient parallel algorithms for string editing and related problems

    Science.gov (United States)

    Apostolico, Alberto; Atallah, Mikhail J.; Larmore, Lawrence; Mcfaddin, H. S.

    1988-01-01

    The string editing problem for input strings x and y consists of transforming x into y by performing a series of weighted edit operations on x of overall minimum cost. An edit operation on x can be the deletion of a symbol from x, the insertion of a symbol into x, or the substitution of a symbol of x with another symbol. This problem has a well-known O(|x||y|)-time sequential solution (25), where |x| denotes the length of x. Efficient parallel random access machine (PRAM) algorithms for the string editing problem are given. If m = min(|x|, |y|) and n = max(|x|, |y|), then the CREW bound is O(log m log n) time with O(mn/log m) processors. In all algorithms, space is O(mn).
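
    For reference, the sketch below shows the sequential O(|x||y|) dynamic program with the table filled in anti-diagonal ("wavefront") order. Every cell on one anti-diagonal depends only on the two previous diagonals, so each diagonal could be computed in parallel. This gives O(m + n) parallel steps, far weaker than the O(log m log n) CREW bound above. The weight parameters are illustrative.

```python
def edit_distance(x, y, w_del=1, w_ins=1, w_sub=1):
    """Weighted edit distance, filled in anti-diagonal (wavefront) order.

    D[i][j] is the minimum cost of turning x[:i] into y[:j]. Cells with
    i + j = d depend only on diagonals d - 1 and d - 2, so all cells of
    one diagonal are independent of each other.
    """
    m, n = len(x), len(y)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for d in range(m + n + 1):
        for i in range(max(0, d - n), min(m, d) + 1):
            j = d - i
            if i == 0:
                D[i][j] = j * w_ins          # insert all of y[:j]
            elif j == 0:
                D[i][j] = i * w_del          # delete all of x[:i]
            else:
                sub = 0 if x[i - 1] == y[j - 1] else w_sub
                D[i][j] = min(D[i - 1][j] + w_del,
                              D[i][j - 1] + w_ins,
                              D[i - 1][j - 1] + sub)
    return D[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```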

  2. Parallel ray tracing for one-dimensional discrete ordinate computations

    International Nuclear Information System (INIS)

    Jarvis, R.D.; Nelson, P.

    1996-01-01

    The ray-tracing sweep in discrete-ordinates, spatially discrete numerical approximation methods applied to the linear, steady-state, plane-parallel, mono-energetic, azimuthally symmetric, neutral-particle transport equation can be reduced to a parallel prefix computation. In so doing, the often severe penalty in the convergence rate of the source iteration, suffered by most current parallel algorithms that use spatial domain decomposition, can be avoided while attaining parallelism in the spatial domain to whatever extent desired. In addition, the reduction implies parallel algorithm complexity limits for the ray-tracing sweep. The reduction applies to all closed, linear, one-cell functional (CLOF) spatial approximation methods, which encompass most of those in popular use today. Scalability test results of an implementation of the algorithm on a 64-node nCube-2S hypercube-connected, message-passing multicomputer are described. (author)
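
    A minimal sketch of why a sweep reduces to a parallel prefix: assume that, for one direction, each cell's outgoing flux is an affine function of its incoming flux, ψ_{i+1} = a_i ψ_i + b_i. Here a_i and b_i are hypothetical per-cell coefficients that would come from cross sections and sources. Composition of affine maps is associative, so all cell-edge fluxes follow from an inclusive scan. The scan below uses the Hillis-Steele pattern, whose log2(n) rounds each consist of independent updates. This illustrates the idea; it is not the authors' nCube implementation.

```python
def compose(f, g):
    """Affine map 'apply f, then g': g(f(x)) = a2 * (a1 * x + b1) + b2."""
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)

def inclusive_scan(maps):
    """Hillis-Steele scan: after the loop, out[i] = maps[0] then ... then maps[i].
    Within a round every element reads only the previous round's list,
    so the updates of one round could run concurrently."""
    out = list(maps)
    step = 1
    while step < len(out):
        out = [out[i] if i < step else compose(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out

def sweep_sequential(a, b, psi0):
    psi = [psi0]
    for ai, bi in zip(a, b):
        psi.append(ai * psi[-1] + bi)
    return psi

def sweep_scan(a, b, psi0):
    prefix = inclusive_scan(list(zip(a, b)))
    return [psi0] + [A * psi0 + B for A, B in prefix]

a = [0.9, 0.5, 0.7, 0.2, 0.8]   # hypothetical attenuation factors per cell
b = [0.1, 0.3, 0.0, 0.4, 0.2]   # hypothetical source contributions per cell
print(sweep_sequential(a, b, 1.0))
print(sweep_scan(a, b, 1.0))     # same values, different evaluation order
```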

  3. A novel iterative scheme and its application to differential equations.

    Science.gov (United States)

    Khan, Yasir; Naeem, F; Šmarda, Zdeněk

    2014-01-01

    The purpose of this paper is to employ an alternative approach to reconstruct the standard variational iteration algorithm II proposed by He, including the Lagrange multiplier, and to give a simpler formulation of the Adomian decomposition method and the modified Adomian decomposition method in terms of the newly proposed variational iteration method II (VIM). Through careful investigation of the earlier variational iteration algorithm and the Adomian decomposition method, we find, respectively, unnecessary calculations of the Lagrange multiplier and repeated calculations in each iteration. Several examples are given to verify the reliability and efficiency of the method.

  4. FAST ITERATIVE KILOVOLTAGE CONE BEAM TOMOGRAPHY

    Directory of Open Access Journals (Sweden)

    S. A. Zolotarev

    2015-01-01

    Creating fast parallel iterative tomographic algorithms based on graphics accelerators, which simultaneously minimize the residual and the total variation of the reconstructed image, is an important and urgent task of great scientific and practical significance. Such algorithms can be used, for example, in radiation therapy, because patients always undergo computed tomography beforehand to better identify the areas that will subsequently be exposed to radiation.

  5. Accelerated fast iterative shrinkage thresholding algorithms for sparsity-regularized cone-beam CT image reconstruction

    International Nuclear Information System (INIS)

    Xu, Qiaofeng; Sawatzky, Alex; Anastasio, Mark A.; Yang, Deshan; Tan, Jun

    2016-01-01

    Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves an O(1/k²) convergence rate (often called quadratic) and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to unaccelerated first-order methods. We have proposed and investigated

  6. Accelerated fast iterative shrinkage thresholding algorithms for sparsity-regularized cone-beam CT image reconstruction

    Science.gov (United States)

    Xu, Qiaofeng; Yang, Deshan; Tan, Jun; Sawatzky, Alex; Anastasio, Mark A.

    2016-01-01

    Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves an O(1/k²) convergence rate (often called quadratic) and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to unaccelerated first-order methods. We have proposed and investigated
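
    For orientation, here is a minimal sketch of plain FISTA (Beck and Teboulle) for the sparsity-regularized least-squares problem min 0.5‖Ax − b‖² + λ‖x‖₁. It uses the ordinary gradient step. The OS-SART subproblem and the weighted proximal problems of the two records above are not reproduced. As a simplifying assumption, the step size uses the squared Frobenius norm of A, a safe but conservative Lipschitz bound.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def fista(A, b, lam, num_iters=500):
    """FISTA for min 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    n = len(A[0])
    L = sum(a * a for row in A for a in row)  # ||A||_F^2 >= ||A^T A||_2
    x = [0.0] * n
    y = list(x)
    t = 1.0
    for _ in range(num_iters):
        grad = rmatvec(A, [ai - bi for ai, bi in zip(matvec(A, y), b)])
        x_new = soft_threshold([yi - gi / L for yi, gi in zip(y, grad)], lam / L)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum (extrapolation) step: the source of the O(1/k^2) rate.
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

print(fista([[2.0, 0.0], [0.0, 1.0]], [3.0, 0.2], lam=1.0))  # approx [1.25, 0.0]
```

    For a diagonal A, the exact solution is x_i = soft(a_i b_i, λ) / a_i², which gives the values in the comment. The small coefficient is set exactly to zero, illustrating the sparsity the regularizer promotes.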

  7. Solving the Stokes problem on a massively parallel computer

    DEFF Research Database (Denmark)

    Axelsson, Owe; Barker, Vincent A.; Neytcheva, Maya

    2001-01-01

    boundary value problem for each velocity component, are solved by the conjugate gradient method with a preconditioning based on the algebraic multi‐level iteration (AMLI) technique. The velocity is found from the computed pressure. The method is optimal in the sense that the computational work...... is proportional to the number of unknowns. Further, it is designed to exploit a massively parallel computer with distributed memory architecture. Numerical experiments on a Cray T3E computer illustrate the parallel performance of the method....
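
    The preconditioned conjugate gradient iteration mentioned above can be sketched as follows. As a stated simplification, a Jacobi (diagonal) preconditioner stands in for the AMLI preconditioner, which is not reproduced here. The 3 x 3 symmetric positive definite test matrix is made up.

```python
def pcg(A, b, tol=1e-10, max_iters=1000):
    """Preconditioned conjugate gradient for symmetric positive definite A.

    A Jacobi preconditioner M = diag(A) is used as a simple stand-in for
    more sophisticated (e.g. multilevel) preconditioners.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def precondition(r):
        return [r[i] / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)                 # residual b - A x for x = 0
    z = precondition(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iters):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precondition(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
print(pcg(A, [2.0, -2.0, 4.0]))  # approx [1, -2, 3]
```

    The only operations are matrix-vector products, dot products and vector updates. On a distributed-memory machine, these map onto local work plus global reductions, which is why CG-type solvers parallelize well.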

  8. A kind of iteration algorithm for fast wave heating

    International Nuclear Information System (INIS)

    Zhu Xueguang; Kuang Guangli; Zhao Yanping; Li Youyi; Xie Jikang

    1998-03-01

    Fast wave heating calculations usually assume a standard normal distribution of particles in tokamak geometry. In fact, owing to the quasi-linear diffusion effect, the parallel and perpendicular temperatures of resonant particles are not equal, which introduces some error. To treat this case, the Fokker-Planck equation is introduced and an iteration algorithm is adopted to solve the problem.

  9. Variable aperture-based ptychographical iterative engine method

    Science.gov (United States)

    Sun, Aihui; Kong, Yan; Meng, Xin; He, Xiaoliang; Du, Ruijun; Jiang, Zhilong; Liu, Fei; Xue, Liang; Wang, Shouyu; Liu, Cheng

    2018-02-01

    A variable aperture-based ptychographical iterative engine (vaPIE) is demonstrated both numerically and experimentally to reconstruct the sample phase and amplitude rapidly. A tiny aperture is illuminated by a parallel light beam, and its size is adjusted step by step to change the illumination on the sample, while the corresponding diffraction patterns are recorded sequentially. From these patterns, both the sample phase and amplitude can be faithfully reconstructed with a modified ptychographical iterative engine (PIE) algorithm. Because many fewer diffraction patterns are required than in conventional PIE, and the shape, size, and position of the aperture need not be known exactly, the proposed vaPIE method remarkably reduces the data acquisition time and makes PIE less dependent on the mechanical accuracy of the translation stage. The proposed technique can therefore potentially be applied in a wide range of scientific research.
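
    The core of any PIE-type engine is a modulus constraint in the detector plane followed by an object update. The minimal 1D sketch below uses an ePIE-style update (in the form of Maiden and Rodenburg), not the paper's modified PIE. Each probe plays the role of one aperture setting, so the sample is not shifted, matching the variable-aperture idea. The naive DFT, array sizes, and values are toy assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (enough for a toy example)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def pie_update(obj, probe, intensity, alpha=1.0):
    """One ePIE-style object update for a single (probe, pattern) pair.

    The exit wave probe * obj is propagated to the detector, its modulus is
    replaced by the measured amplitude sqrt(intensity) while its phase is
    kept, and the change is propagated back into the object estimate.
    """
    exit_wave = [p * o for p, o in zip(probe, obj)]
    far = dft(exit_wave)
    corrected = [math.sqrt(I) * (f / abs(f) if abs(f) > 1e-12 else 1.0)
                 for f, I in zip(far, intensity)]
    new_exit = dft(corrected, inverse=True)
    p_max2 = max(abs(p) ** 2 for p in probe)
    return [o + alpha * p.conjugate() / p_max2 * (ne - e)
            for o, p, ne, e in zip(obj, probe, new_exit, exit_wave)]

def vapie(obj0, probes, patterns, num_iters=50):
    """Cycle through the aperture settings, one update per recorded pattern."""
    obj = list(obj0)
    for _ in range(num_iters):
        for probe, intensity in zip(probes, patterns):
            obj = pie_update(obj, probe, intensity)
    return obj
```

    If the object estimate is already correct, the measured moduli already match, so the update leaves it unchanged. This consistency check is the basis of the test, not a statement about convergence speed.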

  10. Tungsten recrystallization and cracking under ITER-relevant heat loads

    Energy Technology Data Exchange (ETDEWEB)

    Budaev, V.P., E-mail: Budaev@mail.ru [NRC «Kurchatov Institute», Akademika Kurchatova pl., Moscow (Russian Federation); Martynenko, Yu.V. [NRC «Kurchatov Institute», Akademika Kurchatova pl., Moscow (Russian Federation); National Research Nuclear University MEPhI, Kashirskoe sh. 31, Moscow (Russian Federation); Karpov, A.V.; Belova, N.E. [NRC «Kurchatov Institute», Akademika Kurchatova pl., Moscow (Russian Federation); Zhitlukhin, A.M. [SRC RF TRINITI, Moscow Region (Russian Federation); Klimov, N.S., E-mail: klimov@triniti.ru [SRC RF TRINITI, Moscow Region (Russian Federation); National Research Nuclear University MEPhI, Kashirskoe sh. 31, Moscow (Russian Federation); Podkovyrov, V.L.; Barsuk, V.A.; Putrik, A.B.; Yaroshevskaya, A.D. [SRC RF TRINITI, Moscow Region (Russian Federation); Giniyatulin, R.N. [Efremov Institute, St. Petersburg (Russian Federation); Safronov, V.M. [Institution «Project Center ITER», Moscow (Russian Federation); SRC RF TRINITI, Moscow Region (Russian Federation); Khimchenko, L.N. [Institution «Project Center ITER», Moscow (Russian Federation)

    2015-08-15

    The tungsten surface structure was analyzed after tests in the QSPA-T under heat loads relevant to those expected in ITER during disruptions. Repeated pulses lead to melting and resolidification of a tungsten surface layer ∼50 μm thick. An intermediate layer, also ∼50 μm thick, lies between the original structure and the resolidified layer. The intermediate layer is recrystallized and has a random grain orientation, whereas the resolidified layer and the base structure are textured with a preferred 〈1 0 0〉 orientation normal to the surface. Cracks normal to the surface were observed in the resolidified layer, as well as cracks parallel to the surface at depths of up to 300 μm. Such cracks can result in brittle destruction, which is a hazard for the full-tungsten divertor of ITER. A theoretical analysis of the causes of crack formation and of the possible consequences for ITER is given.

  11. Iterative solution of high order compact systems

    Energy Technology Data Exchange (ETDEWEB)

    Spotz, W.F.; Carey, G.F. [Univ. of Texas, Austin, TX (United States)

    1996-12-31

    We have recently developed a class of finite difference methods which provide higher accuracy and greater stability than standard central or upwind difference methods, but still reside on a compact patch of grid cells. In the present study we investigate the performance of several gradient-type iterative methods for solving the associated sparse systems. Both serial and parallel performance studies have been made. Representative examples are taken from elliptic PDEs for diffusion, convection-diffusion, and viscous flow applications.

  12. Acceleration of iterative tomographic reconstruction using graphics processors

    International Nuclear Information System (INIS)

    Belzunce, M.A.; Osorio, A.; Verrastro, C.A.

    2009-01-01

    Iterative image reconstruction algorithms in 3D positron emission tomography have been shown to produce better-quality images than analytical methods. However, these algorithms are computationally expensive. Modern graphics processing units (GPUs) offer high performance at low cost, together with programming tools that make it easy to run parallel algorithms in scientific applications. In this work, we aim to accelerate 3D PET image reconstruction algorithms using a GPU. A parallel implementation of the 3D ML-EM algorithm was developed, using the Siddon algorithm as projector and back-projector. Results show that accelerations of more than one order of magnitude can be achieved while keeping similar image quality. (author)
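    A minimal serial sketch of the ML-EM update follows. It assumes a small dense system matrix `a` in place of the Siddon ray-tracing projector and back-projector used in the paper; the function name `mlem` and the toy data are hypothetical. Each iteration forward-projects the current estimate, divides the measured counts by the expected counts, back-projects that ratio, and rescales by the sensitivity image s_j = Σ_i a_ij. The forward and back projections are the natural candidates for GPU parallelization.

```python
from typing import List, Optional

Matrix = List[List[float]]


def mlem(a: Matrix, y: List[float], n_iter: int = 100,
         x0: Optional[List[float]] = None) -> List[float]:
    """ML-EM for measured counts y ~ Poisson(A x).

    a[i][j] is the probability that an emission in voxel j is detected in
    line of response i (a toy dense matrix, not Siddon ray tracing).
    """
    n_rows, n_cols = len(a), len(a[0])
    # Sensitivity image: s_j = sum_i a_ij
    sens = [sum(a[i][j] for i in range(n_rows)) for j in range(n_cols)]
    x = list(x0) if x0 is not None else [1.0] * n_cols  # strictly positive start
    for _ in range(n_iter):
        # Forward projection: expected counts for the current estimate.
        proj = [sum(a[i][j] * x[j] for j in range(n_cols)) for i in range(n_rows)]
        # Ratio of measured to expected counts (empty projections are skipped).
        ratio = [y[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(n_rows)]
        # Back-project the ratio and apply the multiplicative update.
        back = [sum(a[i][j] * ratio[i] for i in range(n_rows)) for j in range(n_cols)]
        x = [x[j] * back[j] / sens[j] if sens[j] > 0 else 0.0 for j in range(n_cols)]
    return x
```

    A useful sanity check: after every update, Σ_j s_j x_j equals the total measured counts Σ_i y_i, and the estimate stays non-negative.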

  13. Iterated unscented Kalman filter for phase unwrapping of interferometric fringes.

    Science.gov (United States)

    Xie, Xianming

    2016-08-22

    A new phase unwrapping algorithm based on an iterated unscented Kalman filter is proposed to estimate the unambiguous unwrapped phase of interferometric fringes. The method combines an iterated unscented Kalman filter with a robust phase gradient estimator based on an amended matrix pencil model and an efficient quality-guided strategy based on heap sort. The iterated unscented Kalman filter, one of the most robust methods in nonlinear signal processing within the Bayesian framework, is applied here for the first time to perform noise suppression and phase unwrapping of interferometric fringes simultaneously. This reduces the complexity and difficulty of the pre-filtering step that usually precedes phase unwrapping, and can even remove it entirely. The robust phase gradient estimator efficiently and accurately extracts from the interferometric fringes the phase gradient information needed by the iterated unscented Kalman filtering phase unwrapping model. The efficient quality-guided strategy ensures that the method quickly unwraps wrapped pixels along a path from the high-quality to the low-quality areas of the wrapped phase image, which greatly improves the efficiency of phase unwrapping. Results obtained from synthetic and real data show that, compared with some of the most widely used algorithms, the proposed method obtains better solutions in an acceptable time.
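    The quality-guided part of the method can be sketched independently of the Kalman filter. The minimal sketch below assumes a 2-D wrapped phase map and a given quality map, both plain nested lists with hypothetical names. It seeds at the highest-quality pixel and uses a binary heap (Python's `heapq`, used as a max-heap by negating quality) so that the best-quality pixel bordering the unwrapped region is always processed next. Each pixel is unwrapped by adding the wrapped phase difference to an already-unwrapped neighbour; the paper's iterated unscented Kalman filter and matrix-pencil gradient estimator are not reproduced here.

```python
import heapq
import math
from typing import List, Optional

Grid = List[List[float]]


def wrap(d: float) -> float:
    """Wrap a phase difference into [-pi, pi)."""
    return (d + math.pi) % (2.0 * math.pi) - math.pi


def quality_guided_unwrap(wrapped: Grid, quality: Grid) -> Grid:
    """Grow the unwrapped region through the highest-quality border pixel first."""
    rows, cols = len(wrapped), len(wrapped[0])
    unwrapped: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]
    heap: list = []

    def push_neighbours(r: int, c: int) -> None:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and unwrapped[nr][nc] is None:
                # heapq is a min-heap, so quality is negated; (r, c) records the
                # already-unwrapped neighbour that this pixel is unwrapped against.
                heapq.heappush(heap, (-quality[nr][nc], nr, nc, r, c))

    # Seed with the highest-quality pixel; its wrapped value is kept as-is.
    r0, c0 = max(((r, c) for r in range(rows) for c in range(cols)),
                 key=lambda rc: quality[rc[0]][rc[1]])
    unwrapped[r0][c0] = wrapped[r0][c0]
    push_neighbours(r0, c0)

    while heap:
        _, r, c, pr, pc = heapq.heappop(heap)
        if unwrapped[r][c] is not None:
            continue  # already reached through a better-quality path
        unwrapped[r][c] = unwrapped[pr][pc] + wrap(wrapped[r][c] - wrapped[pr][pc])
        push_neighbours(r, c)
    return unwrapped
```

    The result matches the true phase up to a constant multiple of 2π, provided neighbouring true phase differences stay below π in magnitude.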

  14. Determination of quantitative tissue composition by iterative reconstruction on 3D DECT volumes

    Energy Technology Data Exchange (ETDEWEB)

    Magnusson, Maria [Linkoeping Univ. (Sweden). Dept. of Electrical Engineering; Linkoeping Univ. (Sweden). Dept. of Medical and Health Sciences, Radiation Physics; Linkoeping Univ. (Sweden). Center for Medical Image Science and Visualization (CMIV); Malusek, Alexandr [Linkoeping Univ. (Sweden). Dept. of Medical and Health Sciences, Radiation Physics; Linkoeping Univ. (Sweden). Center for Medical Image Science and Visualization (CMIV); Nuclear Physics Institute AS CR, Prague (Czech Republic). Dept. of Radiation Dosimetry; Muhammad, Arif [Linkoeping Univ. (Sweden). Dept. of Medical and Health Sciences, Radiation Physics; Carlsson, Gudrun Alm [Linkoeping Univ. (Sweden). Dept. of Medical and Health Sciences, Radiation Physics; Linkoeping Univ. (Sweden). Center for Medical Image Science and Visualization (CMIV)

    2011-07-01

    Quantitative tissue classification using dual-energy CT has the potential to improve accuracy in radiation therapy dose planning, as it provides more information about the material composition of scanned objects than the currently used methods based on single-energy CT. One problem that hinders successful application of both single- and dual-energy CT is the presence of beam hardening and scatter artifacts in reconstructed data. Current pre- and post-correction methods used for image reconstruction often bias CT attenuation values and thus limit their applicability for quantitative tissue classification. Here we present simulation studies of a novel iterative algorithm that decomposes every soft tissue voxel into three base materials: water, protein, and adipose. The results demonstrate that beam hardening artifacts can be effectively removed and that the mass fraction of each base material can be estimated accurately. Our iterative algorithm starts by calculating parallel projections of two DECT volumes previously reconstructed from fan-beam or helical projections with a small cone-beam angle. The parallel projections are then used in an iterative loop. Future developments include segmentation of soft and bone tissue and subsequent determination of bone composition. (orig.)

  15. A block-iterative nodal integral method for forced convection problems

    International Nuclear Information System (INIS)

    Decker, W.J.; Dorning, J.J.

    1992-01-01

    A new efficient iterative nodal integral method for the time-dependent two- and three-dimensional incompressible Navier-Stokes equations has been developed. Using the approach introduced by Azmy and Dorning to develop nodal methods with high accuracy on coarse spatial grids for two-dimensional steady-state problems, and extended to coarse two-dimensional space-time grids by Wilson et al. for thermal convection problems, we have developed a new iterative nodal integral method for the time-dependent Navier-Stokes equations for mechanically forced convection. A new, extremely efficient block iterative scheme is employed to invert the Jacobian within each of the Newton-Raphson iterations used to solve the final nonlinear discrete-variable equations. By taking advantage of the special structure of the Jacobian, this scheme greatly reduces memory requirements. The accuracy of the overall method is illustrated by applying it to the time-dependent version of the classic two-dimensional driven cavity problem of computational fluid dynamics.

  16. Iterative Adaptive Sampling For Accurate Direct Illumination

    National Research Council Canada - National Science Library

    Donikian, Michael

    2004-01-01

    This thesis introduces a new multipass algorithm, Iterative Adaptive Sampling, for efficiently computing the direct illumination in scenes with many lights, including area lights that cause realistic soft shadows...

  17. A dimension decomposition approach based on iterative observer design for an elliptic Cauchy problem

    KAUST Repository

    Majeed, Muhammad Usman; Laleg-Kirati, Taous-Meriem

    2015-01-01

    A state-observer-inspired iterative algorithm is presented to solve a boundary estimation problem for the Laplace equation, using one of the space variables as a time-like variable. Three-dimensional domain with two congruent parallel surfaces

  18. New algorithms for parallel MRI

    International Nuclear Information System (INIS)

    Anzengruber, S; Ramlau, R; Bauer, F; Leitao, A

    2008-01-01

    Magnetic Resonance Imaging with parallel data acquisition requires algorithms for reconstructing the patient's image from a small number of measured lines of the Fourier domain (k-space). In contrast to well-known algorithms such as SENSE and GRAPPA and their variants, we treat the problem as a nonlinear inverse problem. To avoid computationally expensive derivatives, we use Landweber-Kaczmarz iteration, and to improve the overall results we add some sparsity constraints.
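    Landweber-Kaczmarz iteration cycles through blocks of equations and takes one Landweber (gradient) step per block. The paper treats a nonlinear forward operator with sparsity constraints; the sketch below assumes a purely linear block system A_k x = y_k, omits the sparsity constraints, and uses hypothetical names, so it illustrates only the iteration pattern.

```python
from typing import List

Matrix = List[List[float]]


def landweber_kaczmarz(blocks: List[Matrix], data: List[List[float]],
                       x0: List[float], omega: float, n_sweeps: int) -> List[float]:
    """Cyclic Landweber-Kaczmarz for a linear system split into row blocks.

    Each inner step is one Landweber step on a single block,
        x <- x + omega * A_k^T (y_k - A_k x),
    and one sweep visits every block once. For convergence, omega should
    satisfy 0 < omega < 2 / ||A_k||^2 for every block k.
    """
    x = list(x0)
    for _ in range(n_sweeps):
        for a_k, y_k in zip(blocks, data):
            # Residual of the current block only.
            residual = [yi - sum(aij * xj for aij, xj in zip(row, x))
                        for row, yi in zip(a_k, y_k)]
            # Apply the transpose (adjoint) of this block to the residual.
            step = [sum(row[j] * ri for row, ri in zip(a_k, residual))
                    for j in range(len(x))]
            x = [xj + omega * sj for xj, sj in zip(x, step)]
    return x
```

    Working on one block at a time means only that block's data and operator are touched per step, which is what makes the Kaczmarz variant attractive when the data split naturally into groups (for example, per receiver coil).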

  19. The ITER divertor concept

    International Nuclear Information System (INIS)

    Janeschitz, G.; Borrass, K.; Federici, G.; Igitkhanov, Y.; Kukushkin, A.; Pacher, H.D.; Pacher, G.W.; Sugihara, M.

    1995-01-01

    The ITER divertor must exhaust most of the alpha particle power and the He ash at acceptable erosion rates. The high recycling regime of the ITER-CDA for present parameters would yield high power loads and erosion rates on conventional targets. Improvement by radiation in the SOL at constant pressure is limited in principle. To permit a higher radiation fraction, the plasma pressure along the field must be reduced by more than a factor of 10, which also reduces the target ion flux. This pressure reduction can be obtained by strong plasma-neutral interaction below the X-point. Under these conditions, T_e in the divertor can be reduced to <5 eV along a flame-like ionisation front by impurity radiation and CX losses. Downstream of the front, neutrals undergo more CX or i-n collisions than ionisation events, resulting in significant momentum loss via neutrals to the divertor chamber wall. The pressure reduction by this mechanism depends on the along-field length for neutral-plasma interaction, the parallel power flux, the neutral density, the ratio of the neutral-neutral collision length to the plasma-wall distance, and the Mach number of ions and neutrals. A supersonic transition in the main plasma-neutral interaction region, expected to occur near the ionisation front, would be beneficial for momentum removal. The momentum transfer fraction to the side walls is calculated: a low Knudsen number is beneficial. The impact of the different physics effects on the chosen geometry, on the ITER divertor design, and on the lifetime of the various divertor components is discussed. ((orig.))

  20. Parallel scalability and efficiency of vortex particle method for aeroelasticity analysis of bluff bodies

    Science.gov (United States)

    Tolba, Khaled Ibrahim; Morgenthal, Guido

    2018-01-01

    This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied to the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using the OpenCL framework. The focus of this paper is the parallel efficiency and scalability of the method applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The aim is to determine the optimal configuration and the required computer architecture, so that the method can be used efficiently within the computational resources available to a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming-type computer.

  1. Parallelization of a spherical Sn transport theory algorithm

    International Nuclear Information System (INIS)

    Haghighat, A.

    1989-01-01

    The work described in this paper derives a parallel algorithm for an R-dependent spherical S_N transport theory algorithm and studies its performance on different sample problems. The S_N transport method is one of the most accurate techniques for solving the linear Boltzmann equation. Several studies have addressed the vectorization of S_N algorithms; however, very few have examined their parallelization. Wienke and Hiromoto have looked at the parallel processing of the different energy groups, and Azmy recently studied the parallel processing of the inner iterations of an X-Y S_N nodal transport theory method. Both studies reported very encouraging results, which prompted us to look at the parallel processing of an R-dependent S_N spherical-geometry algorithm. This geometry was chosen because, despite its simplicity, it contains the complications of curvilinear geometries (i.e., redistribution of neutrons over the discretized angular bins).

  2. A Parallel Solver for Large-Scale Markov Chains

    Czech Academy of Sciences Publication Activity Database

    Benzi, M.; Tůma, Miroslav

    2002-01-01

    Vol. 41 (2002), pp. 135-153. ISSN 0168-9274. R&D Projects: GA AV ČR IAA2030801; GA ČR GA101/00/1035. Keywords: parallel preconditioning * iterative methods * discrete Markov chains * generalized inverses * singular matrices * graph partitioning * AINV * Bi-CGSTAB. Subject RIV: BA - General Mathematics. Impact factor: 0.504, year: 2002

  3. Low-memory iterative density fitting.

    Science.gov (United States)

    Grajciar, Lukáš

    2015-07-30

    A new low-memory modification of the density fitting approximation, based on a combination of a continuous fast multipole method (CFMM) and a preconditioned conjugate gradient solver, is presented. The iterative conjugate gradient solver uses preconditioners formed from blocks of the Coulomb metric matrix, which decrease the number of iterations needed for convergence by up to one order of magnitude. The matrix-vector products needed within the iterative algorithm are calculated using CFMM, which evaluates them with only linear-scaling memory requirements. Compared with the standard density fitting implementation, an up to 15-fold reduction of the memory requirements is achieved for the most efficient preconditioner, at the cost of only a 25% increase in computational time. The potential of the method is demonstrated by density functional theory calculations for a zeolite fragment with 2592 atoms and 121,248 auxiliary basis functions on a single 12-core CPU workstation. © 2015 Wiley Periodicals, Inc.
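    The sketch below illustrates block-preconditioned conjugate gradients under simplifying assumptions: a small dense SPD matrix stands in for the Coulomb metric, the matrix-vector product is computed directly rather than with CFMM, and the preconditioner keeps only the diagonal blocks (block Jacobi). Block Jacobi is one simple way to build a preconditioner from blocks of a matrix and is not necessarily the paper's choice. All names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(a: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in a]


def solve_dense(m: Matrix, v: Vector) -> Vector:
    """Gaussian elimination with partial pivoting for one small diagonal block."""
    n = len(v)
    aug = [list(row) + [vi] for row, vi in zip(m, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def block_jacobi(a: Matrix, block_size: int) -> Callable[[Vector], Vector]:
    """Preconditioner M^{-1}: solve with each diagonal block, ignore couplings."""
    n = len(a)
    ranges = [(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    diag_blocks = [[row[s:e] for row in a[s:e]] for s, e in ranges]

    def apply(r: Vector) -> Vector:
        z = [0.0] * n
        for (s, e), blk in zip(ranges, diag_blocks):
            z[s:e] = solve_dense(blk, r[s:e])
        return z

    return apply


def pcg(a: Matrix, b: Vector, apply_m_inv: Callable[[Vector], Vector],
        rtol: float = 1e-10, max_iter: int = 500) -> Tuple[Vector, int]:
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_m_inv(r)
    p = list(z)
    rz = dot(r, z)
    stop = rtol * math.sqrt(dot(b, b))
    for it in range(max_iter):
        if math.sqrt(dot(r, r)) <= stop:
            return x, it
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xk + alpha * pk for xk, pk in zip(x, p)]
        r = [rk - alpha * apk for rk, apk in zip(r, ap)]
        z = apply_m_inv(r)
        rz_new = dot(r, z)
        p = [zk + (rz_new / rz) * pk for zk, pk in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

    Passing `lambda r: list(r)` as `apply_m_inv` recovers unpreconditioned CG, which makes it easy to compare iteration counts on a given matrix.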

  4. An efficient numerical scheme for the simulation of parallel-plate active magnetic regenerators

    DEFF Research Database (Denmark)

    Torregrosa-Jaime, Bárbara; Corberán, José M.; Payá, Jorge

    2015-01-01

    A one-dimensional model of a parallel-plate active magnetic regenerator (AMR) is presented in this work. The model is based on an efficient numerical scheme which has been developed after analysing the heat transfer mechanisms in the regenerator bed. The new finite difference scheme optimally com...... to the fully implicit scheme, the proposed scheme achieves more accurate results, prevents numerical errors and requires less computational effort. In AMR simulations the new scheme can reduce the computational time by 88%....

  5. Parallel Jacobi EVD Methods on Integrated Circuits

    Directory of Open Access Journals (Sweden)

    Chi-Chia Sun

    2014-01-01

    Design strategies for parallel iterative algorithms are presented. To further study different trade-off strategies in the design criteria for integrated circuits, a 10 × 10 Jacobi Brent-Luk EVD array with the simplified μ-CORDIC processor is used as an example. The experimental results show that using the μ-CORDIC processor is beneficial for the design criteria, as it yields a smaller area, faster overall computation time, and lower energy consumption than the regular CORDIC processor. It is worth noting that the proposed parallel EVD method can be applied to real-time, low-power array signal processing algorithms for beamforming or DOA estimation.
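    For reference, the serial cyclic Jacobi eigenvalue method that such arrays parallelize can be sketched briefly. This minimal version assumes a real symmetric input matrix given as nested lists and applies plane rotations in row-cyclic order with ordinary floating-point arithmetic; it does not model the Brent-Luk parallel ordering or CORDIC-based rotations, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def off_diagonal_norm(a: Matrix) -> float:
    n = len(a)
    return math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))


def jacobi_eigenvalues(a: Matrix, tol: float = 1e-12, max_sweeps: int = 50) -> List[float]:
    """Cyclic Jacobi EVD of a real symmetric matrix; returns sorted eigenvalues.

    Each (p, q) rotation zeroes a[p][q]; one sweep visits all pairs p < q.
    """
    a = [list(row) for row in a]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        if off_diagonal_norm(a) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Choose the rotation so that the updated a[p][q] is zero
                # (the smaller root t of t^2 + 2 t theta - 1 = 0).
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0.0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- A J: mix columns p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- J^T A: mix rows p and q.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))
```

    In a Brent-Luk array, rotations acting on disjoint index pairs are applied simultaneously by different processors, whereas the row-cyclic loop above performs them one at a time.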

  6. Variable aperture-based ptychographical iterative engine method.

    Science.gov (United States)

    Sun, Aihui; Kong, Yan; Meng, Xin; He, Xiaoliang; Du, Ruijun; Jiang, Zhilong; Liu, Fei; Xue, Liang; Wang, Shouyu; Liu, Cheng

    2018-02-01

    A variable aperture-based ptychographical iterative engine (vaPIE) is demonstrated both numerically and experimentally to rapidly reconstruct the sample phase and amplitude. By adjusting the size of a tiny aperture illuminated by a parallel light beam to change the illumination on the sample step by step, and recording the corresponding diffraction patterns sequentially, both the sample phase and amplitude can be faithfully reconstructed with a modified ptychographical iterative engine (PIE) algorithm. Since far fewer diffraction patterns are required than in common PIE, and the shape, size, and position of the aperture need not be known exactly, the proposed vaPIE method substantially reduces the data acquisition time and makes PIE less dependent on the mechanical accuracy of the translation stage; the technique could therefore be applied in various areas of scientific research. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).

  7. ITER and the fusion reactor: status and challenge to technology

    International Nuclear Information System (INIS)

    Lackner, K.

    2001-01-01

    Fusion has a high potential but requires an integrated physics and technology effort without precedent in non-military R and D. The demonstration of basic physics feasibility will be concluded with ITER, although R and D on efficiency improvement will continue. The essential technological issues remaining at the start of ITER operation concern materials: first-wall components and radiation-tolerant (low-activation) materials. This paper consists only of a copy of the presentation slides, covering the following subjects: magnetic confinement fusion, the tokamak, progress in tokamak performance, ITER: its genealogy, physics basis and critical issues, cutaway of ITER-FEAT, R and D: divertor cassette (L-5), differences between a power plant and ITER, challenges for ITER and fusion plants, main technological problems (plasma-facing materials), structural and functional materials for fusion power plants, ferritic steels, EUROFER development, improvements beyond ferritic steels, and costing, among others. (nevyjel)

  8. iHadoop: Asynchronous Iterations Support for MapReduce

    KAUST Repository

    Elnikety, Eslam

    2011-08-01

    MapReduce is a distributed programming framework designed to ease the development of scalable data-intensive applications for large clusters of commodity machines. Most machine learning and data mining applications involve iterative computations over large datasets, such as Web hyperlink structures and social network graphs. Yet the MapReduce model does not efficiently support this important class of applications. The architecture of MapReduce, most critically its dataflow techniques and task scheduling, is completely unaware of the nature of iterative applications; tasks are scheduled according to a policy that optimizes execution for a single iteration, which wastes bandwidth, I/O, and CPU cycles compared with an optimal execution of a consecutive set of iterations. This work presents iHadoop, a modified MapReduce model, and an associated implementation, optimized for iterative computations. The iHadoop model schedules iterations asynchronously. It connects the output of one iteration to the next, allowing both to process their data concurrently. iHadoop's task scheduler exploits inter-iteration data locality by scheduling tasks that exhibit a producer/consumer relation on the same physical machine, allowing fast local data transfer. For iterative applications that must satisfy certain criteria before termination, iHadoop runs the check concurrently with the execution of the subsequent iteration to further reduce the application's latency. This thesis also describes our implementation of the iHadoop model and evaluates its performance against Hadoop, the widely used open source implementation of MapReduce. Experiments using different data analysis applications over real-world and synthetic datasets show that iHadoop performs better than Hadoop for iterative algorithms, reducing the execution time of iterative applications by 25% on average. Furthermore, integrating iHadoop with HaLoop, a variant Hadoop implementation that caches

  9. Multi-Level iterative methods in computational plasma physics

    International Nuclear Information System (INIS)

    Knoll, D.A.; Barnes, D.C.; Brackbill, J.U.; Chacon, L.; Lapenta, G.

    1999-01-01

    Plasma physics phenomena occur on a wide range of spatial scales and on a wide range of time scales. When attempting to model plasma physics problems numerically, the authors are inevitably faced with the need for both fine spatial resolution (fine grids) and implicit time integration methods. Fine grids can tax the efficiency of iterative methods, and large time steps can challenge their robustness. To meet these challenges, they are developing a hybrid approach in which multigrid methods are used as preconditioners for Krylov subspace iterative methods such as conjugate gradients or GMRES. For nonlinear problems, they apply multigrid preconditioning to a matrix-free Newton-GMRES method. Results are presented for the application of these multilevel iterative methods to the field solves in implicit moment method PIC, to multidimensional nonlinear Fokker-Planck problems, and to their initial efforts in particle MHD.

  10. ITER Neutral Beam Injection System

    International Nuclear Information System (INIS)

    Ohara, Yoshihiro; Tanaka, Shigeru; Akiba, Masato

    1991-03-01

    A Japanese design proposal for the ITER Neutral Beam Injection System (NBS), consistent with the ITER common design requirements, is described. The injection system is required to deliver a 75 MW neutral deuterium beam at 1.3 MeV to the reactor plasma and is used not only for plasma heating but also for current drive and current profile control. The injection system is composed of 9 modules, each designed to inject a 1.3 MeV, 10 MW neutral beam. The most important point of the design is that the injection system is based on a cesium-seeded volume negative ion source, which can produce an intense negative ion beam with high current density at a low source operating pressure. The design value of the source is based on the experimental values achieved at JAERI. The use of the cesium-seeded volume source is essential to the design of an efficient and compact neutral beam injection system that satisfies the ITER common design requirements. The critical components for realizing this design are the 1.3 MeV, 17 A electrostatic accelerator and the high-voltage DC acceleration power supply, whose performance must be demonstrated before construction of the ITER NBI system. (author)

  11. Toolkit for high performance Monte Carlo radiation transport and activation calculations for shielding applications in ITER

    International Nuclear Information System (INIS)

    Serikov, A.; Fischer, U.; Grosse, D.; Leichtle, D.; Majerle, M.

    2011-01-01

    The Monte Carlo (MC) method is the most suitable computational technique of radiation transport for shielding applications in fusion neutronics. This paper shares the results of the long-term experience of the fusion neutronics group at Karlsruhe Institute of Technology (KIT) in radiation shielding calculations with the MCNP5 code for the ITER fusion reactor, with emphasis on the use of several ITER project-driven computer programs developed at KIT. Two of them, McCad and R2S, appear to be the most useful in radiation shielding analyses. The McCad graphical tool allows automatic conversion of MCNP models from the underlying CAD (CATIA) data files, while the R2S activation interface couples MCNP radiation transport with FISPACT activation, making it possible to estimate nuclear responses such as dose rate and nuclear heating after ITER reactor shutdown. The cell-based R2S scheme was applied in shutdown photon dose analysis for the design of the In-Vessel Viewing System (IVVS) and the Glow Discharge Cleaning (GDC) unit in ITER. A mesh-based R2S feature newly developed at KIT was successfully tested on shutdown dose rate calculations for the upper port in the Neutral Beam (NB) cell of ITER. The merits of the McCad graphical program were broadly acknowledged by neutronics analysts, and its continuous improvement at KIT has made it more stable and more convenient to run through its graphical user interface. Detailed 3D ITER neutronics modeling with the MCNP Monte Carlo method requires substantial computational resources, which inevitably leads to parallel calculations on clusters. Performance assessments of MCNP5 parallel runs on the JUROPA/HPC-FF supercomputer cluster made it possible to find the optimal number of processors for ITER-type runs. (author)

  12. ITER EDA status

    International Nuclear Information System (INIS)

    Aymar, R.

    2001-01-01

    The Project has focused on drafting the Plant Description Document (PDD), which will be published as the Technical Basis for the ITER Final Design Report (FDR), and its related documentation in time for the ITER review process. The preparations have involved continued intensive detailed design work, analyses and assessments by the Home Teams and the Joint Central Team, who have co-operated closely and efficiently. The main technical document has been completed in time for circulation, as planned, to TAC members for their review at TAC-17 (19-22 February 2001). Some of the supporting documents, such as the Plant Design Specification (PDS), Design Requirements and Guidelines (DRG1 and DRG2), and the Plant Safety Requirement (PSR) are also available for reference in draft form. A summary paper of the PDD for the Council's information is available as a separate document. A new documentation structure for the Project has been established. This hierarchical structure for documentation facilitates the entire organization in a way that allows better change control and avoids duplications. The initiative was intended to make this documentation system valid for the construction and operation phases of ITER. As requested, the Director and the JCT have been assisting the Explorations to plan for future joint technical activities during the Negotiations, and to consider technical issues important for ITER construction and operation for their introduction in the draft of a future joint implementation agreement. As charged by the Explorers, the Director has held discussions with the Home Team Leaders in order to prepare for the staffing of the International Team and Participants Teams during the Negotiations (Co-ordinated Technical Activities, CTA) and also in view of informing all ITER staff about their future directions in a timely fashion. One important element of the work was the completion by the Parties' industries of costing studies of about 83 ''procurement packages

  13. Performance Analysis of Iterative Channel Estimation and Multiuser Detection in Multipath DS-CDMA Channels

    Science.gov (United States)

    Li, Husheng; Betz, Sharon M.; Poor, H. Vincent

    2007-05-01

    This paper examines the performance of decision feedback based iterative channel estimation and multiuser detection in channel coded aperiodic DS-CDMA systems operating over multipath fading channels. First, explicit expressions describing the performance of channel estimation and parallel interference cancellation based multiuser detection are developed. These results are then combined to characterize the evolution of the performance of a system that iterates among channel estimation, multiuser detection and channel decoding. Sufficient conditions for convergence of this system to a unique fixed point are developed.

  14. The simplified spherical harmonics (SP{sub L}) methodology with space and moment decomposition in parallel environments

    Energy Technology Data Exchange (ETDEWEB)

    Longoni, Gianluca; Haghighat, Alireza [Florida University, Nuclear and Radiological Engineering Department, Gainesville, FL (United States)

    2003-07-01

    In recent years, the SP{sub L} (simplified spherical harmonics) equations have received renewed interest for the simulation of nuclear systems. We have derived the SP{sub L} equations starting from the even-parity form of the S{sub N} equations. The SP{sub L} equations form a system of (L+1)/2 second order partial differential equations that can be solved with standard iterative techniques such as the Conjugate Gradient (CG). We discretized the SP{sub L} equations with the finite-volume approach in a 3-D Cartesian space. We developed a new 3-D general code, Pensp{sub L} (Parallel Environment Neutral-particle SP{sub L}). Pensp{sub L} solves both fixed source and criticality eigenvalue problems. In order to optimize the memory management, we implemented a Compressed Diagonal Storage (CDS) to store the SP{sub L} matrices. Pensp{sub L} includes parallel algorithms for space and moment domain decomposition. The computational load is distributed on different processors, using a mapping function, which maps the 3-D Cartesian space and moments onto processors. The code is written in Fortran 90 using the Message Passing Interface (MPI) libraries for the parallel implementation of the algorithm. The code has been tested on the Pcpen cluster and the parallel performance has been assessed in terms of speed-up and parallel efficiency. (author)
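    The abstract mentions Compressed Diagonal Storage (CDS) for the SP{sub L} matrices. A minimal sketch of the idea follows, assuming a layout in which each nonzero diagonal, identified by its offset k from the main diagonal, is stored as a full-length array indexed by row. Actual CDS layouts differ in padding and indexing conventions, and the names here are hypothetical. For structured-grid stencils only a few diagonals are nonzero, so storage scales with the number of stored diagonals times the number of unknowns.

```python
from typing import Dict, List

Matrix = List[List[float]]
Diagonals = Dict[int, List[float]]


def dense_to_cds(a: Matrix) -> Diagonals:
    """Keep only the nonzero diagonals; diags[k][i] holds a[i][i + k]."""
    n = len(a)
    diags: Diagonals = {}
    for k in range(-(n - 1), n):
        vals = [a[i][i + k] if 0 <= i + k < n else 0.0 for i in range(n)]
        if any(v != 0.0 for v in vals):
            diags[k] = vals
    return diags


def cds_matvec(diags: Diagonals, x: List[float]) -> List[float]:
    """y = A x using only the stored diagonals."""
    n = len(x)
    y = [0.0] * n
    for k, vals in diags.items():
        # Rows i for which column i + k lies inside the matrix.
        for i in range(max(0, -k), min(n, n - k)):
            y[i] += vals[i] * x[i + k]
    return y
```

    Because each diagonal is a contiguous array, the inner loop streams through memory with unit stride, which is the main practical appeal of diagonal storage for banded matrices.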

  15. Japanese perspective of fusion nuclear technology from ITER to DEMO

    International Nuclear Information System (INIS)

    Tanaka, Satoru; Takatsu, Hideyuki

    2007-01-01

    The world fusion community is now launching construction of ITER, the first nuclear-grade fusion machine in the world. In parallel with the ITER program, Broader Approach (BA) activities are to be initiated this year by the EU and Japan, mainly at the Rokkasho BA site in Japan, as activities complementary to ITER toward DEMO. The BA activities include IFMIF/EVEDA (International Fusion Materials Irradiation Facility-Engineering Validation and Engineering Design Activities) and DEMO design activities with generic technology R and D, both of which are critical to the rapid development of DEMO and commercial fusion power plants. The Atomic Energy Commission of Japan reviewed the ongoing third-phase fusion program and issued the results of the review, 'On the policy of Nuclear Fusion Research and Development', in November 2005. In this report, it is anticipated that ITER will be made operational within a decade and that the programmatic objective can be met in the succeeding seven or eight years. Under this condition, the report presents a roadmap toward DEMO and beyond, and R and D items on fusion nuclear technology, indispensable for fusion energy utilization, are realigned. In the present paper, the Japanese view and policy on ITER and beyond are summarized, mainly from the viewpoint of nuclear fusion technology, and a minimum set of R and D elements on fusion nuclear technology, essential for fusion energy utilization, is presented. (orig.)

  16. Variational iteration method for one dimensional nonlinear thermoelasticity

    International Nuclear Information System (INIS)

    Sweilam, N.H.; Khader, M.M.

    2007-01-01

    This paper applies the variational iteration method to solve the Cauchy problem arising in one-dimensional nonlinear thermoelasticity. The advantage of this method is that it avoids the difficulty of calculating the Adomian polynomials required by the Adomian decomposition method. The numerical results of the method are compared with the exact solution of an artificial model to show its efficiency. The approximate solutions show that the variational iteration method is a powerful mathematical tool for solving nonlinear problems.
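    To show how the method works, here is a toy example (not from the paper) for the linear problem u'(t) = u(t), u(0) = 1. Treating the undifferentiated term as a restricted variation ũ_n, the stationarity conditions of the correction functional give the Lagrange multiplier λ = -1, and each iteration adds one more term of the Taylor series of the exact solution e^t.

```latex
\begin{aligned}
u_{n+1}(t) &= u_n(t) + \int_0^t \lambda(s)\,\bigl[u_n'(s) - \tilde{u}_n(s)\bigr]\,ds,
  \qquad \lambda(s) = -1,\\
u_0(t) &= 1,\\
u_1(t) &= 1 - \int_0^t (0 - 1)\,ds = 1 + t,\\
u_2(t) &= 1 + t - \int_0^t \bigl[1 - (1 + s)\bigr]\,ds = 1 + t + \tfrac{t^2}{2},\\
u_3(t) &= 1 + t + \tfrac{t^2}{2} + \int_0^t \tfrac{s^2}{2}\,ds
        = 1 + t + \tfrac{t^2}{2} + \tfrac{t^3}{6}.
\end{aligned}
```

    No Adomian polynomials appear: each step only integrates the residual of the previous iterate, which is the practical advantage the abstract points to.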

  17. ERA: Efficient serial and parallel suffix tree construction for very long strings

    KAUST Repository

    Mansour, Essam

    2011-09-01

    The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into the main memory, most existing construction algorithms become very inefficient. This paper presents a disk-based suffix tree construction method, called Elastic Range (ERa), which works efficiently with very long strings that are much larger than the available memory. ERa partitions the tree construction process horizontally and vertically and minimizes I/Os by dynamically adjusting the horizontal partitions independently for each vertical partition, based on the evolving shape of the tree and the available memory. Where appropriate, ERa also groups vertical partitions together to amortize the I/O cost. We developed a serial version; a parallel version for shared-memory and shared-disk multi-core systems; and a parallel version for shared-nothing architectures. ERa indexes the entire human genome in 19 minutes on an ordinary desktop computer. For comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer.

  18. High performance shallow water kernels for parallel overland flow simulations based on FullSWOF2D

    KAUST Repository

    Wittmann, Roland

    2017-01-25

    We describe code optimization and parallelization procedures applied to the sequential overland flow solver FullSWOF2D. Major difficulties when simulating overland flows include handling high-resolution datasets of large-scale areas, which cannot be computed on a single node either because of the limited amount of memory or because of the large number of (time-step) iterations resulting from the CFL condition. We address these issues with two major contributions. First, we demonstrate a generic step-by-step transformation of the second-order finite volume scheme in FullSWOF2D towards MPI parallelization. Second, the computational kernels are optimized by the use of templates and a portable vectorization approach. We discuss the load imbalance of the flux computation due to dry and wet cells and propose a solution using an efficient cell counting approach. Finally, scalability results are shown for different test scenarios, along with a flood simulation benchmark using the Shaheen II supercomputer.
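
    The abstract does not spell out its cell counting scheme, so the following is only one hypothetical way that wet-cell counts can drive a load-balanced decomposition: rows are cut into contiguous blocks so that each MPI rank receives roughly the same number of wet cells, using prefix sums and binary search. The row-wise layout, the assumption that dry cells cost almost nothing, and the function name are all illustrative.

```python
from bisect import bisect_left
from itertools import accumulate


def balanced_row_splits(wet_per_row, ranks):
    """Cut a grid into `ranks` contiguous row blocks holding roughly equal
    numbers of wet cells (dry cells are assumed to cost almost nothing).

    Returns ranks + 1 boundaries; block r covers rows bounds[r]:bounds[r + 1].
    """
    n = len(wet_per_row)
    prefix = list(accumulate(wet_per_row))  # prefix[i] = wet cells in rows 0..i
    total = prefix[-1] if prefix else 0
    bounds = [0]
    for r in range(1, ranks):
        target = total * r / ranks
        # The first row at which the running count reaches the target closes block r - 1.
        row = bisect_left(prefix, target) + 1
        bounds.append(min(max(row, bounds[-1]), n))
    bounds.append(n)
    return bounds


# Wet cells concentrated in the top half: equal-row splits would leave rank 1 idle.
print(balanced_row_splits([10, 10, 10, 10, 0, 0, 0, 0], 2))
```

    With all wet cells in the first four rows, the cut lands after row 2 instead of row 4, so both ranks receive 20 wet cells.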

  19. Parallel Computing Strategies for Irregular Algorithms

    Science.gov (United States)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises an increase of several orders of magnitude in our ability to solve realistic, computationally intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  20. Application of parallel computing techniques to a large-scale reservoir simulation

    International Nuclear Information System (INIS)

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

    2001-01-01

    Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The difficulty commonly arises from the intensive computational requirements of detailed modeling investigations of real-world reservoirs. This paper presents the application of a massively parallel version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance.

  1. Variational Iteration Method for Fifth-Order Boundary Value Problems Using He's Polynomials

    Directory of Open Access Journals (Sweden)

    Muhammad Aslam Noor

    2008-01-01

    We apply the variational iteration method using He's polynomials (VIMHP) for solving fifth-order boundary value problems. The proposed method is an elegant combination of the variational iteration and homotopy perturbation methods and is mainly due to Ghorbani (2007). The suggested algorithm is quite efficient and is practically well suited for use in these problems. The proposed iterative scheme finds the solution without any discretization, linearization, or restrictive assumptions. Several examples are given to verify the reliability and efficiency of the method. The fact that the proposed technique solves nonlinear problems without using Adomian's polynomials can be considered a clear advantage of this algorithm over the decomposition method.

  2. FPGA implementation of low complexity LDPC iterative decoder

    Science.gov (United States)

    Verma, Shivani; Sharma, Sanjay

    2016-07-01

    Low-density parity-check (LDPC) codes, proposed by Gallager, emerged as a class of codes that can yield very good performance on the additive white Gaussian noise channel as well as on the binary symmetric channel. LDPC codes have gained considerable importance due to their capacity-achieving property and excellent performance in noisy channels. The belief propagation (BP) algorithm and its approximations, most notably min-sum, are popular iterative decoding algorithms for LDPC and turbo codes. The trade-off between hardware complexity and decoding throughput is a critical factor in implementing a practical decoder. This article presents an introduction to LDPC codes and their various decoding algorithms, followed by the realization of an LDPC decoder using a simplified message passing algorithm and a partially parallel decoder architecture. The simplified message passing algorithm has been proposed as a trade-off between low decoding complexity and decoder performance. It greatly reduces the routing and check node complexity of the decoder. The partially parallel decoder architecture offers high speed and reduced complexity. The improved decoder design achieves a maximum symbol throughput of 92.95 Mbps with a maximum of 18 decoding iterations. The article presents an implementation of a 9216-bit, rate-1/2, (3, 6) LDPC decoder on the Xilinx XC3SD3400A device from the Spartan-3A DSP family.
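
    The article's simplified message passing variant is not detailed in the abstract. For reference, the standard min-sum check-node update (the best-known approximation of BP mentioned above) can be sketched as follows: each outgoing message is the product of the signs and the minimum magnitude of all other incoming log-likelihood ratios (LLRs). This is a generic software sketch, not the article's FPGA datapath, and the function name is hypothetical.

```python
def min_sum_check_update(incoming):
    """Min-sum check-node update.

    incoming: LLR messages from the variable nodes attached to one check node.
    Returns the message sent back along each edge: the product of the signs of
    all *other* incoming messages times the smallest of their magnitudes.
    """
    if len(incoming) < 2:
        raise ValueError("a check node needs at least two edges")
    mags = [abs(m) for m in incoming]
    # Only the two smallest magnitudes are ever needed: an edge holding the
    # overall minimum receives the second minimum, every other edge the minimum.
    order = sorted(range(len(mags)), key=mags.__getitem__)
    min1_idx, min2_idx = order[0], order[1]
    sign_product = 1
    for m in incoming:
        if m < 0:
            sign_product = -sign_product
    out = []
    for i, m in enumerate(incoming):
        own_sign = -1 if m < 0 else 1  # multiplying by it again removes this edge's sign
        other_min = mags[min2_idx] if i == min1_idx else mags[min1_idx]
        out.append(sign_product * own_sign * other_min)
    return out


print(min_sum_check_update([2.0, -0.5, 1.5]))
```

    Keeping only the two smallest magnitudes and the overall sign is what makes min-sum attractive in hardware: a degree-6 check node of a (3, 6) code needs no separate search per edge.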

  3. ITER Remote Maintenance System (IRMS) lifecycle management

    Energy Technology Data Exchange (ETDEWEB)

    Tesini, Alessandro, E-mail: alessandro.tesini@iter.org [ITER Organization, CS 90 046, 13067 St. Paul Lez Durance Cedex (France); Otto' , Bede [Oxford Technologies Ltd, 7, Nuffield Way, Abingdon, Oxon OX14 1RJ (United Kingdom); Blight, John [FAAST 31c Allee de la Granette, 13600 Ceyreste (France); Choi, Chang-Hwan; Friconneau, Jean-Pierre; Gotewal, Krishan Kumar; Hamilton, David [ITER Organization, CS 90 046, 13067 St. Paul Lez Durance Cedex (France); Heckendorn, Frank [FD Technologies, PO Box 6686, Aiken, SC (United States); Martins, Jean-Pierre [ITER Organization, CS 90 046, 13067 St. Paul Lez Durance Cedex (France); Marty, Thomas [Westinghouse, 122, avenue de Hambourg, 13008 Marseille (France); Nakahira, Masataka; Palmer, Jim; Subramanian, Rajendran [ITER Organization, CS 90 046, 13067 St. Paul Lez Durance Cedex (France)

    2011-10-15

    The availability of the ITER machine to perform its scientific program is strongly dependent on the performance of the different Remote Handling (RH) systems constituting the ITER Remote Maintenance System (IRMS). The lifecycle of the IRMS will well exceed 40 years, from initial concept design and proof testing through to machine decommissioning. Such a long lifecycle requires that a rigorous approach be put in place to guarantee the technical capabilities of the highly innovative IRMS, its efficiency and its availability. For this purpose, an IRMS System Engineering and IRMS lifecycle management approach has been adopted by ITER. The approach aims at ensuring the IRMS full operability and availability at an acceptable cost of ownership over the full ITER machine assembly and operations period. The IRMS lifecycle management method described in this paper covers such subjects as specific requirements for IRMS design reviews, monitoring during manufacture, factory and site acceptance testing, integrated commissioning, decontamination, maintenance and re-qualification strategies, and requirements for Integrated Logistical Support during operations. The updating and implementation of the IRMS lifecycle strategy and this procedure will be managed and monitored by the Remote Handling Integrated Product Team (RH-IPT). Although developed for the IRMS, the basic principles and procedures of lifecycle management could be applied to other ITER plant systems whose reliability and availability will be essential for the continued operation of the ITER machine.

  4. ITER Remote Maintenance System (IRMS) lifecycle management

    International Nuclear Information System (INIS)

    Tesini, Alessandro; Otto', Bede; Blight, John; Choi, Chang-Hwan; Friconneau, Jean-Pierre; Gotewal, Krishan Kumar; Hamilton, David; Heckendorn, Frank; Martins, Jean-Pierre; Marty, Thomas; Nakahira, Masataka; Palmer, Jim; Subramanian, Rajendran

    2011-01-01

  5. Iterative Runge–Kutta-type methods for nonlinear ill-posed problems

    International Nuclear Information System (INIS)

    Böckmann, C; Pornsawad, P

    2008-01-01

    We present a regularization method for solving nonlinear ill-posed problems by applying the family of Runge–Kutta methods to an initial value problem, in particular, to the asymptotical regularization method. We prove that the developed iterative regularization method converges to a solution under certain conditions and with a general stopping rule. Some particular iterative regularization methods are numerically implemented. Numerical results of the examples show that the developed Runge–Kutta-type regularization methods yield stable solutions and that particular implicit methods are very efficient in saving iteration steps

  6. Efficient method to design RF pulses for parallel excitation MRI using gridding and conjugate gradient.

    Science.gov (United States)

    Feng, Shuo; Ji, Jim

    2014-04-01

    Parallel excitation (pTx) techniques with multiple transmit channels have been widely used in high-field MRI to shorten the RF pulse duration and/or reduce the specific absorption rate (SAR). However, the efficiency of pulse design still needs substantial improvement for practical real-time applications. In this paper, we present a detailed description of a fast pulse design method using Fourier-domain gridding and a conjugate gradient method. Simulation results show that the proposed method can design pTx pulses with an efficiency 10 times higher than that of the conventional conjugate-gradient-based method, without reducing the accuracy of the desired excitation patterns.
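
    The Fourier-domain gridding that gives the reported speedup is not reproduced here. As a hedged sketch of the conjugate gradient core only, the code below solves a small real-valued, Tikhonov-regularized least-squares problem min ||A b - m||² + λ||b||², where A would play the role of the excitation system matrix, b the RF pulse samples and m the target pattern. Real pTx design is complex-valued; the real arithmetic, the regularization term and the function names are simplifying assumptions.

```python
def dot(u, v):
    """Euclidean inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def cg_tikhonov(A, m, lam, iters=100, tol=1e-12):
    """Minimize ||A b - m||^2 + lam * ||b||^2 with conjugate gradients applied
    to the normal equations (A^T A + lam I) b = A^T m (real-valued toy version)."""
    rows, cols = len(A), len(A[0])

    def normal_op(v):
        # (A^T A + lam I) v without forming A^T A explicitly
        Av = [dot(row, v) for row in A]
        return [sum(A[i][j] * Av[i] for i in range(rows)) + lam * v[j] for j in range(cols)]

    b = [0.0] * cols
    r = [sum(A[i][j] * m[i] for i in range(rows)) for j in range(cols)]  # residual at b = 0
    p = r[:]
    rs = dot(r, r)
    for _ in range(iters):
        if rs ** 0.5 < tol:
            break
        Ap = normal_op(p)
        alpha = rs / dot(p, Ap)  # exact step length along the search direction
        b = [bi + alpha * pi for bi, pi in zip(b, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]  # next conjugate direction
        rs = rs_new
    return b


print(cg_tikhonov([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]], [2.0, 1.0, 1.0], 0.0))
```

    Because CG only needs products with A and its transpose, a faster way of applying A (which is what gridding provides in the paper's setting) speeds up every iteration without changing the solver itself.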

  7. Qualification test for ITER HCCR-TBS mockups with high heat flux test facility

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Suk-Kwon, E-mail: skkim93@kaeri.re.kr [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of); Park, Seong Dae; Jin, Hyung Gon; Lee, Eo Hwak; Yoon, Jae-Sung; Lee, Dong Won [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of); Cho, Seungyon [National Fusion Research Institute, Daejeon (Korea, Republic of)

    2016-11-01

    Highlights: • The test mockups for the ITER HCCR (Helium Cooled Ceramic Reflector) TBS (Test Blanket System) in Korea were designed and fabricated. • A thermo-hydraulic analysis was performed using a high heat flux test facility based on an electron beam. • A plan for qualification tests was developed to evaluate the thermo-hydraulic efficiency in accordance with the requirements of the ITER Organization. - Abstract: The test mockups for the ITER HCCR (Helium Cooled Ceramic Reflector) TBS (Test Blanket System) in Korea were designed and fabricated, and integrity and thermo-hydraulic performance tests must be completed under the same or similar operating conditions as in ITER. The test plan for a thermo-hydraulic analysis was developed using a high heat flux test facility, the Korean heat load test facility using an electron beam (KoHLT-EB). This facility is used for qualification tests of the plasma facing components (PFCs) for the ITER first wall and DEMO divertor, and for thermo-hydraulic experiments. In this work, KoHLT-EB will be used for the planned performance qualification test of the ITER HCCR-TBS mockups. These qualification tests should be performed to evaluate the thermo-hydraulic efficiency in accordance with the requirements of the ITER Organization (IO), which describe the specifications and qualifications of the heat flux test facility and the test procedure for ITER PFCs.

  8. On the Convergence of Asynchronous Parallel Pattern Search

    International Nuclear Information System (INIS)

    Tamara Gibson Kolda

    2002-01-01

    In this paper the authors prove global convergence for asynchronous parallel pattern search. In standard pattern search, decisions regarding the update of the iterate and the step-length control parameter are synchronized implicitly across all search directions. They lose this feature in asynchronous parallel pattern search since the search along each direction proceeds semi-autonomously. By bounding the value of the step-length control parameter after any step that produces decrease along a single search direction, they can prove that all the processes share a common accumulation point and that such a point is a stationary point of the standard nonlinear unconstrained optimization problem

  9. A parallel algorithm for solving the multidimensional within-group discrete ordinates equations with spatial domain decomposition - 104

    International Nuclear Information System (INIS)

    Zerr, R.J.; Azmy, Y.Y.

    2010-01-01

    A spatial domain decomposition with a parallel block Jacobi solution algorithm has been developed based on the integral transport matrix formulation of the discrete ordinates approximation for solving the within-group transport equation. The new methodology abandons the typical source iteration scheme and solves directly for the fully converged scalar flux. Four matrix operators are constructed based upon the integral form of the discrete ordinates equations. A single differential mesh sweep is performed to construct these operators. The method is parallelized by decomposing the problem domain into several smaller sub-domains, each treated as an independent problem. The scalar flux of each sub-domain is solved exactly given incoming angular flux boundary conditions. Sub-domain boundary conditions are updated iteratively, and convergence is achieved when the scalar flux error in all cells meets a pre-specified convergence criterion. The method has been implemented in a computer code that was then employed for strong scaling studies of the algorithm's parallel performance via a fixed-size problem in tests ranging from one domain up to one cell per sub-domain. Results indicate that the best parallel performance compared to source iterations occurs for optically thick, highly scattering problems, the variety that is most difficult for the traditional SI scheme to solve. Moreover, the minimum execution time occurs when each sub-domain contains a total of four cells. (authors)
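
    The paper's operators come from the integral transport formulation; the code below is only a generic analogue of the same block Jacobi pattern, applied to a 1D Poisson model problem chosen for illustration. Each sub-domain is solved exactly (Thomas algorithm) using the previous sweep's neighboring values as boundary conditions, and sweeps repeat until the largest update falls below a tolerance. The model problem, tolerance and function names are assumptions.

```python
def solve_tridiagonal(rhs):
    """Thomas algorithm for the sub-domain matrix tridiag(-1, 2, -1)."""
    n = len(rhs)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0], d[0] = -0.5, rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]  # 2 - (-1) * c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def block_jacobi(f, n_blocks, tol=1e-10, max_iter=100_000):
    """Block Jacobi for -u'' = f on (0, 1), u(0) = u(1) = 0, central differences.

    Returns (solution, number of sweeps used).
    """
    n = len(f)
    h = 1.0 / (n + 1)
    size = -(-n // n_blocks)  # ceiling division
    blocks = [(a, min(a + size, n)) for a in range(0, n, size)]
    u = [0.0] * n
    for it in range(1, max_iter + 1):
        new = [0.0] * n
        for a, b in blocks:  # sub-domains are independent, so this loop could run in parallel
            rhs = [h * h * f[i] for i in range(a, b)]
            rhs[0] += u[a - 1] if a > 0 else 0.0  # left neighbor from the previous sweep
            rhs[-1] += u[b] if b < n else 0.0  # right neighbor from the previous sweep
            new[a:b] = solve_tridiagonal(rhs)
        change = max(abs(x - y) for x, y in zip(new, u))
        u = new
        if change < tol:
            return u, it
    return u, max_iter


u, sweeps = block_jacobi([1.0] * 15, 3)
print(sweeps, max(u))
```

    With a single block the first sweep is already the exact discrete solution; with more blocks, information crosses sub-domain boundaries only once per sweep, which is why the sweep count grows with the number of sub-domains in this toy setting.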

  10. Thermo-mechanical analysis of ITER first mirrors and its use for the ITER equatorial visible/infrared wide angle viewing system optical design

    International Nuclear Information System (INIS)

    Joanny, M.; Salasca, S.; Dapena, M.; Cantone, B.; Travère, J. M.; Thellier, C.; Fermé, J. J.; Marot, L.; Buravand, O.; Perrollaz, G.; Zeile, C.

    2012-01-01

    ITER first mirrors (FMs), as the first components of most ITER optical diagnostics, will be exposed to high plasma radiation flux and neutron load. To reduce FM heating and the optical surface deformation induced during ITER operation, the use of suitable materials and a cooling system is foreseen. Calculations carried out on different materials and FM designs and geometries (100 mm and 200 mm) show that the use of CuCrZr and TZM, together with a complex integrated cooling system, can efficiently limit FM heating and reduce the optical surface deformation under plasma radiation flux and neutron load. These investigations were used to evaluate, for the ITER equatorial port visible/infrared wide angle viewing system, the impact of changes in the FM properties during operation on the main optical performance of the instrument. The results obtained are presented and discussed.

  11. Development of parallel algorithms for electrical power management in space applications

    Science.gov (United States)

    Berry, Frederick C.

    1989-01-01

    The application of parallel techniques for electrical power system analysis is discussed. The Newton-Raphson method of load flow analysis was used along with the decomposition-coordination technique to perform load flow analysis. The decomposition-coordination technique enables tasks to be performed in parallel by partitioning the electrical power system into independent local problems. Each independent local problem represents a portion of the total electrical power system on which a load flow analysis can be performed. The load flow analysis is performed on these partitioned elements by using the Newton-Raphson load flow method. These independent local problems will produce results for voltage and power which can then be passed to the coordinator portion of the solution procedure. The coordinator problem uses the results of the local problems to determine if any correction is needed on the local problems. The coordinator problem is also solved by an iterative method much like the local problem. The iterative method for the coordination problem will also be the Newton-Raphson method. Therefore, each iteration at the coordination level will result in new values for the local problems. The local problems will have to be solved again along with the coordinator problem until some convergence conditions are met.

  12. Implementation of GPU parallel equilibrium reconstruction for plasma control in EAST

    Energy Technology Data Exchange (ETDEWEB)

    Huang, Yao, E-mail: yaohuang@ipp.ac.cn [Institute of Plasma Physics, Chinese Academy of Sciences, Hefei (China); Xiao, B.J. [Institute of Plasma Physics, Chinese Academy of Sciences, Hefei (China); School of Nuclear Science & Technology, University of Science & Technology of China (China); Luo, Z.P.; Yuan, Q.P.; Pei, X.F. [Institute of Plasma Physics, Chinese Academy of Sciences, Hefei (China); Yue, X.N. [School of Nuclear Science & Technology, University of Science & Technology of China (China)

    2016-11-15

    Highlights: • We describe how the parallel equilibrium reconstruction code P-EFIT, running on a GPU, was integrated with the EAST plasma control system. • Compared with RT-EFIT used in EAST, P-EFIT has better spatial resolution and runs the full EFIT algorithm in each iteration. • With the data interface through RFM, P-EFIT on a 65 × 65 spatial grid can satisfy the accuracy and timing requirements for plasma control. • Successful control using ISOFLUX/P-EFIT was established in a dedicated experiment during the EAST 2014 campaign. • This work is a stepping-stone towards versatile ISOFLUX/P-EFIT control, such as real-time equilibrium reconstruction with more diagnostics. - Abstract: The implementation of the P-EFIT code for plasma control in EAST is described. P-EFIT is based on the EFIT framework but built with the CUDA™ architecture to take advantage of massively parallel Graphics Processing Unit (GPU) cores and significantly accelerate the computation. With a 65 × 65 grid, P-EFIT can complete one reconstruction iteration in 300 μs; with a one-iteration strategy, it can satisfy the needs of real-time plasma shape control. The data interface between P-EFIT and the PCS is realized by transferring data through RFM. The first application of P-EFIT to discharge control in EAST is described.

  13. Novel Robot Solutions for Carrying out Field Joint Welding and Machining in the Assembly of the Vacuum Vessel of ITER

    International Nuclear Information System (INIS)

    Pessi, P.

    2009-01-01

    Because of the demanding environment, highly specialized robots are needed in ITER (International Thermonuclear Experimental Reactor) for both the manufacturing and the maintenance of the reactor. The sectors of the ITER vacuum vessel (VV) require more stringent tolerances than would normally be expected for a structure of this size. The VV consists of nine sectors that are to be welded together, and it has a toroidal chamber structure. The task of the designed robot is to carry the welding apparatus along a path with a stringent tolerance during the assembly operation. In addition to the initial vacuum vessel assembly, sectors will need to be replaced for repair after a limited running period. Mechanisms with closed-loop kinematic chains are used in the design of the robots in this work. One version is a purely parallel manipulator and another is a hybrid manipulator in which parallel and serial structures are combined. Traditional industrial robots, whose links are generally actuated in series, are inherently not very rigid and have poor dynamic performance under high-speed and high dynamic loading conditions. Compared with open-chain manipulators, parallel manipulators have high stiffness, high accuracy and a high force/torque capacity in a reduced workspace. Parallel manipulators have a mechanical architecture in which all of the links are connected to the base and to the end-effector of the robot. The purpose of this thesis is to develop special parallel robots for the assembly, machining and repair of the ITER VV, whose assembly and machining processes require a special robot. By studying the structure of the vacuum vessel, two novel parallel robots were designed and built; they have six and ten degrees of freedom and are driven by hydraulic cylinders and electrical servo motors. Kinematic models for the proposed robots were defined and two prototypes built. Experiments for machine cutting and laser welding with the 6-DOF robot were

  14. Novel Robot Solutions for Carrying out Field Joint Welding and Machining in the Assembly of the Vacuum Vessel of ITER

    Energy Technology Data Exchange (ETDEWEB)

    Pessi, P.

    2009-07-01

  15. An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division

    Directory of Open Access Journals (Sweden)

    Dawen Xia

    2015-01-01

    Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM) algorithm for solving the traffic subarea division problem on the widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means and then employ the MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identifying method to connect the borders of the clustering results for each cluster. Finally, using the proposed approach, we divide Beijing into traffic subareas based on real-world trajectory data sets generated by 12,000 taxis over a period of one month. Experimental evaluation results indicate that, compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, higher accuracy, and better scalability, and can effectively divide traffic subareas using big taxi trajectory data.
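
    Par3PKM's modified distance metric, initialization and boundary identification are not reproduced here. As a point of reference, the plain MapReduce formulation of a K-Means iteration that such algorithms build on can be sketched in-process: mappers assign each point to its nearest centroid, and the reducer averages the points per cluster. No Hadoop API is used; input splits are simulated with list slices, and all names are hypothetical.

```python
from collections import defaultdict
from math import dist


def map_phase(split, centroids):
    """Mapper: emit (index of nearest centroid, (point, 1)) for every point."""
    for p in split:
        k = min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        yield k, (p, 1)


def reduce_phase(pairs, old_centroids):
    """Reducer: average the points emitted for each centroid index."""
    sums, counts = {}, defaultdict(int)
    for k, (p, c) in pairs:
        acc = sums.setdefault(k, [0.0] * len(p))
        for d, x in enumerate(p):
            acc[d] += x
        counts[k] += c
    # A centroid that received no points keeps its previous position.
    return [tuple(s / counts[k] for s in sums[k]) if k in sums else old_centroids[k]
            for k in range(len(old_centroids))]


def kmeans_mapreduce(points, centroids, splits=2, max_iter=20):
    """Repeat map + reduce rounds until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        chunks = [points[i::splits] for i in range(splits)]  # simulated input splits
        emitted = [kv for chunk in chunks for kv in map_phase(chunk, centroids)]
        new = reduce_phase(emitted, centroids)
        if new == centroids:  # converged: assignments no longer move the centroids
            break
        centroids = new
    return centroids


print(kmeans_mapreduce([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)]))
```

    A Hadoop job would typically add a combiner that pre-sums each split's points before the shuffle; in this sketch the reducer does all of the summing.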

  16. SPECT reconstruction of combined cone beam and parallel hole collimation with experimental data

    International Nuclear Information System (INIS)

    Li, Jianying; Jaszczak, R.J.; Turkington, T.G.; Greer, K.L.; Coleman, R.E.

    1993-01-01

    The authors have developed three methods to combine parallel and cone beam (P and CB) SPECT data using modified Maximum Likelihood-Expectation Maximization (ML-EM) algorithms. The first combination method applies both parallel and cone beam data sets to reconstruct a single intermediate image after each iteration using the ML-EM algorithm. The other two iterative methods combine the intermediate parallel beam (PB) and cone beam (CB) source estimates to enhance the uniformity of images. These two methods are ad hoc methods. In earlier studies using computer Monte Carlo simulation, they suggested that improved images might be obtained by reconstructing combined P and CB SPECT data. These combined collimation methods are qualitatively evaluated using experimental data. An attenuation compensation is performed by including the effects of attenuation in the transition matrix as a multiplicative factor. The combined P and CB images are compared with CB-only images, and the results indicate that the combined P and CB approaches suppress artifacts caused by truncated projections and correct for the distortions of the CB-only images.
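
    The standard ML-EM update, applied to the simplest reading of the first combination method (stacking the parallel-beam and cone-beam system matrices and projection data so that one update uses both data sets per iteration), can be sketched as follows. The dense list-of-lists matrices, the toy numbers and the function name are assumptions for illustration; attenuation, as the abstract describes, would enter as multiplicative factors in the entries of A.

```python
def mlem(A, y, iterations=100):
    """ML-EM: x_j <- x_j / s_j * sum_i A[i][j] * y[i] / (A x)_i, with s_j = sum_i A[i][j]."""
    n_rows, n_cols = len(A), len(A[0])
    x = [1.0] * n_cols  # flat, strictly positive starting image
    sens = [sum(A[i][j] for i in range(n_rows)) for j in range(n_cols)]  # sensitivity
    for _ in range(iterations):
        proj = [sum(a * xj for a, xj in zip(row, x)) for row in A]  # forward projection
        ratio = [yi / pi if pi > 0 else 0.0 for yi, pi in zip(y, proj)]  # measured / estimated
        back = [sum(A[i][j] * ratio[i] for i in range(n_rows)) for j in range(n_cols)]  # backprojection
        x = [xj * bj / sj if sj > 0 else 0.0 for xj, bj, sj in zip(x, back, sens)]
    return x


# Hypothetical toy "collimators": stack both system matrices and both data sets.
A_pb, y_pb = [[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0]
A_cb, y_cb = [[1.0, 1.0]], [5.0]
print(mlem(A_pb + A_cb, y_pb + y_cb))  # approaches [2.0, 3.0]
```

    The multiplicative form keeps every pixel non-negative, and the sensitivity image s normalizes for how strongly each pixel is seen by the combined set of projections.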

  17. Massively parallel multicanonical simulations

    Science.gov (United States)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with on the order of 10⁴ parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as a starting point and reference for practitioners in the field.
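
    As a hedged illustration of the communication pattern described (independent walkers sharing one set of weights, with their histograms combined at each weight update), the simplest multicanonical recursion, W_{n+1}(E) = W_n(E) / H(E), can be written in log form as follows. The paper's actual update rule, interval schedule and GPU kernels are not reproduced, and the function names are hypothetical.

```python
import math


def pooled_histogram(walker_histograms):
    """Sum per-walker energy histograms bin by bin (the only communication step)."""
    return [sum(bins) for bins in zip(*walker_histograms)]


def update_log_weights(log_w, walker_histograms):
    """Simple multicanonical recursion W_{n+1}(E) = W_n(E) / H(E), in log form.

    Energies visited more often are down-weighted; bins that no walker
    visited keep their current weight.
    """
    pooled = pooled_histogram(walker_histograms)
    return [lw - math.log(h) if h > 0 else lw for lw, h in zip(log_w, pooled)]


# Two hypothetical walkers over three energy bins
print(update_log_weights([0.0, 0.0, 0.0], [[4, 1, 0], [4, 1, 0]]))
```

    Working with log-weights avoids overflow, since multicanonical weights typically span many orders of magnitude; between updates each walker would run its Markov chain independently with the shared weights.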

  18. Parallel Atomistic Simulations

    Energy Technology Data Exchange (ETDEWEB)

    Heffelfinger, Grant S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.

  19. Parallelization of a blind deconvolution algorithm

    Science.gov (United States)

    Matson, Charles L.; Borelli, Kathy J.

    2006-09-01

    Often it is of interest to deblur imagery in order to obtain higher-resolution images. Deblurring requires knowledge of the blurring function, information that is often not available separately from the blurred imagery. Blind deconvolution algorithms overcome this problem by jointly estimating both the high-resolution image and the blurring function from the blurred imagery. Because blind deconvolution algorithms are iterative in nature, they can take minutes to days to deblur an image, depending on how many frames of data are used for the deblurring and on the platforms on which the algorithms are executed. Here we present our progress in parallelizing a blind deconvolution algorithm to increase its execution speed. This progress includes sub-frame parallelization and a code structure that is not specialized to a specific computer hardware architecture.

  20. Iterative group splitting algorithm for opportunistic scheduling systems

    KAUST Repository

    Nam, Haewoon; Alouini, Mohamed-Slim

    2014-01-01

    An efficient feedback algorithm for opportunistic scheduling systems based on iterative group splitting is proposed in this paper. Similar to the opportunistic splitting algorithm, the proposed algorithm adjusts (or lowers) the feedback threshold

  1. Performance Analysis of Parallel Mathematical Subroutine library PARCEL

    International Nuclear Information System (INIS)

    Yamada, Susumu; Shimizu, Futoshi; Kobayashi, Kenichi; Kaburaki, Hideo; Kishida, Norio

    2000-01-01

    The parallel mathematical subroutine library PARCEL (Parallel Computing Elements) has been developed by the Japan Atomic Energy Research Institute so that typical parallelized mathematical routines can easily be used in any application on distributed parallel computers. PARCEL includes routines for linear equations, eigenvalue problems, pseudo-random number generation, and fast Fourier transforms. The performance results show that the linear equation routines achieve good parallelization efficiency on vector as well as scalar parallel computers. A comparison of the efficiency results with the PETSc (Portable, Extensible Toolkit for Scientific Computation) library is also reported. (author)

  2. A sparsity-regularized Born iterative method for reconstruction of two-dimensional piecewise continuous inhomogeneous domains

    KAUST Repository

    Sandhu, Ali Imran; Desmal, Abdulla; Bagci, Hakan

    2016-01-01

    A sparsity-regularized Born iterative method (BIM) is proposed for efficiently reconstructing two-dimensional piecewise-continuous inhomogeneous dielectric profiles. Such profiles are typically not spatially sparse, which reduces the efficiency of the sparsity-promoting regularization. To overcome this problem, scattered fields are represented in terms of the spatial derivative of the dielectric profile and reconstruction is carried out over samples of the dielectric profile's derivative. Then, like the conventional BIM, the nonlinear problem is iteratively converted into a sequence of linear problems (in derivative samples) and sparsity constraint is enforced on each linear problem using the thresholded Landweber iterations. Numerical results, which demonstrate the efficiency and accuracy of the proposed method in reconstructing piecewise-continuous dielectric profiles, are presented.
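    The thresholded Landweber iteration applied to each linearized problem can be sketched as a gradient (Landweber) step on the data misfit ‖Ax − y‖² followed by soft thresholding, which promotes sparsity. This is a generic iterative shrinkage-thresholding sketch on a small real-valued matrix; the step size `mu`, threshold `lam`, function names and test data are assumptions, and it omits the derivative-domain representation and Born linearization described above.

```python
# Illustrative sketch; function names and parameters are hypothetical.
def soft_threshold(v, t):
    # Shrink each entry toward zero by t; entries with |a| <= t become 0.
    return [(abs(a) - t) * (1 if a > 0 else -1) if abs(a) > t else 0.0 for a in v]

def thresholded_landweber(A, y, mu, lam, n_iter=500):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 by iterative shrinkage.

    A: list of rows. Convergence requires mu < 2 / ||A||_2^2.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual r = y - A x
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Landweber (gradient) step: x + mu * A^T r
        g = [x[j] + mu * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Soft thresholding enforces the sparsity constraint
        x = soft_threshold(g, mu * lam)
    return x
```

Small entries of the Landweber update are driven exactly to zero, which is how the sparsity constraint is enforced at every iteration.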

  3. Parallel algorithms for continuum dynamics

    International Nuclear Information System (INIS)

    Hicks, D.L.; Liebrock, L.M.

    1987-01-01

    Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; achieving maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. The focus is mainly on the most recent parallel processing results, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important, such as inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, and shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation, so algorithms for this machine must be designed with great care to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors.

  4. Conformable variational iteration method

    Directory of Open Access Journals (Sweden)

    Omer Acan

    2017-02-01

    In this study, we introduce the conformable variational iteration method, based on a newly defined fractional derivative called the conformable fractional derivative. The new method is applied to two fractional-order ordinary differential equations. To assess the solutions obtained with this method, a linear homogeneous and a nonlinear non-homogeneous fractional ordinary differential equation are selected. The results are compared with the exact solutions, and their graphs are plotted to demonstrate the efficiency and accuracy of the method.
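    For reference, the conformable fractional derivative of order α ∈ (0, 1] is commonly defined in the literature as below; this is the standard definition, not reproduced from the study itself. For differentiable f it reduces to a weighted first derivative, which is what makes variational-iteration-type constructions tractable. The power-function case is added here only as a worked check.

```latex
\begin{aligned}
T_\alpha(f)(t) &= \lim_{\varepsilon \to 0} \frac{f\!\left(t + \varepsilon\, t^{1-\alpha}\right) - f(t)}{\varepsilon}, \qquad t > 0,\ \alpha \in (0, 1],\\
T_\alpha(f)(t) &= t^{1-\alpha} f'(t) \quad \text{for differentiable } f,\\
T_\alpha\!\left(t^{p}\right) &= t^{1-\alpha}\, p\, t^{p-1} = p\, t^{p-\alpha}, \qquad \text{e.g. } T_\alpha\!\left(t^{\alpha}/\alpha\right) = 1.
\end{aligned}
```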

  5. Jointly-check iterative decoding algorithm for quantum sparse graph codes

    International Nuclear Information System (INIS)

    Jun-Hu, Shao; Bao-Ming, Bai; Wei, Lin; Lin, Zhou

    2010-01-01

    For quantum sparse graph codes with stabilizer formalism, the unavoidable girth-four cycles in their Tanner graphs greatly degrade the iterative decoding performance of the standard belief-propagation (BP) algorithm. In this paper, we present a jointly-check iterative algorithm for efficiently decoding quantum sparse graph codes. Numerical simulations show that this modified method clearly outperforms the standard BP algorithm. (general)

  6. LightForce Photon-Pressure Collision Avoidance: Updated Efficiency Analysis Utilizing a Highly Parallel Simulation Approach

    Science.gov (United States)

    Stupl, Jan; Faber, Nicolas; Foster, Cyrus; Yang, Fan Yang; Nelson, Bron; Aziz, Jonathan; Nuttall, Andrew; Henze, Chris; Levit, Creon

    2014-01-01

    This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground-based, commercial off-the-shelf lasers. Past research has shown that a few ground-based systems consisting of 10-kilowatt-class lasers directed by 1.5-meter telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency of the system. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that regularly updates the LightForce engagement strategy, as would happen during actual operations. In this paper we present our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system. Results indicate that utilizing a network of four LightForce stations with 20 kilowatt lasers, 85% of all conjunctions with a

  7. Parallelization of the FLAPW method

    International Nuclear Information System (INIS)

    Canning, A.; Mannstadt, W.; Freeman, A.J.

    1999-01-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of fewer than about one hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer.

  8. Parallelization of the FLAPW method

    Science.gov (United States)

    Canning, A.; Mannstadt, W.; Freeman, A. J.

    2000-08-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of fewer than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

  9. ITER blanket module shield block design and analysis

    International Nuclear Information System (INIS)

    Mitin, D.; Khomyakov, S.; Razmerov, A.; Strebkov, Yu.

    2008-01-01

    This paper presents an alternative design of the shield block cooling path for a typical ITER blanket module, with a predominantly sequential flow circuit. A number of serious disadvantages have been observed in the reference design, which uses the parallel flow circuit common to the majority of blanket modules. The paper discusses these disadvantages and demonstrates the benefit of the alternative design on the basis of detailed design work and technological, hydraulic, thermal, structural and strength analyses conducted for module no. 17.

  10. ITER-FEAT - outline design report. Report by the ITER Director. ITER meeting, Tokyo, January 2000

    International Nuclear Information System (INIS)

    2001-01-01

    It is now possible to define the key elements of ITER-FEAT. This report provides the results to date of the joint work of the Special Working Group, in the form of an Outline Design Report on the ITER-FEAT design which, subject to the views of the ITER Council and of the Parties, will be the focus of further detailed design work and analysis aimed at providing the Parties with a complete and fully integrated engineering design within the framework of the ITER EDA extension.

  11. Parallel plasma fluid turbulence calculations

    International Nuclear Information System (INIS)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-01-01

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms (vector parallel: the National Energy Research Supercomputer Center's CRAY Y-MP C90; massively parallel: the Intel Paragon XP/S 35; and serial parallel: clusters of high-performance workstations using the Parallel Virtual Machine protocol) offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated.

  12. Parallel computation for solving the tridiagonal linear system of equations

    International Nuclear Information System (INIS)

    Ishiguro, Misako; Harada, Hiroo; Fujii, Minoru; Fujimura, Toichiro; Nakamura, Yasuhiro; Nanba, Katsumi.

    1981-09-01

    Recently, the need for high-speed execution of large-scale programs has increased the use of parallel computation in scientific calculations. At the JAERI computing center, an array processor, the FACOM 230-75 APU, has been installed to study the applicability of parallel computation to nuclear codes. We carried out numerical experiments on the APU with methods for solving tridiagonal linear equations, an important problem in scientific calculations. Drawing on recent papers on parallel methods, we investigate eight methods: the Gauss elimination method, the parallel Gauss method, the accelerated parallel Gauss method, the Jacobi method, the recursive doubling method, the cyclic reduction method, the Chebyshev iteration method, and the conjugate gradient method. Computing time and accuracy were compared among the methods on the basis of these experiments. The cyclic reduction method was found to be the best in both computing time and accuracy, with the Gauss elimination method second. (author)
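    Cyclic reduction, the method found best here, eliminates every other unknown at each level; the eliminations within one level touch disjoint equations, which is what suits it to vector and array processors. Below is a minimal serial Python sketch, assuming a system of size n = 2^k − 1 with sub-, main and super-diagonals `a`, `b`, `c` (`a[0]` and `c[-1]` unused) that needs no pivoting; `cyclic_reduction` is a hypothetical name and this is not the JAERI implementation.

```python
# Illustrative sketch; cyclic_reduction is a hypothetical name, not the JAERI code.
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic reduction.

    a: sub-diagonal (a[0] ignored), b: main diagonal,
    c: super-diagonal (c[-1] ignored), d: right-hand side.
    Assumes n = len(b) = 2**k - 1 and a system needing no pivoting
    (e.g. diagonally dominant).
    """
    n = len(b)
    if n & (n + 1):
        raise ValueError("this sketch requires n = 2**k - 1")
    # Work on copies; zero the couplings that fall outside the matrix.
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0
    c[-1] = 0.0

    # Forward reduction: at stride s, equations i = 2s-1, 4s-1, ... eliminate
    # x[i-s] and x[i+s]. Updates within one level are mutually independent.
    s = 1
    while 2 * s < n + 1:
        for i in range(2 * s - 1, n, 2 * s):
            alpha = -a[i] / b[i - s]
            gamma = -c[i] / b[i + s]
            a[i] = alpha * a[i - s]
            c[i] = gamma * c[i + s]
            b[i] += alpha * c[i - s] + gamma * a[i + s]
            d[i] += alpha * d[i - s] + gamma * d[i + s]
        s *= 2

    # One equation in one unknown remains, in the middle.
    x = [0.0] * n
    mid = n // 2
    x[mid] = d[mid] / b[mid]

    def xv(j):
        # Unknowns outside the matrix contribute nothing (their coefficient is 0).
        return x[j] if 0 <= j < n else 0.0

    # Back substitution from the coarsest stride down to 1; again independent per level.
    s = (n + 1) // 4
    while s >= 1:
        for i in range(s - 1, n, 2 * s):
            x[i] = (d[i] - a[i] * xv(i - s) - c[i] * xv(i + s)) / b[i]
        s //= 2
    return x
```

The sketch needs about log2(n) levels, each of which could run as one vector operation; the serial Gauss elimination (Thomas) algorithm instead needs n dependent steps.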

  13. Algebraic multigrid preconditioning within parallel finite-element solvers for 3-D electromagnetic modelling problems in geophysics

    Science.gov (United States)

    Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María

    2014-06-01

    We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG); it uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers, and the wave front algorithm to create groups, which are used for coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed a series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with the biconjugate gradient stabilized method. The results show that the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, in cases in which other preconditioners succeed in converging to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code, by up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures a grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the
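    To show where a preconditioner enters the biconjugate gradient stabilized (BiCGStab) iteration, here is a minimal pure-Python sketch of right-preconditioned BiCGStab using the simple diagonal (Jacobi) preconditioner, one of the baselines the study compares against. This is not the authors' AMG code: an AMG preconditioner would replace `apply_prec` with one multigrid cycle. Dense list-of-lists matrices and all function names are assumptions for illustration.

```python
import math

# Illustrative sketch; helper names are hypothetical, not from the cited solver.
def matvec(A, x):
    # Dense matrix-vector product; A is a list of rows.
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def bicgstab_jacobi(A, b, tol=1e-10, max_iter=200):
    """Right-preconditioned BiCGStab with a diagonal (Jacobi) preconditioner M = diag(A)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    def apply_prec(vec):
        # M^{-1} vec; an AMG preconditioner would apply one multigrid cycle here.
        return [di * vi for di, vi in zip(inv_diag, vec)]

    x = [0.0] * n
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) <= tol * b_norm:
        return x
    r_hat = list(r)  # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(max_iter):
        rho = dot(r_hat, r)
        beta = (rho / rho_old) * (alpha / omega)
        p = [ri + beta * (pk - omega * vk) for ri, pk, vk in zip(r, p, v)]
        y = apply_prec(p)
        v = matvec(A, y)
        alpha = rho / dot(r_hat, v)
        x = [xi + alpha * yi for xi, yi in zip(x, y)]
        s = [ri - alpha * vk for ri, vk in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            return x
        z = apply_prec(s)
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + omega * zi for xi, zi in zip(x, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x
        rho_old = rho
    return x
```

Because the preconditioner is applied only through `apply_prec`, swapping in a stronger one (such as AMG) leaves the Krylov iteration itself unchanged, which is the black-box property the abstract emphasizes.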

  14. System engineering and configuration management in ITER

    International Nuclear Information System (INIS)

    Chiocchio, S.; Martin, E.; Barabaschi, P.; Bartels, Hans Werner; How, J.; Spears, W.

    2007-01-01

    The construction of ITER will represent a major challenge for the fusion community at large, because of the intrinsic complexity of the tokamak design, the large number of different systems which are all essential for its operation, the worldwide distribution of the design activities and the unusual procurement scheme based on a combination of in-kind and directly funded deliverables. A key requirement for the success of such a large project is that a systematic approach to ensure the consistency of the design with the required performance is adopted. Also, effective project management methods, tools and working practices must be deployed to facilitate the communication and collaboration among the institutions and industries involved in the project. The authors have been involved in the definition and practical implementation of the design integration and configuration control structure inside ITER and in the system engineering process during the selection and optimization of the machine configuration. In parallel, they have assessed design, drawing and documentation management software to be used for the construction phase. Here, they describe the experience gained in recent years, explain the drivers behind the selection of the documents and drawings management systems, and illustrate the scope and issues of the configuration management activities to ensure the congruence of the design, to control and track the design changes and to manage the interfaces among the ITER systems

  15. Block-Parallel Data Analysis with DIY2

    Energy Technology Data Exchange (ETDEWEB)

    Morozov, Dmitriy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; Peterka, Tom [Argonne National Lab. (ANL), Argonne, IL (United States)]

    2017-08-30

    DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
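    The block-structured pattern described above (decompose data into blocks, iterate over blocks, exchange data between neighboring blocks through a fixed pattern) can be mimicked in a few lines. The sketch below is plain Python with a thread pool; `Block`, `decompose`, `exchange_halos`, `smooth` and `run` are hypothetical names, not the DIY2 API, and each round applies one 3-point averaging step to a 1-D array.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch; all names are hypothetical, not the DIY2 API.
class Block:
    """One block of a 1-D array plus one halo value on each side."""
    def __init__(self, gid, values):
        self.gid = gid
        self.values = list(values)
        self.left_halo = None
        self.right_halo = None

def decompose(data, n_blocks):
    # Balanced contiguous split; every block is non-empty if n_blocks <= len(data).
    bounds = [len(data) * g // n_blocks for g in range(n_blocks + 1)]
    return [Block(g, data[bounds[g]:bounds[g + 1]]) for g in range(n_blocks)]

def exchange_halos(blocks):
    # Reusable neighbor pattern: each block receives its neighbors' edge values.
    # At the domain ends a block reuses its own edge value (reflecting boundary).
    last = len(blocks) - 1
    for k, blk in enumerate(blocks):
        blk.left_halo = blocks[k - 1].values[-1] if k > 0 else blk.values[0]
        blk.right_halo = blocks[k + 1].values[0] if k < last else blk.values[-1]

def smooth(blk):
    # Purely local computation: 3-point average using the halo values at the edges.
    ext = [blk.left_halo] + blk.values + [blk.right_halo]
    blk.values = [(ext[i - 1] + ext[i] + ext[i + 1]) / 3.0 for i in range(1, len(ext) - 1)]

def run(data, n_blocks, n_rounds, n_threads=4):
    blocks = decompose(data, n_blocks)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for _ in range(n_rounds):
            exchange_halos(blocks)          # communication phase
            list(pool.map(smooth, blocks))  # blocks are updated concurrently
    return [v for blk in blocks for v in blk.values]
```

The result is identical to a serial sweep over the whole array, regardless of the number of blocks. With CPython threads this shows the structure rather than delivering a speedup; the runtime described in the abstract targets distributed-memory machines.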

  16. Automatic Control of ITER-like Structures

    International Nuclear Information System (INIS)

    Bosia, G.; Bremond, S.

    2005-01-01

    The ITER Ion Cyclotron System requires a power transfer efficiency in excess of 90% from power source to plasma in quasi-continuous operation. This implies the availability of a control system capable of optimizing the array radiation spectrum; automatically acquiring an impedance match between the power source and the plasma-loaded array at the beginning of the power pulse and maintaining it against load variations due to plasma position and fluctuations in plasma edge parameters; and rapidly detecting voltage breakdowns in the array and/or in the transmission system and reliably discriminating them from fast load variations. In this paper, a proposal for a practical ITER control system, including power, phase, frequency and impedance matching, is described. (authors)

  17. Design of ITER neutron monitor using micro fission chambers

    International Nuclear Information System (INIS)

    Nishitani, Takeo; Ebisawa, Katsuyuki; Ando, Toshiro; Kasai, Satoshi; Johnson, L.C.; Walker, C.

    1998-08-01

    We are designing micro fission chambers, which are pencil-size gas counters with fissile material inside, to be installed in the vacuum vessel as neutron flux monitors for ITER. We found that ²³⁸U micro fission chambers are not suitable because their detection efficiency would increase up to 50% over the ITER lifetime through breeding of ²³⁹Pu. We propose to install ²³⁵U micro fission chambers on the front side of the back plate in the gap between adjacent blanket modules and behind the blankets at 10 poloidal locations. One chamber will be installed in the divertor cassette just under the dome. By employing both pulse counting mode and Campbelling mode in the electronics, we can meet the ITER requirement of a 10⁷ dynamic range with 1 ms temporal resolution and eliminate the effect of gamma-rays. Neutron Monte Carlo calculations with three-dimensional modeling demonstrate that those detection efficiency changes can be avoided by installing micro fission chambers at several poloidal locations inside the vacuum vessel. (author)
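    The Campbelling (mean-square) mode mentioned above rests on Campbell's theorem for a random pulse train. As a hedged aside, not taken from the report: assuming independent pulses arriving at rate n, each carrying charge q with normalized shape f(t), the mean and variance of the detector current are as below. Because the variance scales with q², the large fission-fragment pulses dominate the much smaller gamma-ray pulses, which is the usual reason this mode suppresses the gamma-ray contribution at high count rates.

```latex
\begin{aligned}
I(t) &= \sum_k q\, f(t - t_k), \qquad \int f(t)\,dt = 1,\\
\langle I \rangle &= n\, q,\\
\sigma_I^2 &= n\, q^2 \int f(t)^2\,dt,\\
\frac{\sigma_{\gamma}^2}{\sigma_{n}^2} &= \frac{n_\gamma}{n_n}\left(\frac{q_\gamma}{q_n}\right)^{2} \quad \text{(same pulse shape for both)}.
\end{aligned}
```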

  18. Tokamak equilibria with non-parallel flow in a triangularity-deformed axisymmetric toroidal coordinate system

    Directory of Open Access Journals (Sweden)

    Ap Kuiroukidis

    2018-01-01

    We consider a generalized Grad–Shafranov equation (GGSE) in a triangularity-deformed axisymmetric toroidal coordinate system and solve it numerically for the generic case of ITER-like and JET-like equilibria with non-parallel flow. It turns out that increasing the triangularity improves confinement, leading to larger values of the toroidal beta and the safety factor. This result is supported by the application of a criterion for linear stability valid for equilibria with flow parallel to the magnetic field. Also, the parallel flow has a weaker stabilizing effect.

  19. On the mathematic simulation of the energy efficiency for heat exchangers with the systems of impingement plane-parallel jets

    Directory of Open Access Journals (Sweden)

    Haritonova Larisa

    2017-01-01

    The article gives an analytical generalization of data on the energy efficiency of heat exchangers with a flat heat exchange surface onto which systems of impinging plane-parallel jets are directed. Functional relations for the specific power consumption (per unit area) for moving a heat carrier, obtained for the first time using similarity-law techniques, are presented as functions of design and operating factors. The regression equations, which constitute a mathematical model of the process, allow analysis of how various factors affect the target parameter. The results can be used to optimize existing calculation techniques or create new ones for highly efficient heat exchange devices with plane-parallel jet impingement systems, and to reduce the power consumed in moving a heat carrier.

  20. Tuning iteration space slicing based tiled multi-core code implementing Nussinov's RNA folding.

    Science.gov (United States)

    Palkowski, Marek; Bielecki, Wlodzimierz

    2018-01-15

    RNA folding is an ongoing, compute-intensive task in bioinformatics. Parallelizing such algorithms and improving their code locality is one of the most relevant areas in computational biology. Fortunately, RNA secondary structure approaches, such as Nussinov's recurrence, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. This allows us to apply powerful polyhedral compilation techniques based on the transitive closure of dependence graphs to generate parallel tiled code implementing Nussinov's RNA folding. Such techniques fall within the iteration space slicing framework: the transitive dependences are applied to the statement instances of interest to produce valid tiles. The main problem in generating parallel tiled code is defining a proper tile size and tile dimension, which affect the degree of parallelism and code locality. To choose the best tile size and tile dimension, we first construct parallel parametric tiled code (the parameters are variables defining the tile size). To this end, we first generate two nonparametric tiled codes with different fixed tile sizes but the same code structure, and then derive a general affine model that describes all integer factors appearing in the expressions of those codes. Using this model and the known integer factors present in those expressions (which define the left-hand side of the model), we find the unknown integers in the model for each integer factor at the same position in the fixed tiled code, and replace the expressions in this code, including integer factors, with expressions containing parameters. We then use this parallel parametric tiled code to implement the well-known tile size selection (TSS) technique, which allows us to discover, within a given search space, the tile size and tile dimension that maximize target code performance. For a given search space, the presented approach allows us to choose the best tile size and tile dimension in
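    For orientation, the serial loop nest that such polyhedral tools start from is the Nussinov recurrence, where S[i][j] is the maximum number of nested base pairs in the subsequence i..j. This is a minimal Python sketch of the textbook recurrence, assuming Watson-Crick plus G-U pairs and no minimum hairpin-loop length; it is neither tiled nor parallel, and `nussinov` and `PAIRS` are hypothetical names, not the authors' generated code.

```python
# Illustrative sketch; not the tiled code generated in the cited work.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq):
    """Maximum number of nested base pairs (Nussinov recurrence, O(n^3))."""
    n = len(seq)
    S = [[0] * n for _ in range(n)]
    # i runs backwards and j forwards, so every S[...] read below is already final.
    # All loop bounds are affine in i and j, as the polyhedral model requires.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            # Split point k: S[i][j] >= S[i][k] + S[k+1][j]
            # (k = i and k = j-1 cover leaving i or j unpaired).
            for k in range(i, j):
                S[i][j] = max(S[i][j], S[i][k] + S[k + 1][j])
            # Pair i with j around the inner subsequence (0 if it is empty).
            inner = S[i + 1][j - 1] if i + 1 <= j - 1 else 0
            S[i][j] = max(S[i][j], inner + ((seq[i], seq[j]) in PAIRS))
    return S[0][n - 1] if n else 0
```

The dependences visible here (row i reads row k+1 > i, and earlier columns of its own row) are what constrain valid tile shapes in the tiled versions.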

  1. Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations

    Science.gov (United States)

    Fijany, Amir

    1993-01-01

    In this paper, fast time- and space-parallel algorithms for the solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for solving the problem can be completely decoupled.
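    One generic way to see how seemingly serial time steps can be decoupled, offered as an illustration and not as the paper's algorithm, is that for a linear PDE each implicit step is an affine map u_{k+1} = M u_k + g_k, and composition of affine maps is associative, so all time levels can be obtained with a parallel prefix (scan) in about log2 N combining rounds. The sketch below assumes a scalar problem, for example a single Fourier mode of the heat equation with decay rate λ under backward Euler, so M = 1/(1 + Δt λ); all function names are hypothetical.

```python
# Illustrative sketch of a parallel-prefix formulation; names are hypothetical.
def compose(first, second):
    """Affine map 'apply first, then second'; each map (m, g) sends u to m*u + g."""
    m1, g1 = first
    m2, g2 = second
    return (m2 * m1, m2 * g1 + g2)

def prefix_scan(maps):
    """Inclusive Hillis-Steele scan under `compose`.

    After the round with offset `step`, entry k holds the composition of
    maps[max(0, k - 2*step + 1) .. k]. Updates within a round are independent,
    so on parallel hardware each round is one parallel step, with
    ceil(log2 N) rounds in total.
    """
    result = list(maps)
    step = 1
    while step < len(result):
        # Build the new round only from the previous round's values.
        result = [compose(result[k - step], result[k]) if k >= step else result[k]
                  for k in range(len(result))]
        step *= 2
    return result

def all_time_levels(u0, dt, lam, forcing):
    """Backward Euler for du/dt = -lam*u + f(t): u_{k+1} = (u_k + dt*f_{k+1}) / (1 + dt*lam).

    forcing = [f_1, ..., f_N]; returns [u_1, ..., u_N] without a serial time loop.
    """
    m = 1.0 / (1.0 + dt * lam)
    maps = [(m, m * dt * f) for f in forcing]  # one affine map per time step
    return [mk * u0 + gk for mk, gk in prefix_scan(maps)]
```

For a full spatial discretization, m becomes a matrix and each `compose` a matrix product, so the trade-off is more work per combination in exchange for a logarithmic number of dependent steps.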

  2. Efficient exact optimization of multi-objective redundancy allocation problems in series-parallel systems

    International Nuclear Information System (INIS)

    Cao, Dingzhou; Murat, Alper; Chinnam, Ratna Babu

    2013-01-01

    This paper proposes a decomposition-based approach to exactly solve the multi-objective Redundancy Allocation Problem for series-parallel systems. The redundancy allocation problem is a form of reliability optimization and has been the subject of many prior studies. The majority of these earlier studies treat the redundancy allocation problem as a single-objective problem, maximizing the system reliability or minimizing the cost given certain constraints. The few studies that treated the redundancy allocation problem as a multi-objective optimization problem relied on meta-heuristic solution approaches. However, meta-heuristic approaches have significant limitations: they do not guarantee that Pareto points are optimal and, more importantly, they may not identify all the Pareto-optimal points. In this paper, we treat the redundancy allocation problem as a multi-objective problem, as is typical in practice. We decompose the original problem into several multi-objective sub-problems, efficiently and exactly solve the sub-problems, and then systematically combine the solutions. The decomposition-based approach can efficiently generate all the Pareto-optimal solutions for redundancy allocation problems. Experimental results demonstrate the effectiveness and efficiency of the proposed method over meta-heuristic methods on a numerical example taken from the literature.
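    To make the problem concrete: in a series system of parallel subsystems with component reliability r_i, unit cost c_i and n_i redundant copies in subsystem i, the system reliability is R = ∏_i [1 − (1 − r_i)^{n_i}] and the cost is C = Σ_i c_i n_i. The sketch below enumerates every allocation for a tiny, made-up instance and keeps the nondominated (Pareto-optimal) ones. This brute-force filter only illustrates what "all the Pareto-optimal solutions" means; it is not the authors' decomposition method, which is designed to avoid full enumeration, and the function names are hypothetical.

```python
from itertools import product
from math import prod

# Illustrative brute-force sketch; not the decomposition method of the cited paper.
def evaluate(alloc, r, c):
    """Reliability and cost of a series system of parallel subsystems."""
    reliability = prod(1.0 - (1.0 - ri) ** ni for ri, ni in zip(r, alloc))
    cost = sum(ci * ni for ci, ni in zip(c, alloc))
    return reliability, cost

def pareto_front(r, c, max_copies):
    """Keep allocations (1..max_copies per subsystem) not dominated in (max R, min C)."""
    points = []
    for alloc in product(range(1, max_copies + 1), repeat=len(r)):
        rel, cost = evaluate(alloc, r, c)
        points.append((cost, rel, alloc))
    front = []
    for cost, rel, alloc in points:
        # Dominated if another allocation is no worse in both objectives
        # and strictly better in at least one.
        dominated = any(c2 <= cost and r2 >= rel and (c2 < cost or r2 > rel)
                        for c2, r2, _ in points)
        if not dominated:
            front.append((cost, rel, alloc))
    return sorted(front)  # ordered by increasing cost
```

The number of allocations grows as max_copies to the power of the number of subsystems, which is why exact methods for realistic instances need a smarter decomposition than this enumeration.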

  3. A sparsity-regularized Born iterative method for reconstruction of two-dimensional piecewise continuous inhomogeneous domains

    KAUST Repository

    Sandhu, Ali Imran

    2016-04-10

    A sparsity-regularized Born iterative method (BIM) is proposed for efficiently reconstructing two-dimensional piecewise-continuous inhomogeneous dielectric profiles. Such profiles are typically not spatially sparse, which reduces the efficiency of the sparsity-promoting regularization. To overcome this problem, scattered fields are represented in terms of the spatial derivative of the dielectric profile and reconstruction is carried out over samples of the dielectric profile's derivative. Then, like the conventional BIM, the nonlinear problem is iteratively converted into a sequence of linear problems (in derivative samples) and sparsity constraint is enforced on each linear problem using the thresholded Landweber iterations. Numerical results, which demonstrate the efficiency and accuracy of the proposed method in reconstructing piecewise-continuous dielectric profiles, are presented.

  4. A hybrid, massively parallel implementation of a genetic algorithm for optimization of the impact performance of a metal/polymer composite plate

    KAUST Repository

    Narayanan, Kiran

    2012-07-17

    A hybrid parallelization method composed of a coarse-grained genetic algorithm (GA) and fine-grained objective function evaluations is implemented on a heterogeneous computational resource consisting of 16 IBM Blue Gene/P racks, a single x86 cluster node and a high-performance file system. The GA iterator is coupled with a finite-element (FE) analysis code developed in house to facilitate computational steering in order to calculate the optimal impact velocities of a projectile colliding with a polyurea/structural steel composite plate. The FE code is capable of capturing adiabatic shear bands and strain localization, which are typically observed in high-velocity impact applications, and it includes several constitutive models of plasticity, viscoelasticity and viscoplasticity for metals and soft materials, which allow simulation of ductile fracture by void growth. A strong scaling study of the FE code was conducted to determine the optimum number of processes run in parallel. The relative efficiency of the hybrid, multi-level parallelization method is studied in order to determine the parameters for the parallelization. Optimal impact velocities of the projectile, calculated using the proposed approach, are reported. © The Author(s) 2012.

  5. IWR-solution for the ITER vacuum vessel assembly

    Energy Technology Data Exchange (ETDEWEB)

    Wu, H., E-mail: huapeng@lut.fi [Laboratory of Intelligent Machines, Lappeenranta University of Technology (Finland); Handroos, H. [Laboratory of Intelligent Machines, Lappeenranta University of Technology (Finland); Pela, P. [Tekes (Finland); Wang, Y. [Laboratory of Intelligent Machines, Lappeenranta University of Technology (Finland)

    2011-10-15

    The assembly of the ITER vacuum vessel (VV) is still a major challenge, as the process can only be done from inside the VV. The welding of the VV assembly is carried out using dedicated robotic systems. The main functions of the robots are: (i) measuring the actual space between every two sectors, (ii) positioning the 150 kg splice plates between the sector shells, (iii) welding the splice plates to the sector shells, (iv) NDT of the welds, (v) repairing the welds, including machining, (vi) He-leak tests of the welds, and (vii) any unplanned functions that may turn out to be needed. This paper presents a reasonable method to assemble the ITER VV. In this article, a parallel mobile robot running on a track rail fixed on the wall inside the VV is designed and tested. The assembling process, carried out by the mobile robot together with the welding robot, is presented.

  6. Sparse electromagnetic imaging using nonlinear iterative shrinkage thresholding

    KAUST Repository

    Desmal, Abdulla; Bagci, Hakan

    2015-01-01

    A sparse nonlinear electromagnetic imaging scheme is proposed for reconstructing the dielectric contrast of investigation domains from measured fields. The optimization problem is constructed by imposing a sparsity constraint on the data misfit between the measured fields and the scattered fields, which are expressed as a nonlinear function of the contrast, and it is solved using the nonlinear iterative shrinkage thresholding algorithm. Thresholding is applied to the result of every nonlinear Landweber iteration to enforce the sparsity constraint. Numerical results demonstrate the accuracy and efficiency of the proposed method in reconstructing sparse dielectric profiles.
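    The "Landweber step, then threshold" structure can be sketched for the simpler linear case. This is plain ISTA, assuming a linear forward operator `A` in place of the paper's nonlinear scattered-field model; the matrix sizes, the regularization weight `lam`, the step size 1/||A||² and the test signal are all hypothetical choices for illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink each entry toward zero by tau (proximal operator of tau * ||x||_1)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, y, lam, n_iter=5000):
    """Linear iterative shrinkage thresholding for min 0.5*||Ax - y||^2 + lam*||x||_1.

    Each iteration is a Landweber (gradient) step on the data misfit,
    followed by soft thresholding to enforce sparsity.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1/L, L = Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + step * (A.T @ (y - A @ x))    # Landweber update
        x = soft_threshold(x, step * lam)     # sparsity-enforcing shrinkage
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 100))            # toy linear forward operator
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]        # sparse "contrast" profile
y = A @ x_true                                # noise-free measurements
x_hat = ista(A, y, lam=1.0)
```

    The nonlinear variant described in the abstract replaces `A @ x` with the nonlinear field model and `A.T` with the adjoint of its linearization at the current iterate.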

  7. Sparse electromagnetic imaging using nonlinear iterative shrinkage thresholding

    KAUST Repository

    Desmal, Abdulla

    2015-04-13


  8. ITER council proceedings: 2001

    International Nuclear Information System (INIS)

    2001-01-01

    Continuing the ITER EDA, two further ITER Council Meetings were held after the publication of ITER EDA Documentation Series No. 20: the ITER Council Meeting on 27-28 February 2001 in Toronto, and the ITER Council Meeting on 18-19 July 2001 in Vienna. The Vienna meeting was the last one held during the ITER EDA. This volume contains the records of these meetings, including: records of decisions; list of attendees; ITER EDA status report; ITER EDA technical activities report; MAC report and advice; Final report of the ITER EDA; and press release.

  9. Iterative Object Localization Algorithm Using Visual Images with a Reference Coordinate

    Directory of Open Access Journals (Sweden)

    We-Duke Cho

    2008-09-01

    We present a simplified algorithm for localizing an object using multiple visual images obtained from widely used digital imaging devices. We use a parallel projection model that supports both zooming and panning of the imaging devices. The proposed algorithm is based on a virtual viewable plane that relates the object position to a reference coordinate. The reference point is taken from a rough estimate, which may be obtained from a pre-estimation process. The algorithm minimizes the localization error through an iterative process with relatively low computational complexity. In addition, the nonlinear distortion of the digital imaging devices is compensated during the iterative process. Finally, the performance in several scenarios is evaluated and analyzed in both indoor and outdoor environments.

  10. ITER safety

    International Nuclear Information System (INIS)

    Raeder, J.; Piet, S.; Buende, R.

    1991-01-01

    As part of the series of publications by the IAEA that summarize the results of the Conceptual Design Activities for the ITER project, this document describes the ITER safety analyses. It contains an assessment of normal operation effluents, accident scenarios, plasma chamber safety, tritium system safety, magnet system safety, external loss of coolant and coolant flow problems, and a waste management assessment, while it describes the implementation of the safety approach for ITER. The document ends with a list of major conclusions, a set of topical remarks on technical safety issues, and recommendations for the Engineering Design Activities, safety considerations for siting ITER, and recommendations with regard to the safety issues for the R and D for ITER. Refs, figs and tabs

  11. Simulation-based design process for the verification of ITER remote handling systems

    International Nuclear Information System (INIS)

    Sibois, Romain; Määttä, Timo; Siuko, Mikko; Mattila, Jouni

    2014-01-01

    Highlights: • Verification and validation process for the ITER remote handling system. • Simulation-based design process for early verification of ITER RH systems. • Design process centralized around a simulation lifecycle management system. • Verification and validation roadmap for the digital modelling phase. -- Abstract: The work behind this paper takes place in EFDA's European Goal Oriented Training programme on Remote Handling (RH), "GOT-RH". The programme aims to train engineers for activities supporting the ITER project and the long-term fusion programme. One of the projects of this programme focuses on the verification and validation (V and V) of ITER RH system requirements using digital mock-ups (DMU). The purpose of this project is to study and develop an efficient approach to using DMUs in the V and V process of ITER RH system design within a System Engineering (SE) framework. For complex engineering systems such as the ITER facilities, manufacturing a full-scale prototype leads to a substantial rise in cost. In the V and V process for ITER RH equipment, physical tests are required to ensure that the system complies with the required operation. It is therefore essential to verify the developed system virtually before starting the prototype manufacturing phase. This paper gives an overview of current trends in the use of digital mock-ups within product design processes. It proposes a simulation-based design process centralized around a simulation lifecycle management system. The purpose of this paper is to describe possible improvements in the formalization of the ITER RH design process and V and V processes, in order to increase their cost efficiency and reliability.

  12. Modeling of ELM Dynamics in ITER

    International Nuclear Information System (INIS)

    Pankin, A.Y.; Bateman, G.; Kritz, A.H.; Brennan, D.P.; Snyder, P.B.; Kruger, S.

    2007-01-01

    Edge localized modes (ELMs) are large-scale instabilities that alter the H-mode pedestal, reduce the total plasma stored energy, and can result in heat pulses to the divertor plates. These modes can be triggered by pressure-driven ballooning modes or by current-driven peeling instabilities. In this study, stability analyses are carried out for a series of ITER equilibria that are generated with the TEQ and TOQ equilibrium codes. The H-mode pedestal pressure and the parallel component of the plasma current density are varied systematically in order to cover the relevant parameter space for a specific ITER discharge. The ideal MHD stability codes DCON, ELITE and BALOO are employed to determine whether or not each ITER equilibrium profile is unstable to peeling or ballooning modes in the pedestal region. Several equilibria that are close to the marginal stability boundary for peeling and ballooning modes are tested with the NIMROD non-ideal MHD code. The effects of finite resistivity are studied in a series of linear NIMROD computations. It is found that the peeling-ballooning stability threshold is very sensitive to the resistivity and viscosity profiles, which vary dramatically over a wide range near the separatrix. Due to the effects of finite resistivity and viscosity, the peeling-ballooning stability threshold is shifted relative to the ideal threshold. A fundamental question in the integrated modeling of ELMy H-mode discharges, namely how much plasma and current density is removed during each ELM crash, can be addressed with nonlinear non-ideal MHD simulations. In this study, the NIMROD computer simulations are continued into the nonlinear stage for several ITER equilibria that are marginally unstable to peeling or ballooning modes. The role of two-fluid and finite Larmor radius effects on the ELM dynamics in ITER geometry is examined. The formation of ELM filament structures, which are observed in many existing tokamak experiments, is demonstrated for ITER.

  13. Is Monte Carlo embarrassingly parallel?

    Energy Technology Data Exchange (ETDEWEB)

    Hoogenboom, J. E. [Delft Univ. of Technology, Mekelweg 15, 2629 JB Delft (Netherlands); Delft Nuclear Consultancy, IJsselzoom 2, 2902 LB Capelle aan den IJssel (Netherlands)

    2012-07-01

    Monte Carlo is often described as embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel on tens of processors shows a serious limitation in speedup, and the execution time may even increase beyond a certain number of processors. In this paper, the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendezvous points in the parallel calculation, which are used for synchronization and data exchange between processors. These occur at least at the end of each fission source generation cycle, in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results but also an input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows tasks to be executed in parallel on shared memory using OpenMP, is analyzed in detail. Recommendations are given for obtaining the maximum efficiency from a parallel Monte Carlo calculation. (authors)

  14. Is Monte Carlo embarrassingly parallel?

    International Nuclear Information System (INIS)

    Hoogenboom, J. E.

    2012-01-01


  15. Parallel and series fed microstrip array with high efficiency and low cross polarization

    Science.gov (United States)

    Huang, John (Inventor)

    1995-01-01

    A microstrip array antenna producing a vertically polarized fan beam (approximately 2 deg x 50 deg) for C-band SAR applications, with a physical area of 1.7 m by 0.17 m, comprises two rows of patch elements and employs a parallel feed to the left- and right-half sections of the rows. Each section is divided into two segments that are fed in parallel, with the elements in each segment fed in series through matched transmission lines for high efficiency. The inboard section has half as many patch elements as the outboard section. The outboard sections have a tapered distribution with identical transmission-line sections and are terminated with half-wavelength-long open-circuit stubs, so that the remaining energy is reflected and radiated in phase. The elements of the two inboard segments of the left- and right-half sections are provided with tapered transmission lines from element to element for uniform power distribution over the central third of the entire array antenna. The two rows of array elements are excited at opposite patch feed locations with opposite (180 deg difference) phases for reduced cross-polarization.

  16. Analysis of the ITER cryoplant operational modes

    International Nuclear Information System (INIS)

    Henry, D.; Journeaux, J.Y.; Roussel, P.; Michel, F.; Poncet, J.M.; Girard, A.; Kalinin, V.; Chesny, P.

    2007-01-01

    In the framework of an EFDA task, CEA is carrying out an analysis of the various ITER cryoplant operational modes. According to the project integration document, ITER is designed to be operated 365 days per year in order to optimize the available time of the Tokamak. It is anticipated that operation will be performed in long periods separated by maintenance periods (e.g. 10 days of continuous operation and a 1-week break), with annual or biannual major shutdown periods of a few months for maintenance, further installation and commissioning. Under this operation schedule, auxiliary subsystems such as the cryoplant and the cryodistribution have to cope with different heat loads, which depend on the ITER operating state. The cryoplant consists of four identical 4.5 K refrigerators and two 80 K helium loops coupled with two LN2 modules. All of these cryogenic subsystems have to operate in parallel to remove the heat loads from the magnets, 80 K shields, cryopumps and other small users. After a brief review of the main particularities of a cryogenic system operating in a Tokamak environment, the first part of this study is dedicated to the assessment of the main ITER operating states. A new design of the refrigeration loop for the HTS current leads, the updated layout of the cryodistribution system and a revised strategy for operating the cryopumps have been taken into consideration. The relevant normal operating scenarios of the cryoplant are checked for the typical ITER operating states, such as the plasma operation state, short-term standby, short-term maintenance, and the test and conditioning state. The second part of the paper is dedicated to the abnormal operating modes originating from the magnets and from the cryoplant itself. The occurrence of a fast discharge or a quench of the magnets generates large heat-load disturbances and produces exceptionally high mass flow rates, which have to be managed by the cryoplant, while a failure of a cryogenic component induces

  17. Verification of SuperMC with ITER C-Lite neutronic model

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Shu [School of Nuclear Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230027 (China); Key Laboratory of Neutronics and Radiation Safety, Institute of Nuclear Energy Safety Technology, Chinese Academy of Sciences, Hefei, Anhui, 230031 (China); Yu, Shengpeng [Key Laboratory of Neutronics and Radiation Safety, Institute of Nuclear Energy Safety Technology, Chinese Academy of Sciences, Hefei, Anhui, 230031 (China); He, Peng, E-mail: peng.he@fds.org.cn [Key Laboratory of Neutronics and Radiation Safety, Institute of Nuclear Energy Safety Technology, Chinese Academy of Sciences, Hefei, Anhui, 230031 (China)

    2016-12-15

    Highlights: • Verification of the SuperMC Monte Carlo transport code with the ITER C-Lite model. • Modeling of the ITER C-Lite model using the latest SuperMC/MCAM. • All calculated quantities agree well with MCNP. • Efficient variance reduction methods are adopted to accelerate the calculation. - Abstract: In pursuit of accurate and high-fidelity simulation, the reference model of ITER is becoming more and more detailed and complicated. Due to the geometric complexity and thick shielding of the reference model, accurate modeling and precise simulation of fusion neutronics are very challenging. Facing these difficulties, SuperMC, the Monte Carlo simulation software system developed by the FDS Team, has optimized its CAD interface for the automatic conversion of more complicated models and has increased its calculation efficiency with advanced variance reduction methods. To demonstrate its capabilities for automatic modeling, coupled neutron/photon simulation and visual analysis of the ITER facility, numerical benchmarks using the ITER C-Lite neutronic model were performed. The nuclear heating in the divertor and the inboard toroidal field (TF) coils and a global neutron flux map were evaluated. All calculated nuclear heating results are compared with those of the MCNP code, and good consistency between the two codes is shown. Using the global variance reduction methods in SuperMC, the average speed-up compared with the analog run is 292 times for the calculation of inboard TF coil nuclear heating and 91 times for the calculation of the global flux map. These tests show that SuperMC is suitable for the design and analysis of the ITER facility.

  18. Final report of the ITER EDA. Final report of the ITER Engineering Design Activities. Prepared by the ITER Council

    International Nuclear Information System (INIS)

    2001-01-01

    This is the Final Report by the ITER Council on the work carried out by the ITER participating countries in cooperation on the Engineering Design Activities (EDA) for ITER. The report presents the main technical objectives of the ITER EDA, its scope, organization and resources, and the engineering design of the ITER tokamak with its main parameters. It also includes safety and environmental assessments, site requirements, a proposed schedule, estimates of manpower and cost, and proposals on approaches to joint implementation of the project.

  19. Limits on the efficiency of event-based algorithms for Monte Carlo neutron transport

    Directory of Open Access Journals (Sweden)

    Paul K. Romano

    2017-09-01

    The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup due to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size to achieve vector efficiency greater than 90%. When the execution times for events are allowed to vary, the vector speedup is also limited by differences in the execution time for events being carried out in a single event-iteration.
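    Why bank size matters when events take constant time can be seen in a deliberately simplified toy model. This is an assumption of the sketch, not the model developed in the paper: each event iteration issues ceil(n/W) vector instructions for n live particles, and a fixed, hypothetical 70% of particles survive to the next iteration. Idle lanes in the last, partially filled vector are the only source of inefficiency, so the second case in the abstract (varying event execution times) is not modelled. The function name `vector_efficiency` is hypothetical.

```python
import math

def vector_efficiency(bank_size, width, survival_pct=70):
    """Toy lane-utilization model for an event-based loop (hypothetical model).

    Each event iteration processes every live particle in ceil(n / width)
    vector instructions; afterwards only survival_pct percent remain alive.
    Returns useful lane-slots divided by issued lane-slots.
    """
    n = bank_size
    useful = 0
    issued = 0
    while n > 0:
        useful += n                               # lanes doing real work
        issued += width * math.ceil(n / width)    # lanes occupied, including idle tail lanes
        n = n * survival_pct // 100               # particles alive for the next iteration
    return useful / issued

for bank in (64, 640, 6400):
    print(bank, round(vector_efficiency(bank, width=8), 3))
```

    In this toy model the wasted lanes per iteration are bounded by the vector width, so their share shrinks as the bank grows; the specific thresholds reported in the abstract come from the paper's own model and OpenMC data, not from this sketch.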

  20. Parallel supercomputing: Advanced methods, algorithms, and software for large-scale linear and nonlinear problems

    Energy Technology Data Exchange (ETDEWEB)

    Carey, G.F.; Young, D.M.

    1993-12-31

    The program outlined here is directed to research on methods, algorithms, and software for distributed parallel supercomputers. Of particular interest are finite element methods and finite difference methods together with sparse iterative solution schemes for scientific and engineering computations of very large-scale systems. Both linear and nonlinear problems will be investigated. In the nonlinear case, applications with bifurcation to multiple solutions will be considered using continuation strategies. The parallelizable numerical methods of particular interest are a family of partitioning schemes embracing domain decomposition, element-by-element strategies, and multi-level techniques. The methods will be further developed by incorporating parallel iterative solution algorithms with associated preconditioners in parallel computer software. The schemes will be implemented on distributed memory parallel architectures such as the CRAY MPP, Intel Paragon, the NCUBE3, and the Connection Machine. We will also consider other new architectures such as the Kendall-Square (KSQ) and proposed machines such as the TERA. The applications will focus on large-scale three-dimensional nonlinear flow and reservoir problems with strong convective transport contributions. These are legitimate grand challenge class computational fluid dynamics (CFD) problems of significant practical interest to DOE. The methods and algorithms developed will, however, be of wider interest.

  1. Efficient methodologies for system matrix modelling in iterative image reconstruction for rotating high-resolution PET

    Energy Technology Data Exchange (ETDEWEB)

    Ortuno, J E; Kontaxakis, G; Rubio, J L; Santos, A [Departamento de Ingenieria Electronica (DIE), Universidad Politecnica de Madrid, Ciudad Universitaria s/n, 28040 Madrid (Spain); Guerra, P [Networking Research Center on Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid (Spain)], E-mail: juanen@die.upm.es

    2010-04-07

    A fully 3D iterative image reconstruction algorithm has been developed for high-resolution PET cameras composed of pixelated scintillator crystal arrays and rotating planar detectors, based on the ordered subsets approach. The associated system matrix is precalculated with Monte Carlo methods that incorporate physical effects not included in analytical models, such as positron range effects and interaction of the incident gammas with the scintillator material. Custom Monte Carlo methodologies have been developed and optimized for modelling of system matrices for fast iterative image reconstruction adapted to specific scanner geometries, without redundant calculations. According to the methodology proposed here, only one-eighth of the voxels within two central transaxial slices need to be modelled in detail. The rest of the system matrix elements can be obtained with the aid of axial symmetries and redundancies, as well as in-plane symmetries within transaxial slices. Sparse matrix techniques for the non-zero system matrix elements are employed, allowing for fast execution of the image reconstruction process. This 3D image reconstruction scheme has been compared in terms of image quality to a 2D fast implementation of the OSEM algorithm combined with Fourier rebinning approaches. This work confirms the superiority of fully 3D OSEM in terms of spatial resolution, contrast recovery and noise reduction as compared to conventional 2D approaches based on rebinning schemes. At the same time it demonstrates that fully 3D methodologies can be efficiently applied to the image reconstruction problem for high-resolution rotational PET cameras by applying accurate pre-calculated system models and taking advantage of the system's symmetries.
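    The ordered-subsets update itself can be sketched in a few lines. For each subset S of projection rows, every voxel is updated multiplicatively as x_j ← x_j · (Σ_{i∈S} a_ij y_i / (Ax)_i) / (Σ_{i∈S} a_ij). This is a minimal sketch, assuming a small dense random system matrix, noise-free data and interleaved row subsets; the paper's Monte Carlo system matrix, its symmetry-based storage and its sparse-matrix techniques are not reproduced, and all names and sizes are hypothetical.

```python
import numpy as np

def osem(A, y, n_subsets=4, n_iter=50, eps=1e-12):
    """Ordered-subsets EM for emission data y ~ Poisson(A @ x).

    A: (n_lors, n_voxels) nonnegative system matrix; y: measured counts.
    Rows are split into interleaved subsets; one sub-iteration per subset.
    """
    n_rows, n_voxels = A.shape
    x = np.ones(n_voxels)                                   # strictly positive start
    subsets = [np.arange(s, n_rows, n_subsets) for s in range(n_subsets)]
    for _ in range(n_iter):
        for rows in subsets:
            A_s = A[rows]
            expected = A_s @ x                              # forward projection
            ratio = y[rows] / np.maximum(expected, eps)     # measured / expected counts
            sensitivity = A_s.sum(axis=0)                   # subset sensitivity image
            x = x * (A_s.T @ ratio) / np.maximum(sensitivity, eps)  # multiplicative EM update
    return x

rng = np.random.default_rng(1)
A = rng.uniform(0.0, 1.0, size=(64, 16))    # toy dense system matrix
x_true = rng.uniform(1.0, 5.0, size=16)
y = A @ x_true                              # noise-free "counts" for illustration
x_hat = osem(A, y, n_subsets=4, n_iter=100)
```

    Using one subset recovers ordinary MLEM; more subsets give more image updates per pass over the data, which is the usual motivation for the ordered-subsets approach.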

  2. An efficient parallel algorithm for the calculation of unrestricted canonical MP2 energies.

    Science.gov (United States)

    Baker, Jon; Wolinski, Krzysztof

    2011-11-30

    We present details of our efficient implementation of full accuracy unrestricted open-shell second-order canonical Møller-Plesset (MP2) energies, both serial and parallel. The algorithm is based on our previous restricted closed-shell MP2 code using the Saebo-Almlöf direct integral transformation. Depending on system details, UMP2 energies take from less than 1.5 to about 3.0 times as long as a closed-shell RMP2 energy on a similar system using the same algorithm. Several examples are given including timings for some large stable radicals with 90+ atoms and over 3600 basis functions. Copyright © 2011 Wiley Periodicals, Inc.

  3. Gaussian Pulse-Based Two-Threshold Parallel Scaling Tone Reservation for PAPR Reduction of OFDM Signals

    Directory of Open Access Journals (Sweden)

    Lei Guan

    2011-01-01

    Tone Reservation (TR) is a technique proposed to combat the high Peak-to-Average Power Ratio (PAPR) problem of Orthogonal Frequency Division Multiplexing (OFDM) signals. However, conventional TR suffers from high computational cost because it is difficult to find an effective cancellation signal in the time domain using only a few tones in the frequency domain. It also suffers from high hardware implementation cost and long processing delays, because multiple iterations are needed to cancel multiple high signal peaks. In this paper, we propose an efficient approach, called two-threshold parallel scaling, for implementing a previously proposed Gaussian pulse-based Tone Reservation algorithm. Compared to conventional approaches, this technique significantly reduces hardware implementation complexity and cost, while also reducing signal processing delay by using just two iterations. Experimental results show that the proposed technique can effectively reduce the PAPR of OFDM signals with only a very small number of reserved tones and with limited usage of hardware resources. This technique is suitable for any OFDM-based communication system, especially Digital Video Broadcasting (DVB) systems employing large IFFT/FFT transforms.

  4. Robust Cell Detection for Large-Scale 3D Microscopy Using GPU-Accelerated Iterative Voting

    Directory of Open Access Journals (Sweden)

    Leila Saadatifard

    2018-04-01

    High-throughput imaging techniques, such as Knife-Edge Scanning Microscopy (KESM), are capable of acquiring three-dimensional whole-organ images at sub-micrometer resolution. These images are challenging to segment because they can exceed several terabytes (TB) in size, requiring extremely fast and fully automated algorithms. Staining techniques are limited to contrast agents that can be applied to large samples and imaged in a single pass. This requires maximizing the number of structures labeled in a single channel, resulting in images that are densely packed with spatial features. In this paper, we propose a three-dimensional approach for locating cells based on iterative voting. Due to the computational complexity of this algorithm, a highly efficient GPU implementation is required to make it practical on large data sets. The proposed algorithm has a limited number of input parameters and is highly parallel.

  5. Parallel computation of rotating flows

    DEFF Research Database (Denmark)

    Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær

    1999-01-01

    This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process ... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers.
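    CGLS applies conjugate gradients to the normal equations AᵀA x = Aᵀb without ever forming AᵀA, so only products with A and Aᵀ are needed, which is what makes it attractive for large, sparse, over-determined systems. Started from x = 0, its iterates stay in the range of Aᵀ, so in exact arithmetic it converges to the minimum-norm least-squares solution even when A is singular. The following is a minimal textbook sketch, assuming a small dense random test matrix; how the paper handles the singularity of its particular system is not described in the abstract.

```python
import numpy as np

def cgls(A, b, n_iter=200, tol=1e-10):
    """Conjugate gradients on the normal equations A^T A x = A^T b (CGLS).

    A^T A is never formed explicitly; only products with A and A^T are used.
    """
    x = np.zeros(A.shape[1])
    r = np.asarray(b, dtype=float).copy()   # residual b - A x (x starts at zero)
    s = A.T @ r                             # normal-equation residual A^T r
    p = s.copy()                            # search direction
    gamma = s @ s
    for _ in range(n_iter):
        if np.sqrt(gamma) < tol:            # converged: A^T r is (numerically) zero
            break
        q = A @ p
        alpha = gamma / (q @ q)             # step length along p
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p     # new A^T A-conjugate direction
        gamma = gamma_new
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 10))           # toy over-determined system
b = rng.standard_normal(50)
x_ls = cgls(A, b)
```

    The result can be checked against a direct solver such as `np.linalg.lstsq`; in a parallel code the two matrix-vector products and the dot products are the operations that would be distributed.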

  6. STICS: surface-tethered iterative carbohydrate synthesis.

    Science.gov (United States)

    Pornsuriyasak, Papapida; Ranade, Sneha C; Li, Aixiao; Parlato, M Cristina; Sims, Charles R; Shulga, Olga V; Stine, Keith J; Demchenko, Alexei V

    2009-04-14

    A new surface-tethered iterative carbohydrate synthesis (STICS) technology is presented in which a surface functionalized 'stick' made of chemically stable high surface area porous gold allows one to perform cost efficient and simple synthesis of oligosaccharide chains; at the end of the synthesis, the oligosaccharide can be cleaved off and the stick reused for subsequent syntheses.

  7. iterClust: a statistical framework for iterative clustering analysis.

    Science.gov (United States)

    Ding, Hongxu; Wang, Wanxin; Califano, Andrea

    2018-03-22

    In a scenario where populations A, B1 and B2 (subpopulations of B) exist, pronounced differences between A and B may mask subtle differences between B1 and B2. Here we present iterClust, an iterative clustering framework that separates the more pronounced differences (e.g. A and B) in the initial iterations, followed by relatively subtle differences (e.g. B1 and B2), providing a comprehensive clustering trajectory. iterClust is implemented as a Bioconductor R package. andrea.califano@columbia.edu, hd2326@columbia.edu. Supplementary information is available at Bioinformatics online.

  8. Qualification of phased array ultrasonic examination on T-joint weld of austenitic stainless steel for ITER vacuum vessel

    Energy Technology Data Exchange (ETDEWEB)

    Kim, G.H. [ITER Korea, National Fusion Research Institute, Daejeon 305-333 (Korea, Republic of); Park, C.K., E-mail: love879@hanmail.net [ITER Korea, National Fusion Research Institute, Daejeon 305-333 (Korea, Republic of); Jin, S.W.; Kim, H.S.; Hong, K.H.; Lee, Y.J.; Ahn, H.J.; Chung, W. [ITER Korea, National Fusion Research Institute, Daejeon 305-333 (Korea, Republic of); Jung, Y.H.; Roh, B.R. [Hyundai Heavy Industries Co. Ltd., Ulsan 682-792 (Korea, Republic of); Sa, J.W.; Choi, C.H. [ITER Organization, Route de Vinon-sur-Verdon, CS 90 046, 13067 St. Paul Lez Durance Cedex (France)

    2016-11-01

    Highlights: • PAUT techniques have been developed by Hyundai Heavy Industries Co., Ltd. (HHI) and the Korea Domestic Agency (KODA) to verify and establish instrument calibration, test procedures, image processing and so on. As the first step in developing the PAUT technique, several dozen qualification blocks with artificial defects, such as parallel side-drilled holes, embedded lack of fusion and embedded repair weld notches, have been designed and fabricated to simulate all potential defects arising during the welding process. The real UT qualification (group 1) for the T-joint weld was successfully conducted in the presence of an ANB inspector. • In this paper, significant progress in UT qualification for the ITER vacuum vessel is presented. - Abstract: Full penetration welding and 100% volumetric examination are required for all welds of pressure-retaining parts of the ITER Vacuum Vessel (VV) according to the RCC-MR Code and the French Order on Nuclear Pressure Equipment (ESPN). The NDE requirement is one of the important technical issues, because radiographic examination (RT) is not applicable to many welded joints. Therefore, ultrasonic examination (UT) has been selected as an alternative method. UT of austenitic welds is generally regarded as a great challenge due to the high attenuation and dispersion of the ultrasonic signal. In this paper, phased array ultrasonic examination (PAUT) has been introduced for the double-sided T-shaped austenitic welds of the ITER VV as a major NDE method alongside RT. Several dozen qualification blocks with artificial defects, such as parallel side-drilled holes, embedded lack of fusion, embedded repair weld notches and embedded parallel vertical notches, have been designed and fabricated to simulate all potential defects arising during the welding process. PAUT techniques for the thick austenitic welds have been developed taking the acceptance criteria into account. Test procedure including calibration of equipment is derived and qualified through

  9. Status of R&D activity for ITER ICRF power source system

    International Nuclear Information System (INIS)

    Mukherjee, Aparajita; Trivedi, Rajesh; Singh, Raghuraj; Rajnish, Kumar; Machchhar, Harsha; Ajesh, P.; Suthar, Gajendra; Soni, Dipal; Patel, Manoj; Mohan, Kartik; Hari, J.V.S.; Anand, Rohit; Verma, Sriprakash; Agarwal, Rohit; Jha, Akhil; Kazarian, Fabienne; Beaumont, Bertrand

    2015-01-01

    Highlights: • R&D program to establish high power RF technology for ITER ICRF source is described. • R&D RF source is being developed using Diacrode & Tetrode technologies. • Test rig (3 MW/3600 s/35–65 MHz) simulating plasma load is developed. - Abstract: India is in charge of the procurement of the ITER Ion Cyclotron Resonance Frequency (ICRF) sources (1 prototype + 8 series units) along with auxiliary power supplies and the Local Control Unit. As no single amplifier chain is able to meet the output power specifications required by ITER (2.5 MW per source at 35–65 MHz/CW/VSWR 2.0), two parallel three-stage amplifier chains with a combiner circuit on the output side are considered. This kind of RF source will be unique in terms of its stringent specifications, and building the first of its kind is always a challenge. An R&D phase has been initiated to establish the technology through single amplifier chain experiments (1.5 MW/35–65 MHz/3600 s/VSWR 2.0) prior to prototype and series production. This paper presents the status of the R&D activity to resolve the technological challenges involved and the various infrastructure developed at the ITER-India lab to support such operation.

  10. Status of R&D activity for ITER ICRF power source system

    Energy Technology Data Exchange (ETDEWEB)

    Mukherjee, Aparajita, E-mail: aparajita.mukherjee@iter-india.org [ITER-India, Institute for Plasma Research, Bhat, Gandhinagar–382428 (India)]; Trivedi, Rajesh; Singh, Raghuraj; Rajnish, Kumar; Machchhar, Harsha; Ajesh, P.; Suthar, Gajendra; Soni, Dipal; Patel, Manoj; Mohan, Kartik; Hari, J.V.S.; Anand, Rohit; Verma, Sriprakash; Agarwal, Rohit; Jha, Akhil [ITER-India, Institute for Plasma Research, Bhat, Gandhinagar–382428 (India)]; Kazarian, Fabienne; Beaumont, Bertrand [ITER Organization, CS 90 046, 13067 Saint-Paul-Lez-Durance (France)]

    2015-10-15

    Highlights: • R&D program to establish high power RF technology for ITER ICRF source is described. • R&D RF source is being developed using Diacrode & Tetrode technologies. • Test rig (3 MW/3600 s/35–65 MHz) simulating plasma load is developed. - Abstract: India is in charge of the procurement of the ITER Ion Cyclotron Resonance Frequency (ICRF) sources (1 prototype + 8 series units) along with auxiliary power supplies and the Local Control Unit. As no single amplifier chain is able to meet the output power specifications required by ITER (2.5 MW per source at 35–65 MHz/CW/VSWR 2.0), two parallel three-stage amplifier chains with a combiner circuit on the output side are considered. This kind of RF source will be unique in terms of its stringent specifications, and building the first of its kind is always a challenge. An R&D phase has been initiated to establish the technology through single amplifier chain experiments (1.5 MW/35–65 MHz/3600 s/VSWR 2.0) prior to prototype and series production. This paper presents the status of the R&D activity to resolve the technological challenges involved and the various infrastructure developed at the ITER-India lab to support such operation.

  11. Feasibility study of the iterative x-ray phase retrieval algorithm

    International Nuclear Information System (INIS)

    Meng Fanbo; Liu Hong; Wu Xizeng

    2009-01-01

    An iterative phase retrieval algorithm was previously investigated for in-line x-ray phase imaging. Through detailed theoretical analysis and computer simulations, we now discuss the limitations, robustness, and efficiency of the algorithm. The iterative algorithm was shown to be robust against imaging noise but sensitive to variations in several system parameters. It is also efficient in terms of calculation time. It was shown that the algorithm can be applied to phase retrieval based on one phase-contrast image and one attenuation image, or on two phase-contrast images; in both cases, the two images can be obtained either by one detector in two exposures or by two detectors in a single exposure, as in the dual-detector scheme.

  12. Boosting iterative stochastic ensemble method for nonlinear calibration of subsurface flow models

    KAUST Repository

    Elsheikh, Ahmed H.

    2013-06-01

    A novel parameter estimation algorithm is proposed. The inverse problem is formulated as a sequential data integration problem in which Gaussian process regression (GPR) is used to integrate the prior knowledge (static data). The search space is further parameterized using the Karhunen-Loève expansion to build a set of basis functions that spans the search space. Optimal weights of the reduced basis functions are estimated by an iterative stochastic ensemble method (ISEM). ISEM employs directional derivatives within a Gauss-Newton iteration for efficient gradient estimation. The resulting update equation relies on the inverse of the output covariance matrix, which is rank deficient. In the proposed algorithm we use an iterative regularization based on the ℓ2 Boosting algorithm. ℓ2 Boosting iteratively fits the residual, and the amount of regularization is controlled by the number of iterations. A termination criterion based on the Akaike information criterion (AIC) is utilized. This regularization method is very attractive in terms of performance and simplicity of implementation. The proposed algorithm combining ISEM and ℓ2 Boosting is evaluated on several nonlinear subsurface flow parameter estimation problems. The efficiency of the proposed algorithm is demonstrated by the small size of the ensembles used and by the error convergence rates. © 2013 Elsevier B.V.
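    ℓ2 Boosting, as summarized above, repeatedly fits the current residual with a simple learner, takes a shrunken step, and lets the iteration count act as the regularization parameter. The sketch below shows that idea on an ordinary linear least-squares problem, not inside ISEM. It assumes componentwise linear least squares as the base learner, a step size `nu`, and a plain AIC of the form n·log(RSS/n) + 2·df, with df taken as the trace of the boosting operator. These choices and the name `l2_boost` are illustrative assumptions, not the paper's formulation.

```python
import numpy as np


def l2_boost(X, y, nu=0.1, max_iter=200):
    """Componentwise linear L2 Boosting with an AIC stopping rule (illustrative sketch).

    Each step fits the current residual with the single best column of X and
    takes a shrunken step nu. Degrees of freedom are trace(B_m), where
    B_m = I - (I - nu*H_m) ... (I - nu*H_1) is the boosting operator.
    """
    n, p = X.shape
    norms = np.sum(X ** 2, axis=0)            # ||x_j||^2 for every column
    coef = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()
    residual_op = np.eye(n)                   # running product (I - nu*H_m) ... (I - nu*H_1)
    best_aic, best_coef, best_m = np.inf, coef.copy(), 0
    for m in range(1, max_iter + 1):
        beta = X.T @ resid / norms            # least-squares fit of resid on each column alone
        sse = resid @ resid - beta ** 2 * norms
        j = int(np.argmin(sse))               # column that reduces the residual most
        coef[j] += nu * beta[j]
        resid -= nu * beta[j] * X[:, j]
        H = np.outer(X[:, j], X[:, j]) / norms[j]
        residual_op = (np.eye(n) - nu * H) @ residual_op
        df = n - np.trace(residual_op)        # trace(B_m)
        mse = max(resid @ resid / n, 1e-300)  # guard against log(0)
        aic = n * np.log(mse) + 2.0 * df
        if aic < best_aic:
            best_aic, best_coef, best_m = aic, coef.copy(), m
    return best_coef, best_m
```

    In this sketch the returned iteration count is the stopping point chosen by AIC; a larger `nu` reaches the minimum in fewer steps but gives coarser control over the amount of regularization.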

  13. Iterative methods for photoacoustic tomography in attenuating acoustic media

    Science.gov (United States)

    Haltmeier, Markus; Kowar, Richard; Nguyen, Linh V.

    2017-11-01

    The development of efficient and accurate reconstruction methods is an important aspect of tomographic imaging. In this article, we address this issue for photoacoustic tomography. To this end, we use models for acoustic wave propagation that account for frequency-dependent attenuation according to a wide class of attenuation laws that may include memory. We formulate the inverse problem of photoacoustic tomography in an attenuating medium as an ill-posed operator equation in a Hilbert space framework that is tackled by iterative regularization methods. Our approach comes with a clear convergence analysis. For that purpose we derive explicit expressions for the adjoint problem that can be implemented efficiently. In contrast to time reversal, the employed adjoint wave equation is again damping and thus has a stable solution. This stability property can be clearly seen in our numerical results. Moreover, the presented numerical results clearly demonstrate the efficiency and accuracy of the derived iterative reconstruction algorithms in various situations, including the limited-view case.
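    Iterative regularization, as described above, treats the number of iterations as the regularization parameter. A minimal sketch, assuming a discretized linear forward operator `A` (a small matrix standing in for the attenuated wave-propagation map, with `A.T` as its adjoint), is the Landweber iteration stopped by the discrepancy principle once the residual falls below `tau * delta`. The function name, the step-size rule and the default `tau = 1.1` are illustrative choices, not details taken from the article.

```python
import numpy as np


def landweber(A, y_delta, delta, tau=1.1, max_iter=10_000):
    """Landweber iteration x_{k+1} = x_k + w * A^T (y_delta - A x_k).

    Iterating is the regularization; the discrepancy principle stops at the
    first k with ||A x_k - y_delta|| <= tau * delta (tau > 1).
    """
    w = 1.0 / np.linalg.norm(A, 2) ** 2   # step size; convergence needs 0 < w < 2/||A||^2
    x = np.zeros(A.shape[1])
    for k in range(max_iter):
        r = y_delta - A @ x
        if np.linalg.norm(r) <= tau * delta:
            return x, k                   # stopped by the discrepancy principle
        x = x + w * (A.T @ r)             # gradient step on 0.5 * ||A x - y_delta||^2
    return x, max_iter
```

    Stopping early is what keeps the noise from being amplified: iterating to convergence on noisy data would fit the noise in the directions of small singular values.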

  14. Arc detection for the ICRF system on ITER

    Science.gov (United States)

    D'Inca, R.

    2011-12-01

    The ICRF system for ITER is designed to respect the high-voltage breakdown limits. However, arcs can still occur statistically and must be quickly detected and suppressed by shutting the RF power down. To design a reliable and efficient detector, the mechanism of arcs must be analyzed to find their unique signature. Numerous systems have been conceived to address arc detection: VSWR-based detectors, RF noise detectors, sound detectors, optical detectors and S-matrix-based detectors. Until now, none of them has demonstrated that it fulfills all requirements, and the studies for ITER now follow three directions: improving the existing concepts to fix their flaws, developing new detectors that are theoretically fully compliant (like the GUIDAR), and combining several detectors to benefit from the advantages of each. Together with the physical and engineering challenges, the development of an arc detection system for ITER raises methodological concerns about extrapolating results from basic experiments and present machines to the ITER-scale ICRF system and about conducting a relevant risk analysis.

  15. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
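    The abstract does not detail the hypercube implementation or its scattered decomposition, so the following is only a sketch of the general idea under a different, simpler assumption: a block (segmented) decomposition in which every worker receives the primes up to √n and strikes their multiples in its own contiguous range. Python threads are used purely to show the task structure; because of the global interpreter lock they give no real speedup, and a hypercube code would distribute the blocks by message passing instead.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def base_primes(limit):
    """Serial sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = bytes(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if is_prime[i]]


def sieve_block(lo, hi, primes):
    """One worker's share: strike multiples of the base primes inside [lo, hi)."""
    flags = bytearray([1]) * (hi - lo)
    for p in primes:
        start = max(p * p, -(-lo // p) * p)     # first multiple of p in the block
        if start < hi:
            flags[start - lo::p] = bytes(len(range(start, hi, p)))
    return [lo + i for i, keep in enumerate(flags) if keep]


def parallel_sieve(n, workers=4):
    """Primes <= n, with [2, n] split into contiguous blocks, one task per block."""
    if n < 2:
        return []
    primes = base_primes(math.isqrt(n))          # every worker needs these
    size = math.ceil((n - 1) / workers)
    blocks = [(lo, min(lo + size, n + 1)) for lo in range(2, n + 1, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: sieve_block(b[0], b[1], primes), blocks))
    return [q for part in parts for q in part]
```

    Only the short list of base primes has to be shared; the blocks themselves need no communication, which is why sieves of this kind parallelize well.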

  16. Feasibility analysis of fuzzy logic control for ITER Poloidal field (PF) AC/DC converter system

    Energy Technology Data Exchange (ETDEWEB)

    Hassan, Mahmood Ul; Fu, Peng [Institute of Plasma Physics, Chinese Academy of Sciences, Hefei 230031 (China); University of Science and Technology of China (China)]; Song, Zhiquan, E-mail: zhquansong@ipp.ac.cn [Institute of Plasma Physics, Chinese Academy of Sciences, Hefei 230031 (China)]; Chen, Xiaojiao [Institute of Plasma Physics, Chinese Academy of Sciences, Hefei 230031 (China); University of Science and Technology of China (China)]; Zhang, Xiuqing [Institute of Plasma Physics, Chinese Academy of Sciences, Hefei 230031 (China)]; Humayun, Muhammad [Shanghai Jiaotong University (China)]

    2017-05-15

    Highlights: • The implementation of the fuzzy controller for the ITER PF converter system is presented. • FLC and PI simulations are compared. • FLC single and parallel bridge operation is presented. • Fuzzification and defuzzification algorithms for the FLC are presented. - Abstract: This paper describes a feasibility analysis of fuzzy logic control to increase the performance of the ITER poloidal field (PF) converter systems. A fuzzy-logic-based controller is designed for the ITER PF converter system; using the traditional PI controller and the fuzzy controller (FC), the dynamic behavior and transient response of the PF converter system under normal operation are compared by analysis and simulation. The analysis results show that fuzzy logic control can achieve better operating performance than PI control.
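    The abstract names the ingredients of the fuzzy controller (fuzzification, a rule base, defuzzification) and compares it with PI control, but it does not give membership functions, rules or gains. The sketch below assumes three triangular sets (N, Z, P), a 3×3 rule table acting on the error and its change, min-AND rule firing with weighted-average defuzzification, and an incremental (PI-like) output. The plant is a hypothetical first-order lag standing in for the PF converter, and all gains and tables are illustrative assumptions.

```python
import numpy as np

# Triangular fuzzy sets N, Z, P on the normalized universe [-1, 1].
SETS = [(-2.0, -1.0, 0.0), (-1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]
# Rule table: rows = error (N, Z, P), columns = change of error (N, Z, P);
# entries are singleton increments of the control signal.
RULES = [[-1.0, -1.0, 0.0],
         [-0.5, 0.0, 0.5],
         [0.0, 1.0, 1.0]]
KU, KE, KD = 0.01, 1.0, 2.0        # hypothetical scaling gains for the fuzzy controller
KP, KI = 1.0, 1.0                  # hypothetical PI gains for comparison


def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))


def fuzzy_step(e, de):
    """Fuzzify (e, de), fire the 9 rules with min-AND, defuzzify by weighted average."""
    e, de = float(np.clip(e, -1, 1)), float(np.clip(de, -1, 1))
    mu_e = [tri(e, *s) for s in SETS]
    mu_de = [tri(de, *s) for s in SETS]
    num = den = 0.0
    for i in range(3):
        for j in range(3):
            w = min(mu_e[i], mu_de[j])     # firing strength of rule (i, j)
            num += w * RULES[i][j]
            den += w
    return num / den if den > 0 else 0.0


def simulate(use_fuzzy, steps=2000, dt=0.01, tau=1.0, setpoint=1.0):
    """Step response of a toy plant dy/dt = (u - y)/tau (not a converter model)."""
    y = u = integral = 0.0
    e_prev = setpoint - y
    for _ in range(steps):
        e = setpoint - y
        if use_fuzzy:
            u += KU * fuzzy_step(KE * e, KD * (e - e_prev) / dt)   # incremental fuzzy PI
        else:
            integral += e * dt
            u = KP * e + KI * integral                             # conventional PI
        e_prev = e
        y += dt * (u - y) / tau                                    # explicit Euler step
    return y
```

    With these particular gains both controllers settle to the set point; any performance comparison like the paper's depends on a real converter model that this toy plant does not capture.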

  17. Design of the ITER Neutral Beam injectors

    International Nuclear Information System (INIS)

    Hemsworth, R.S.; Feist, J.; Hanada, M.; Heinemann, B.; Inoue, T.; Kuessel, E.; Kulygin, V.; Krylov, A.; Lotte, P.; Miyamoto, K.; Miyamoto, N.; Murdoch, D.; Nagase, A.; Ohara, Y.; Okumura, Y.; Pamela, J.; Panasenkov, A.; Shibata, K.; Tanii, M.

    1996-01-01

    This paper describes the Neutral Beam Injection system which is presently being designed in Europe, Japan and Russia, with co-ordination by the Joint Central Team of ITER at Naka, Japan. The proposed system consists of three negative-ion-based neutral injectors, delivering a total of 50 MW of 1 MeV D⁰ to the ITER plasma for pulse lengths of ≥1000 s. The injectors each use a single caesiated volume arc discharge negative ion source and a multi-grid, multi-aperture accelerator to produce about 40 A of 1 MeV D⁻. This will be neutralized in a sub-divided gas neutralizer, which has a conversion efficiency of about 60%. The charged fraction of the beam emerging from the neutralizer is dumped in an electrostatic residual ion dump. A water-cooled calorimeter can be moved into the beam path to intercept the neutral beam, allowing commissioning of the injector independently of ITER. copyright 1996 American Institute of Physics
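    A quick consistency check on the beam figures, assuming singly charged D⁻ ions (so a 1 MeV beam corresponds to a 1 MV accelerating potential) and taking the quoted values at face value:

```latex
P_{\mathrm{D^-}} = I\,V \approx 40\ \mathrm{A} \times 1\ \mathrm{MV} = 40\ \mathrm{MW},
\qquad
P_{\mathrm{D^0}} \approx 0.6 \times 40\ \mathrm{MW} = 24\ \mathrm{MW\ per\ injector},
\qquad
3 \times 24\ \mathrm{MW} = 72\ \mathrm{MW}.
```

    The abstract does not itemize how the roughly 72 MW leaving the three neutralizers relates to the 50 MW delivered to the plasma, so that difference is not accounted for here.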

  18. Template based parallel checkpointing in a massively parallel computer system

    Science.gov (United States)

    Archer, Charles Jens [Rochester, MN]; Inglett, Todd Alan [Rochester, MN]

    2009-01-13

    A method and apparatus for a template-based parallel checkpoint save for a massively parallel supercomputer system using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in storage and was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored, for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high-speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional lossless data compression algorithms to further reduce the overall checkpoint size.
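    The core idea above, keeping only the blocks whose checksums differ from a stored template, can be sketched as follows. This is an illustration under simplifying assumptions: blocks are fixed-size and compared only at aligned offsets (real rsync also uses a rolling weak checksum to find shifted blocks), SHA-1 from `hashlib` serves as the block checksum, `zlib` provides the lossless compression, and the 4 KiB block size is a hypothetical choice. Network broadcast and storage are not modeled.

```python
import hashlib
import zlib

BLOCK = 4096  # hypothetical block size in bytes


def block_digests(data, block=BLOCK):
    """SHA-1 digest of every aligned, fixed-size block of a template checkpoint."""
    return [hashlib.sha1(data[i:i + block]).digest() for i in range(0, len(data), block)]


def delta_checkpoint(node_data, template_digests, block=BLOCK):
    """Keep only the blocks of a node's checkpoint that differ from the template."""
    delta = {}
    for idx, start in enumerate(range(0, len(node_data), block)):
        chunk = node_data[start:start + block]
        digest = hashlib.sha1(chunk).digest()
        if idx >= len(template_digests) or digest != template_digests[idx]:
            delta[idx] = (digest, zlib.compress(chunk))   # checksum + compressed block
    return delta


def restore(template, delta, length, block=BLOCK):
    """Rebuild a node's checkpoint from the template plus its delta."""
    out = bytearray(template[:length].ljust(length, b"\0"))
    for idx, (_, packed) in delta.items():
        chunk = zlib.decompress(packed)
        out[idx * block: idx * block + len(chunk)] = chunk
    return bytes(out)
```

    When most of a node's state matches the template, the delta holds only a few compressed blocks, which is the saving in transmitted and stored data that the abstract describes.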

  19. Virtual fringe projection system with nonparallel illumination based on iteration

    International Nuclear Information System (INIS)

    Zhou, Duo; Wang, Zhangying; Gao, Nan; Zhang, Zonghua; Jiang, Xiangqian

    2017-01-01

    Fringe projection profilometry has been widely applied in many fields. To set up an ideal measuring system, a virtual fringe projection technique has been studied to assist in the design of hardware configurations. However, existing virtual fringe projection systems use parallel illumination and have a fixed optical framework. This paper presents a virtual fringe projection system with nonparallel illumination. Using an iterative method to calculate intersection points between rays and reference planes or object surfaces, the proposed system can simulate projected fringe patterns and captured images. A new explicit calibration method is presented to validate the precision of the system. Simulated results indicate that the proposed iterative method outperforms previous systems. Our virtual system can be applied to error analysis and algorithm optimization, and can help operators find ideal system parameter settings for actual measurements. (paper)
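    The abstract states only that ray-surface intersection points are found iteratively. A minimal sketch, assuming a surface given as a height map z = f(x, y), rays leaving a point source (so they are not parallel), and Newton's method with a central-difference derivative started from the intersection with the reference plane z = 0. The choice of Newton's method and the name `intersect` are assumptions; the paper's iteration may differ.

```python
import numpy as np


def intersect(origin, direction, surface, tol=1e-10, max_iter=50):
    """Find where the ray p(t) = origin + t * direction meets the surface z = surface(x, y).

    Newton's method on g(t) = z_ray(t) - surface(x_ray(t), y_ray(t)), started at the
    ray's intersection with the reference plane z = 0 (requires direction[2] != 0).
    """
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)

    def g(t):
        x, y, z = o + t * d
        return z - surface(x, y)

    t = -o[2] / d[2]          # hit point on the reference plane z = 0
    h = 1e-6                  # step for the central-difference derivative
    for _ in range(max_iter):
        step = g(t) / ((g(t + h) - g(t - h)) / (2.0 * h))
        t -= step
        if abs(step) < tol:
            return o + t * d
    raise RuntimeError("ray-surface iteration did not converge")
```

    For a flat object the first Newton step already lands on the exact intersection; curved surfaces need a few more iterations.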

  20. SQDFT: Spectral Quadrature method for large-scale parallel O(N) Kohn-Sham calculations at high temperature

    Science.gov (United States)

    Suryanarayana, Phanish; Pratapa, Phanisri P.; Sharma, Abhiraj; Pask, John E.

    2018-03-01

    We present SQDFT: a large-scale parallel implementation of the Spectral Quadrature (SQ) method for O(N) Kohn-Sham Density Functional Theory (DFT) calculations at high temperature. Specifically, we develop an efficient and scalable finite-difference implementation of the infinite-cell Clenshaw-Curtis SQ approach, in which results for the infinite crystal are obtained by expressing quantities of interest as bilinear forms or sums of bilinear forms, which are then approximated by spatially localized Clenshaw-Curtis quadrature rules. We demonstrate the accuracy of SQDFT by showing systematic convergence of energies and atomic forces with respect to SQ parameters to reference diagonalization results, and convergence with discretization to established planewave results, for both metallic and insulating systems. We further demonstrate that SQDFT achieves excellent strong and weak parallel scaling on computer systems consisting of tens of thousands of processors, with near-perfect O(N) scaling with system size and wall times as low as a few seconds per self-consistent field iteration. Finally, we verify the accuracy of SQDFT in large-scale quantum molecular dynamics simulations of aluminum at high temperature.
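    To illustrate how a diagonal entry of the Fermi-Dirac density matrix, the bilinear form e_iᵀ f(H) e_i, can be approximated by a polynomial in the Hamiltonian, the sketch below uses a dense toy Hamiltonian and a Chebyshev expansion whose coefficients come from NumPy's `chebinterpolate` (Chebyshev points of the first kind). This is a simplified stand-in, not SQDFT: it omits the spatial localization that gives O(N) scaling, it uses Gauss-Chebyshev rather than Clenshaw-Curtis nodes for the coefficients, and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def fermi_dirac(e, mu, kT):
    """Fermi-Dirac occupation at electronic temperature kT."""
    return 1.0 / (1.0 + np.exp((e - mu) / kT))


def diag_element(H, i, mu, kT, degree=60):
    """Approximate [f(H)]_ii = e_i^T f(H) e_i by a Chebyshev polynomial in H."""
    n = len(H)
    radius = np.abs(H).sum(axis=1).max()        # Gershgorin bound: spectrum in [-radius, radius]
    Hs = H / radius                             # scaled spectrum lies in [-1, 1]
    coef = C.chebinterpolate(lambda x: fermi_dirac(radius * x, mu, kT), degree)
    v_prev = np.zeros(n)
    v_prev[i] = 1.0                             # T_0(Hs) e_i
    v = Hs @ v_prev                             # T_1(Hs) e_i
    total = coef[0] * v_prev[i] + coef[1] * v[i]
    for k in range(2, degree + 1):
        v_prev, v = v, 2.0 * (Hs @ v) - v_prev  # three-term Chebyshev recurrence
        total += coef[k] * v[i]
    return total
```

    At high temperature the Fermi-Dirac function is smooth, so a modest polynomial degree already matches diagonalization closely; this is one reason the quadrature approach suits the high-temperature regime.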

  1. Is Carbon a Realistic Choice for ITER's Divertor?

    International Nuclear Information System (INIS)

    Skinner, C.H.; Federici, G.

    2005-01-01

    Tritium retention by co-deposition with carbon on the divertor target plate is predicted to limit ITER's DT burning plasma operations (e.g. to about 100 pulses for the worst conditions) before the in-vessel tritium inventory limit, currently set at 350 g, is reached. At this point, ITER will only be able to continue its burning plasma program if technology is available that is capable of rapidly removing large quantities of tritium from the vessel with over 90% efficiency. The removal rate required is four orders of magnitude faster than that demonstrated in current tokamaks. Eighteen years after the observation of co-deposition on JET and TFTR, such technology is nowhere in sight. The inexorable conclusion is that either a major initiative in tritium removal should be funded or that research priorities for ITER should focus on metal alternatives

  2. Distributed 3-D iterative reconstruction for quantitative SPECT

    International Nuclear Information System (INIS)

    Ju, Z.W.; Frey, E.C.; Tsui, B.M.W.

    1995-01-01

    The authors describe a distributed three-dimensional (3-D) iterative reconstruction library for quantitative single-photon emission computed tomography (SPECT). This library includes 3-D projector-backprojector pairs (PBPs) and distributed 3-D iterative reconstruction algorithms. The 3-D PBPs accurately and efficiently model various combinations of image-degrading factors, including attenuation, detector response and scatter response. These PBPs were validated by comparing projection data computed using the projectors with those from direct Monte Carlo (MC) simulations. The distributed 3-D iterative algorithms spread the projection-backprojection operations for all projection angles over a heterogeneous network of single- or multi-processor computers to reduce the reconstruction time. Based on a master/slave paradigm, these distributed algorithms provide dynamic load balancing and fault tolerance. The distributed algorithms were verified by comparing images reconstructed using both the distributed and non-distributed algorithms. Computation times for distributed 3-D reconstructions running on up to 4 identical processors were reduced by a factor of approximately 0.8 to 0.9 times the number of participating processors, compared with non-distributed 3-D reconstructions running on a single processor. When combined with faster affordable computers, this library provides an efficient means of implementing accurate reconstruction and compensation methods to improve the quality and quantitative accuracy of SPECT images.
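    The master/slave distribution described above can be sketched with a work queue: each projection angle is one task, free workers pull the next task, and the master sums the back-projected ratios. The sketch assumes MLEM as the iterative algorithm and a small dense system matrix split by angle; attenuation, detector and scatter modeling, the heterogeneous network and fault tolerance are all left out, and the function names are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor, as_completed


def angle_task(A_angle, y_angle, x):
    """Worker job: forward-project one angle, form the data ratio, back-project it."""
    proj = A_angle @ x
    ratio = np.divide(y_angle, proj, out=np.zeros_like(proj), where=proj > 0)
    return A_angle.T @ ratio


def mlem_distributed(A_by_angle, y_by_angle, n_iter=20, workers=4):
    """MLEM in which the master farms out one task per projection angle."""
    sens = sum(Aa.sum(axis=0) for Aa in A_by_angle)        # sensitivity image A^T 1
    x = np.ones(A_by_angle[0].shape[1])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n_iter):
            # Tasks queue up in the pool; each free worker pulls the next one.
            jobs = [pool.submit(angle_task, Aa, ya, x)
                    for Aa, ya in zip(A_by_angle, y_by_angle)]
            back = np.zeros_like(x)
            for job in as_completed(jobs):                   # sum results as they finish
                back += job.result()
            x = x * np.divide(back, sens, out=np.zeros_like(back), where=sens > 0)
    return x


def mlem_serial(A_by_angle, y_by_angle, n_iter=20):
    """Reference single-process MLEM on the stacked system."""
    A, y = np.vstack(A_by_angle), np.concatenate(y_by_angle)
    sens, x = A.sum(axis=0), np.ones(A.shape[1])
    for _ in range(n_iter):
        proj = A @ x
        ratio = np.divide(y, proj, out=np.zeros_like(proj), where=proj > 0)
        x = x * (A.T @ ratio) / sens
    return x
```

    Because each angle's projection and back projection are independent, completion order changes only the floating-point summation order, which is why the distributed and serial versions agree, mirroring the verification step described in the abstract.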

  3. Massively parallel red-black algorithms for x-y-z response matrix equations

    International Nuclear Information System (INIS)

    Hanebutte, U.R.; Laurin-Kovitz, K.; Lewis, E.E.

    1992-01-01

    Recently, both discrete ordinates and spherical harmonic (S_n and P_n) methods have been cast in the form of response matrices. In x-y geometry, massively parallel algorithms have been developed to solve the resulting response matrix equations on the Connection Machine family of parallel computers, the CM-2, CM-200, and CM-5. These algorithms utilize two-cycle iteration on a red-black checkerboard. In this work we examine the use of massively parallel red-black algorithms to solve response matrix equations in three dimensions. The longer-term objective is to utilize massively parallel algorithms to solve S_n and/or P_n response matrix problems. In this exploratory examination, however, we consider the simple 6 x 6 response matrices that are derivable from fine-mesh diffusion approximations in three dimensions.
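    The red-black (two-color) ordering works because, on a 7-point stencil, every red cell's neighbors are black and vice versa, so all cells of one color can be updated simultaneously. The sketch below applies this to a finite-difference diffusion equation −D∇²φ + Σ_a φ = S on a cube with zero boundary values. It is plain red-black Gauss-Seidel, not the response-matrix formulation of the paper, and the grid size and coefficients are illustrative assumptions.

```python
import numpy as np


def checkerboard_masks(n):
    """Red and black interior cells of an n^3 grid, colored by the parity of i + j + k."""
    i, j, k = np.indices((n, n, n))
    interior = np.zeros((n, n, n), dtype=bool)
    interior[1:-1, 1:-1, 1:-1] = True
    parity = (i + j + k) % 2
    return interior & (parity == 0), interior & (parity == 1)


def neighbor_sum(phi):
    """Sum of the six face neighbours of every interior cell (7-point stencil)."""
    s = np.zeros_like(phi)
    s[1:-1, 1:-1, 1:-1] = (phi[2:, 1:-1, 1:-1] + phi[:-2, 1:-1, 1:-1]
                           + phi[1:-1, 2:, 1:-1] + phi[1:-1, :-2, 1:-1]
                           + phi[1:-1, 1:-1, 2:] + phi[1:-1, 1:-1, :-2])
    return s


def red_black_sweep(phi, src, masks, D=1.0, sig_a=0.1, h=1.0):
    """One two-cycle sweep for -D lap(phi) + sig_a phi = src with phi = 0 on the boundary."""
    diag = 6.0 * D / h ** 2 + sig_a
    for mask in masks:                          # all red cells at once, then all black cells
        nb = neighbor_sum(phi)                  # black pass sees the freshly updated red values
        phi[mask] = (src[mask] + D / h ** 2 * nb[mask]) / diag
    return phi


def residual_norm(phi, src, D=1.0, sig_a=0.1, h=1.0):
    """2-norm of src + D lap(phi) - sig_a phi over interior cells."""
    r = src - sig_a * phi + D / h ** 2 * (neighbor_sum(phi) - 6.0 * phi)
    return np.linalg.norm(r[1:-1, 1:-1, 1:-1])
```

    On a data-parallel machine each color pass maps naturally to one simultaneous update of half the cells, which is what makes the scheme attractive for the Connection Machine class of computers.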

  4. Optimized iterative decoding method for TPC coded CPM

    Science.gov (United States)

    Ma, Yanmin; Lai, Penghui; Wang, Shilian; Xie, Shunqin; Zhang, Wei

    2018-05-01

    The Turbo Product Code (TPC) coded Continuous Phase Modulation (CPM) system (TPC-CPM) has been widely used in aeronautical telemetry and satellite communication. This paper mainly investigates the improvement and optimization of the TPC-CPM system. We first add an interleaver and a deinterleaver to the TPC-CPM system and then establish an iterative system to decode iteratively. However, the improved system has poor convergence. To overcome this issue, we use Extrinsic Information Transfer (EXIT) analysis to find the optimal factors for the system. The experiments show that our method is effective in improving the convergence performance.

  5. FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

    Science.gov (United States)

    Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

    2009-12-01

    , industrial and physical applications. However, despite recent modelling advances, the accurate numerical solution of the equations governing such problems is still at a relatively early stage. Indeed, recent studies employing a simplifying long-wave approximation have shown that highly efficient numerical methods are necessary to solve the resulting lubrication equations in order to achieve the level of grid resolution required to accurately capture the effects of micro- and nano-scale topographical features. Solution method: A portable parallel multigrid algorithm has been developed for the above purpose, for the particular case of flow over submerged topographical features. Within the multigrid framework adopted, a W-cycle is used to accelerate convergence in respect of the time dependent nature of the problem, with relaxation sweeps performed using a fixed number of pre- and post-Red-Black Gauss-Seidel Newton iterations. In addition, the algorithm incorporates automatic adaptive time-stepping to avoid the computational expense associated with repeated time-step failure. Running time: 1.31 minutes using 128 processors on BlueGene/P with a problem size of over 16.7 million mesh points.
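    The solution method above combines a W-cycle with red-black Gauss-Seidel smoothing. As a minimal sketch of that combination, assuming the linear 1-D model problem −u'' = f with zero boundary values instead of the nonlinear lubrication equations (so there is no Newton linearization and no adaptive time-stepping), full-weighting restriction and linear interpolation, on grids of 2^k + 1 points:

```python
import numpy as np


def residual(u, f, h):
    """r = f - A u for A u = -u'' (second-order differences, boundary entries zero)."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / h ** 2
    return r


def rb_gs(u, f, h, sweeps):
    """Red-black Gauss-Seidel: update odd points ("red"), then even points ("black")."""
    for _ in range(sweeps):
        for s in (1, 2):
            u[s:-1:2] = 0.5 * (u[s - 1:-2:2] + u[s + 1::2] + h * h * f[s:-1:2])
    return u


def restrict(r):
    """Full weighting onto the grid with every other point."""
    rc = np.zeros((len(r) + 1) // 2)
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    return rc


def prolong(ec):
    """Linear interpolation back to the fine grid."""
    e = np.zeros(2 * len(ec) - 1)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e


def w_cycle(u, f, h, pre=2, post=2, gamma=2):
    """One multigrid cycle; gamma=2 coarse-grid visits per level makes it a W-cycle."""
    if len(u) <= 3:                      # one interior unknown: solve exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    rb_gs(u, f, h, pre)                  # fixed number of pre-smoothing sweeps
    rc = restrict(residual(u, f, h))
    ec = np.zeros_like(rc)
    for _ in range(gamma):
        ec = w_cycle(ec, rc, 2.0 * h, pre, post, gamma)
    u += prolong(ec)                     # coarse-grid correction
    return rb_gs(u, f, h, post)          # fixed number of post-smoothing sweeps
```

    Setting `gamma=1` turns the same routine into a V-cycle; the W-cycle spends more work on the coarse grids in exchange for a more robust convergence rate.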

  6. Speeding up predictive electromagnetic simulations for ITER application

    International Nuclear Information System (INIS)

    Alekseev, A.B.; Amoskov, V.M.; Bazarov, A.M.; Belov, A.V.; Belyakov, V.A.; Gapionok, E.I.; Gornikel, I.V.; Gribov, Yu. V.; Kukhtin, V.P.; Lamzin, E.A.; Sytchevsky, S.E.

    2017-01-01

    Highlights: • A general concept of an engineering EM simulator for tokamak applications is proposed. • The algorithm is based on influence functions and the superposition principle. • The software works with extensive databases and offers parallel processing. • The simulator obtains the solution hundreds of times faster. - Abstract: The paper presents an attempt to develop a general concept of a software environment for fast and consistent multi-task simulation of EM transients (an engineering simulator for tokamak applications). The ITER tokamak is taken as an example to introduce the computational technique. The strategy exploits parallel processing with optimized simulation algorithms based on influence functions and the superposition principle to take full advantage of parallelism. The software has been tested on a multi-core supercomputer. The results were compared with data obtained in TYPHOON computations, and the discrepancy was found to be below 0.4%. The computation cost for the simulator is proportional to the number of observation points. The average computation time with the simulator is found to be hundreds of times less than the time required by known software tools to numerically solve the relevant system of differential equations.
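    The influence-function approach relies on linearity: the response at each observation point is a fixed linear combination of the source currents, so the coefficients can be computed once and reused for any current waveform. A toy sketch, assuming straight infinite wires perpendicular to the plane as sources and the x-component of B as the observed quantity; the actual simulator's sources and databases are not described in the abstract, and the function names are hypothetical.

```python
import numpy as np

MU0 = 4e-7 * np.pi   # vacuum permeability in H/m (classical SI value)


def wire_bx(source, point):
    """x-component of B at `point` from an infinite wire along z through `source`, per ampere."""
    dx, dy = point[0] - source[0], point[1] - source[1]
    return -MU0 * dy / (2.0 * np.pi * (dx * dx + dy * dy))


def influence_matrix(sources, points, unit_response=wire_bx):
    """G[p, s]: response at observation point p to 1 A in source s (computed once, reusable)."""
    return np.array([[unit_response(s, p) for s in sources] for p in points])


def observe(G, currents):
    """Superposition for all time steps at once: (n_points x n_sources) @ (n_sources x n_times)."""
    return G @ currents
```

    For a fixed set of sources and time steps, the cost of `observe` grows linearly with the number of observation points (the rows of `G`), which matches the scaling stated in the abstract.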

  7. Speeding up predictive electromagnetic simulations for ITER application

    Energy Technology Data Exchange (ETDEWEB)

    Alekseev, A.B. [ITER Organization, Route de Vinon sur Verdon, 13067 St. Paul Lez Durance Cedex (France)]; Amoskov, V.M. [JSC “NIIEFA”, Doroga na Metallostroy 3, St. Petersburg, 196641 (Russian Federation)]; Bazarov, A.M., E-mail: alexander.bazarov@gmail.com [JSC “NIIEFA”, Doroga na Metallostroy 3, St. Petersburg, 196641 (Russian Federation)]; Belov, A.V. [JSC “NIIEFA”, Doroga na Metallostroy 3, St. Petersburg, 196641 (Russian Federation)]; Belyakov, V.A. [JSC “NIIEFA”, Doroga na Metallostroy 3, St. Petersburg, 196641 (Russian Federation); St. Petersburg State University, 7/9 Universitetskaya Embankment, St. Petersburg, 199034 (Russian Federation)]; Gapionok, E.I. [JSC “NIIEFA”, Doroga na Metallostroy 3, St. Petersburg, 196641 (Russian Federation)]; Gornikel, I.V. [Alphysica GmbH, Unterreut, 6, D-76135, Karlsruhe (Germany)]; Gribov, Yu. V. [ITER Organization, Route de Vinon sur Verdon, 13067 St. Paul Lez Durance Cedex (France)]; Kukhtin, V.P.; Lamzin, E.A. [JSC “NIIEFA”, Doroga na Metallostroy 3, St. Petersburg, 196641 (Russian Federation)]; Sytchevsky, S.E. [JSC “NIIEFA”, Doroga na Metallostroy 3, St. Petersburg, 196641 (Russian Federation); St. Petersburg State University, 7/9 Universitetskaya Embankment, St. Petersburg, 199034 (Russian Federation)]

    2017-05-15

    Highlights: • A general concept of an engineering EM simulator for tokamak applications is proposed. • The algorithm is based on influence functions and the superposition principle. • The software works with extensive databases and offers parallel processing. • The simulator obtains the solution hundreds of times faster. - Abstract: The paper presents an attempt to develop a general concept of a software environment for fast and consistent multi-task simulation of EM transients (an engineering simulator for tokamak applications). The ITER tokamak is taken as an example to introduce the computational technique. The strategy exploits parallel processing with optimized simulation algorithms based on influence functions and the superposition principle to take full advantage of parallelism. The software has been tested on a multi-core supercomputer. The results were compared with data obtained in TYPHOON computations, and the discrepancy was found to be below 0.4%. The computation cost for the simulator is proportional to the number of observation points. The average computation time with the simulator is found to be hundreds of times less than the time required by known software tools to numerically solve the relevant system of differential equations.

  8. Performance of a fine-grained parallel model for multi-group nodal-transport calculations in three-dimensional pin-by-pin reactor geometry

    International Nuclear Information System (INIS)

    Masahiro, Tatsumi; Akio, Yamamoto

    2003-01-01

    A production code, SCOPE2, was developed based on a fine-grained parallel algorithm using the red/black iterative method, targeting parallel computing environments such as PC clusters. It can perform a depletion calculation in a few hours on a PC cluster with a model based on a 9-group nodal-SP3 transport method in 3-dimensional pin-by-pin geometry for in-core fuel management of commercial PWRs. The present algorithm guarantees a convergence process identical to that of serial execution, which is very important from the viewpoint of quality management. The fine-mesh geometry is constructed by hierarchical decomposition, introducing an intermediate management layer, the block, which is a quarter of a fuel assembly in the radial direction. A combination of a mesh division scheme forcing even meshes on each edge and a latency-hiding communication algorithm provided simplicity and efficiency to message passing, enhancing parallel performance. Inter-processor communication and parallel I/O access were realized using MPI functions. Parallel performance was measured for depletion calculations by the 9-group nodal-SP3 transport method in 3-dimensional pin-by-pin geometry with 340 x 340 x 26 meshes for full-core geometry and 170 x 170 x 26 for quarter-core geometry. A PC cluster consisting of 24 Pentium-4 processors connected by Fast Ethernet was used for the performance measurement. Calculations in full-core geometry gave better speedups than those in quarter-core geometry because of larger granularity. The fine-mesh sweep and feedback calculation parts gave almost perfect scalability since their granularity is large enough, while the 1-group coarse-mesh diffusion acceleration gave only around 80%. The speedup and parallel efficiency for total computation time were 22.6 and 94%, respectively, for the calculation in full-core geometry with 24 processors. (authors)
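    The quoted speedup and efficiency are consistent with the usual definition of parallel efficiency as speedup S divided by processor count P:

```latex
E = \frac{S}{P} = \frac{22.6}{24} \approx 0.942 \approx 94\%
```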

  9. Fast parallel algorithms for the x-ray transform and its adjoint.

    Science.gov (United States)

    Gao, Hao

    2012-11-01

    Iterative reconstruction methods often offer better imaging quality and allow for reconstructions with lower imaging dose than classical methods in computed tomography. However, computational speed is a major concern for these iterative methods, for which the x-ray transform and its adjoint are the two most time-consuming components. The speed issue becomes even more pronounced for 3D imaging such as cone-beam or helical scans, since the x-ray transform and its adjoint must be recomputed frequently, as there is usually not enough computer memory to store the corresponding system matrix. The purpose of this paper is to optimize the algorithm for computing the x-ray transform and its adjoint, and their parallel computation. Fast and highly parallelizable algorithms for the x-ray transform and its adjoint are proposed for the infinitely narrow beam in both 2D and 3D. The extension of these fast algorithms to the finite-size beam is proposed in 2D and discussed in 3D. The CPU and GPU codes are available at https://sites.google.com/site/fastxraytransform. The proposed algorithm is faster than Siddon's algorithm for computing the x-ray transform. In particular, the improvement for the parallel computation can be an order of magnitude. The authors have proposed fast and highly parallelizable algorithms for the x-ray transform and its adjoint, which are extendable to the finite-size beam. The proposed algorithms are suitable for parallel computing in the sense that the computational cost per parallel thread is O(1).
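    For reference, the Siddon-style baseline that the paper compares against computes, for each ray, the lengths of its intersections with the pixels it crosses, and the adjoint reuses the same weights. The 2-D sketch below assumes an n × n grid of unit pixels with its corner at the origin and rays given by two endpoints. It is not the faster algorithm proposed in the paper, and the function names are hypothetical.

```python
import numpy as np


def ray_weights(p0, p1, n):
    """(flat pixel index, intersection length) pairs for segment p0 -> p1 on an n x n unit grid."""
    p0, p1 = np.asarray(p0, dtype=float), np.asarray(p1, dtype=float)
    d = p1 - p0
    length = np.hypot(d[0], d[1])
    alphas = [0.0, 1.0]
    grid = np.arange(n + 1, dtype=float)                 # pixel edges at 0, 1, ..., n
    for axis in (0, 1):
        if d[axis] != 0.0:
            alphas.extend((grid - p0[axis]) / d[axis])   # parameters where grid lines are crossed
    a = np.unique(np.clip(alphas, 0.0, 1.0))
    weights = []
    for lo, hi in zip(a[:-1], a[1:]):
        x, y = p0 + 0.5 * (lo + hi) * d                  # segment midpoint identifies the pixel
        ix, iy = int(np.floor(x)), int(np.floor(y))
        if 0 <= ix < n and 0 <= iy < n:
            weights.append((iy * n + ix, (hi - lo) * length))
    return weights


def forward(img, rays):
    """Discrete x-ray transform: line integrals of img along each ray."""
    n = img.shape[0]
    return np.array([sum(img.flat[i] * w for i, w in ray_weights(a, b, n)) for a, b in rays])


def adjoint(sino, rays, n):
    """Back-projection with the same weights, i.e. the transpose of `forward`."""
    out = np.zeros(n * n)
    for val, (a, b) in zip(sino, rays):
        for i, w in ray_weights(a, b, n):
            out[i] += val * w
    return out.reshape(n, n)
```

    Checking ⟨Ax, y⟩ = ⟨x, Aᵀy⟩ numerically is a useful test for any projector and backprojector pair used in iterative reconstruction.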

  10. Picard Trajectory Approximation Iteration for Efficient Orbit Propagation

    Science.gov (United States)

    2015-07-21

    computing language developed by NVIDIA for use upon their Graphics Processing Units (GPUs); effectively it allows lightweight parallel computation at...Computation Toolbox, and require Matlab 2010 or newer (2011 or newer recommended), and an NVIDIA GPU with compute capability of 1.3 or greater. 3...and Resonances, pp. 216–227, Dordrecht, Holland, 1970. D. Reidel Publishing Company . [4] Zadunaisky, P. E., On the Estimation of Errors Propagated in

  11. Performance of a multi-section ICRF array for a RTO/RC ITER

    International Nuclear Information System (INIS)

    Bosia, Giuseppe; Brambilla, Marco

    1999-01-01

    In an RTO/RC ITER, the Ion Cyclotron (IC) Heating and Current Drive System would need to operate at a power density of 6.5 MW/m² (about twice the design value adopted in the ITER Final Design Report) in order to provide the required total output of 40 MW of RF power from two equatorial ports. A significant upgrade of the original IC array design is necessary to keep the operating RF voltage at the plasma interface within acceptable limits. This is in principle possible by increasing the number of array elements and operating them in parallel. The paper discusses the prospects of these modifications and their implications for the array layout.

  12. ITER council proceedings: 2000

    International Nuclear Information System (INIS)

    2001-01-01

    No ITER Council Meetings were held during 2000. However, two ITER EDA Meetings were held, one in Tokyo on January 19-20 and one in Moscow on June 29-30. The parties participating in these meetings were those taking part in the extended ITER EDA, namely the EU, the Russian Federation, and Japan. Among other items, this document contains the records of these meetings, the list of attendees, the agenda, the ITER EDA Status Reports issued during these meetings, the TAC (Technical Advisory Committee) Reports and Recommendations (for both meetings), the MAC Reports and Advice (also for the July 1999 Meeting), the ITER-FEAT Outline Design Report, Site Requirements and Site Design Assumptions, the Tentative Sequence of Technical Activities 2000-2001, the Report of the ITER SWG-P2 on Joint Implementation of ITER, and the EU/ITER Canada Proposal for New ITER Identification

  13. Perl Modules for Constructing Iterators

    Science.gov (United States)

    Tilmes, Curt

    2009-01-01

    The Iterator Perl Module provides a general-purpose framework for constructing iterator objects within Perl, and a standard API for interacting with those objects. Iterators are an object-oriented design pattern in which a description of a series of values is passed to a constructor; subsequent queries can then request values in that series. These Perl modules build on the standard Iterator framework and provide iterators for some other types of values. Iterator::DateTime constructs iterators from DateTime objects or Date::Parse descriptions and iCal/RFC 2445 style recurrence descriptions. It supports a variety of input parameters, including a start to the sequence, an end to the sequence, an iCal/RFC 2445 recurrence describing the frequency of the values in the series, and a format description that can refine how the DateTime is presented. Iterator::String constructs iterators from string representations. This module is useful in contexts where the API consists of supplying a string and getting back an iterator, where the specific iteration desired is opaque to the caller. It is of particular value to the Iterator::Hash module, which provides nested iterations. Iterator::Hash constructs iterators from Perl hashes that can include multiple iterators. The constructed iterators return all combinations of the iterations in the hash by nested iteration of the embedded iterators. A hash simply maps a set of keys to values; it is a very common data structure used throughout Perl programming. The Iterator::Hash module allows a hash to include strings defining iterators (parsed and dispatched with Iterator::String) that are used to construct an overall series of hash values.
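    To make the nested-iteration idea concrete, here is a rough Python analogue. It is not the Perl Iterator::Hash API, whose exact interface is not given here; the function `expand_hash` is hypothetical, and plain Python sequences stand in for the string-defined iterators that Iterator::String would parse. Keys whose values are sequences vary, scalar values stay fixed, and `itertools.product` yields one hash per combination.

```python
from itertools import product

def expand_hash(spec):
    """Yield one dict per combination of the sequence-valued entries in spec.

    Hypothetical analogue of nested iteration over a hash of iterators:
    list/tuple/range values are iterated, anything else is held constant.
    """
    keys = list(spec)
    series = [spec[k] if isinstance(spec[k], (list, tuple, range)) else [spec[k]]
              for k in keys]
    # product() varies the last key fastest, like the innermost of nested loops.
    for combo in product(*series):
        yield dict(zip(keys, combo))

# Example: two years x two days, with a fixed band -> four hashes.
items = list(expand_hash({"year": [2008, 2009], "day": range(1, 3), "band": "red"}))
```

    The first key varies slowest, so the series runs 2008/1, 2008/2, 2009/1, 2009/2, mirroring nested loops over the embedded iterators.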

  14. Computational acceleration for MR image reconstruction in partially parallel imaging.

    Science.gov (United States)

    Ye, Xiaojing; Chen, Yunmei; Huang, Feng

    2011-05-01

    In this paper, we present a fast numerical algorithm for solving total variation and l1 (TVL1) based image reconstruction, with application to partially parallel magnetic resonance imaging. Our algorithm uses a variable splitting method to reduce computational cost. Moreover, the Barzilai-Borwein step size selection method is adopted in our algorithm for much faster convergence. Experimental results on clinical partially parallel imaging data demonstrate that the proposed algorithm requires far fewer iterations and/or less computational cost than recently developed operator splitting and Bregman operator splitting methods, which can deal with a general sensing matrix in the reconstruction framework, while producing reconstructed images of similar or even better quality.
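    The Barzilai-Borwein rule picks the step from the last two iterates: with s = x_k - x_{k-1} and y = ∇f(x_k) - ∇f(x_{k-1}), the step is α = sᵀs / sᵀy. A minimal Python sketch on a smooth quadratic follows. It is plain gradient descent with the BB step, not the authors' variable-splitting TVL1 solver, and the function name and the fallback step `alpha0` are assumptions.

```python
def bb_gradient_descent(grad, x0, alpha0=1e-3, iters=100, tol=1e-10):
    """Gradient descent with the Barzilai-Borwein step alpha = s.s / s.y."""
    x_prev = list(x0)
    g_prev = grad(x_prev)
    x = [xi - alpha0 * gi for xi, gi in zip(x_prev, g_prev)]  # first step: fixed size
    for _ in range(iters):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        s = [a - b for a, b in zip(x, x_prev)]     # change in iterate
        y = [a - b for a, b in zip(g, g_prev)]     # change in gradient
        sy = sum(a * b for a, b in zip(s, y))
        alpha = sum(a * a for a in s) / sy if sy > 0 else alpha0
        x_prev, g_prev = x, g
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

# Minimize f(x) = 0.5 x^T A x - b^T x; the minimizer solves A x = b.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]

def grad(x):
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]

x_star = bb_gradient_descent(grad, [0.0, 0.0])   # approximately [0.2, 0.4]
```

    The step adapts to local curvature without a line search, which is the property that makes BB attractive inside larger splitting schemes.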

  15. ITER overview

    International Nuclear Information System (INIS)

    Shimomura, Y.; Aymar, R.; Chuyanov, V.; Huguet, M.; Parker, R.R.

    2001-01-01

    This report summarizes six years of technical work done by the ITER Joint Central Team and Home Teams under the terms of the Agreement on the ITER Engineering Design Activities. The major products are as follows: a complete and detailed engineering design with supporting assessments, industrial-based cost estimates and schedule, a non-site-specific comprehensive safety and environmental assessment, and technology R and D to validate and qualify the design, including proof of technologies and industrial manufacture and testing of full-size or scalable models of key components. The ITER design is at an advanced stage of maturity and contains sufficient technical information for a construction decision. The operation of ITER will demonstrate the availability of a new energy source, fusion. (author)

  16. ITER Overview

    International Nuclear Information System (INIS)

    Shimomura, Y.; Aymar, R.; Chuyanov, V.; Huguet, M.; Parker, R.

    1999-01-01

    This report summarizes six years of technical work done by the ITER Joint Central Team and Home Teams under the terms of the Agreement on the ITER Engineering Design Activities. The major products are as follows: a complete and detailed engineering design with supporting assessments, industrial-based cost estimates and schedule, a non-site-specific comprehensive safety and environmental assessment, and technology R and D to validate and qualify the design, including proof of technologies and industrial manufacture and testing of full-size or scalable models of key components. The ITER design is at an advanced stage of maturity and contains sufficient technical information for a construction decision. The operation of ITER will demonstrate the availability of a new energy source, fusion. (author)

  17. Efficient implementation of multidimensional fast fourier transform on a distributed-memory parallel multi-node computer

    Science.gov (United States)

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2012-01-10

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network, thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
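    The row-transform, all-to-all, column-transform sequence can be simulated on one machine to show why it yields the full 2D transform. In this Python sketch, which illustrates the idea rather than the patented implementation, "nodes" are dictionary entries, a naive O(n²) DFT stands in for each one-dimensional FFT, and the all-to-all exchange is a shuffled list of messages. The names `distributed_fft2` and `dft` are hypothetical.

```python
import cmath
import random

def dft(seq):
    """Naive O(n^2) DFT standing in for a one-dimensional FFT."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]

def distributed_fft2(array, num_nodes, seed=0):
    rows, cols = len(array), len(array[0])
    # Phase 1: row r lives on node r % num_nodes; each node transforms its rows.
    local_rows = {node: {} for node in range(num_nodes)}
    for r in range(rows):
        local_rows[r % num_nodes][r] = dft(array[r])
    # Phase 2: all-to-all; element (r, c) goes to the node owning column c.
    messages = [(r, c, row[c]) for node_data in local_rows.values()
                for r, row in node_data.items() for c in range(cols)]
    random.Random(seed).shuffle(messages)   # random send order
    local_cols = {node: {} for node in range(num_nodes)}
    for r, c, value in messages:
        local_cols[c % num_nodes].setdefault(c, [0j] * rows)[r] = value
    # Phase 3: each node transforms the columns it now owns.
    result = [[0j] * cols for _ in range(rows)]
    for node_data in local_cols.values():
        for c, column in node_data.items():
            for r, value in enumerate(dft(column)):
                result[r][c] = value
    return result
```

    Because every element lands in the same slot whatever order the messages arrive in, the send order is free to be chosen for network efficiency, which is the degree of freedom the random ordering exploits.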

  18. ITER Council proceedings: 1993

    International Nuclear Information System (INIS)

    1994-01-01

    Records of the third ITER Council Meeting (IC-3), held on 21-22 April 1993, in Tokyo, Japan, and the fourth ITER Council Meeting (IC-4) held on 29 September - 1 October 1993 in San Diego, USA, are presented, giving essential information on the evolution of the ITER Engineering Design Activities (EDA), such as the text of the draft of Protocol 2 further elaborated in "ITER EDA Agreement and Protocol 2" (ITER EDA Documentation Series No. 5), recommendations on future work programmes: a description of technology R and D tasks; the establishment of a trust fund for the ITER EDA activities; arrangements for Visiting Home Team Personnel; the general framework for the involvement of other countries in the ITER EDA; conditions for the involvement of Canada in the Euratom Contribution to the ITER EDA; and other attachments as parts of the Records of Decision of the aforementioned ITER Council Meetings

  19. ITER council proceedings: 1993

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1994-12-31

    Records of the third ITER Council Meeting (IC-3), held on 21-22 April 1993, in Tokyo, Japan, and the fourth ITER Council Meeting (IC-4) held on 29 September - 1 October 1993 in San Diego, USA, are presented, giving essential information on the evolution of the ITER Engineering Design Activities (EDA), such as the text of the draft of Protocol 2 further elaborated in "ITER EDA Agreement and Protocol 2" (ITER EDA Documentation Series No. 5), recommendations on future work programmes: a description of technology R and D tasks; the establishment of a trust fund for the ITER EDA activities; arrangements for Visiting Home Team Personnel; the general framework for the involvement of other countries in the ITER EDA; conditions for the involvement of Canada in the Euratom Contribution to the ITER EDA; and other attachments as parts of the Records of Decision of the aforementioned ITER Council Meetings.

  20. Hierarchical Image Segmentation of Remotely Sensed Data using Massively Parallel GNU-LINUX Software

    Science.gov (United States)

    Tilton, James C.

    2003-01-01

    A hierarchical set of image segmentations is a set of several image segmentations of the same image at different levels of detail, in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. In [1], Tilton et al. described an approach for producing hierarchical segmentations (called HSEG) and gave a progress report on exploiting these hierarchical segmentations for image information mining. The HSEG algorithm is a hybrid of region growing and constrained spectral clustering that produces a hierarchical set of image segmentations based on detected convergence points. In the main, HSEG employs the hierarchical stepwise optimization (HSWO) approach to region growing, which was described as early as 1989 by Beaulieu and Goldberg. The HSWO approach seeks to produce segmentations that are more optimized than those produced by more classic approaches to region growing (e.g., Horowitz and T. Pavlidis [3]). In addition, between HSWO region growing iterations, HSEG can optionally merge spatially non-adjacent regions (i.e., perform spectrally based merging or clustering), constrained by a threshold derived from the previous HSWO region growing iteration. While the addition of constrained spectral clustering improves the utility of the segmentation results, especially for larger images, it also significantly increases HSEG's computational requirements. To counteract this, a computationally efficient recursive, divide-and-conquer implementation of HSEG (RHSEG) was devised, which includes special code to avoid processing artifacts caused by RHSEG's recursive subdivision of the image data. The recursive nature of RHSEG makes for a straightforward parallel implementation. This paper describes the HSEG algorithm, its recursive formulation (referred to as RHSEG), and the implementation of RHSEG using massively parallel GNU-LINUX software. Results with Landsat TM data are included comparing RHSEG with classic
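    The core HSWO loop, repeatedly merging the most similar pair of adjacent regions and recording each level, can be sketched in a few lines of Python. This is a simplified 1D illustration under assumptions: regions are runs of a 1D signal, dissimilarity is the absolute difference of region means (chosen for brevity, not HSEG's actual criterion), and the optional non-adjacent merges and the recursive RHSEG subdivision are omitted. `hswo_1d` is a hypothetical name.

```python
def hswo_1d(values):
    """Build a segmentation hierarchy by repeatedly merging the most similar
    pair of adjacent regions (simplified 1D HSWO-style region growing)."""
    def mean(region):
        return sum(region) / len(region)

    regions = [[v] for v in values]            # finest level: one region per sample
    hierarchy = [[list(r) for r in regions]]
    while len(regions) > 1:
        # Dissimilarity (assumed): absolute difference of region means.
        best = min(range(len(regions) - 1),
                   key=lambda i: abs(mean(regions[i]) - mean(regions[i + 1])))
        regions[best:best + 2] = [regions[best] + regions[best + 1]]
        hierarchy.append([list(r) for r in regions])   # record this coarser level
    return hierarchy

levels = hswo_1d([1.0, 1.1, 5.0, 5.2, 9.0])
```

    Each entry of `levels` is one segmentation, and every coarser level is obtained from the previous one by a single merge, which is the nesting property that defines a hierarchical set of segmentations.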

  1. Massive Asynchronous Parallelization of Sparse Matrix Factorizations

    Energy Technology Data Exchange (ETDEWEB)

    Chow, Edmond [Georgia Inst. of Technology, Atlanta, GA (United States)]

    2018-01-08

    Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations and then to solve these equations via an asynchronous iterative method. The unknowns in these equations are the matrix entries of the desired factorization.
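    A minimal Python sketch of the idea, assuming the fixed-point formulation commonly used for fine-grained incomplete LU: for each (i, j) in a sparsity pattern S, the bilinear constraint is the sum over k ≤ min(i, j) of l_ik u_kj = a_ij, with a unit diagonal on L. Each unknown can be updated from the others independently, so all updates in a sweep could run in parallel. Here synchronous (Jacobi-style) sweeps stand in for truly asynchronous updates, and `fixed_point_lu` is a hypothetical name.

```python
def fixed_point_lu(A, pattern, sweeps):
    """Compute unit-lower L and upper U on `pattern` by fixed-point sweeps.

    Each sweep updates every unknown from the previous sweep's values only,
    so all updates within a sweep are independent (Jacobi style).
    `pattern` must contain the diagonal.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i, j in pattern:                      # initial guess taken from A
        if i > j:
            L[i][j] = A[i][j] / A[j][j]
        else:
            U[i][j] = A[i][j]
    for _ in range(sweeps):
        new_L = [row[:] for row in L]
        new_U = [row[:] for row in U]
        for i, j in pattern:
            if i > j:   # l_ij = (a_ij - sum_{k<j} l_ik u_kj) / u_jj
                s = sum(L[i][k] * U[k][j] for k in range(j))
                new_L[i][j] = (A[i][j] - s) / U[j][j]
            else:       # u_ij = a_ij - sum_{k<i} l_ik u_kj
                s = sum(L[i][k] * U[k][j] for k in range(i))
                new_U[i][j] = A[i][j] - s
        L, U = new_L, new_U
    return L, U

A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
full = [(i, j) for i in range(3) for j in range(3)]
L, U = fixed_point_lu(A, full, sweeps=12)
```

    With the full pattern the sweeps reach the exact LU factors after finitely many steps; restricting `pattern` to the nonzeros of A gives an incomplete factorization instead.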

  2. ITER-FEAT safety

    International Nuclear Information System (INIS)

    Gordon, C.W.; Bartels, H.-W.; Honda, T.; Raeder, J.; Topilski, L.; Iseli, M.; Moshonas, K.; Taylor, N.; Gulden, W.; Kolbasov, B.; Inabe, T.; Tada, E.

    2001-01-01

    Safety has been an integral part of the design process for ITER since the Conceptual Design Activities of the project. The safety approach adopted in the ITER-FEAT design and the complementary assessments underway, to be documented in the Generic Site Safety Report (GSSR), are expected to help demonstrate the attractiveness of fusion and thereby set a good precedent for future fusion power reactors. The assessments address ITER's radiological hazards taking into account fusion's favourable safety characteristics. The expectation that ITER will need regulatory approval has influenced the entire safety design and assessment approach. This paper summarises the ITER-FEAT safety approach and assessments underway. (author)

  3. Preparation of CAD infrastructure for ITER procurement packages allocated to Korea

    International Nuclear Information System (INIS)

    Kim, G.H.; Hwang, H.S.; Choi, J.H.; Lee, H.G.; Thomas, E.; Redon, F.; Wilhelm, B.

    2013-01-01

    Highlights: ► It is necessary to use the same CAD platform among project partners for efficient design collaboration. ► Several unexpected problems were found during preparation of the CAD infrastructure. ► The problems resulted in a waste of time and cost. ► Using the same configurations is an effective way to avoid IT-related problems. ► The design activities are steadily being performed on the prepared ITER CAD platform. -- Abstract: It is necessary to use the same CAD platform for standardization and efficient design collaboration between project partners. ITER has selected CATIA with ENOVIA as the primary CAD and integration tool. During the preparation of the CAD infrastructure, there were several difficulties with respect to information technology (IT). ITER design is classified into mechanical and plant design. Procurement arrangements are divided into three types: functional specification, detailed design, and build to print. Therefore, it is important to prepare suitable prerequisites according to the design type, and to comply with CAD methodologies to avoid trial and error. This paper presents how to overcome the difficulties and how to perform the CAD activities for the ITER Korea procurement packages, including important matters concerning CAD infrastructure in a large project

  4. A high-speed linear algebra library with automatic parallelism

    Science.gov (United States)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Harder still is satisfying these requirements while also providing the reliability and ease of use required of commercial software intended for production environments. As a result, parallel processing technology has seen very little use in commercial software, even though numerous computationally demanding programs would benefit significantly from it. This paper describes DSSLIB, a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
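    DSSLIB's own interfaces are not described here, so the following is only a hypothetical Python illustration of a serial programming model over parallel internals: the caller invokes an ordinary function, while the work split and thread pool stay hidden inside it. The function name `parallel_dot` and the chunking scheme are assumptions. In CPython, threads show the structure rather than a real speedup for pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_dot(x, y, workers=4):
    """Dot product with a serial-looking interface.

    Hypothetical example: the caller never sees threads; the function
    splits the index range into chunks, evaluates partial sums in a
    thread pool, and combines them in a fixed order.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    chunk = max(1, -(-n // workers))          # ceiling division
    bounds = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]

    def partial(bound):
        lo, hi = bound
        return sum(x[i] * y[i] for i in range(lo, hi))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so the sum is deterministic.
        return sum(pool.map(partial, bounds))
```

    Combining partial results in a fixed order keeps the output reproducible from run to run, one of the "undesirable side effects" of ad hoc parallel code that a serial-model library can hide.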

  5. Hybrid parallelization of the XTOR-2F code for the simulation of two-fluid MHD instabilities in tokamaks

    Science.gov (United States)

    Marx, Alain; Lütjens, Hinrich

    2017-03-01

    A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130], which solves the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method, has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e., a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical preconditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than that of the sequential one for low-resolution cases, and the speedup increases as the discretization mesh is refined. Moreover, the parallel version allows simulations at higher resolutions, which were previously impossible because of memory limitations.
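    "Matrix-free" means the Krylov solver never forms the Jacobian; it only needs Jacobian-vector products, which can be approximated by a finite difference of the residual, J(u)v ≈ (F(u + εv) - F(u))/ε. The Python sketch below shows this ingredient on a hypothetical two-equation residual; the function names and the fixed ε are assumptions, and this is not XTOR-2F's implementation.

```python
def residual(u):
    """Hypothetical nonlinear residual F(u) = 0 with two unknowns."""
    return [u[0] ** 2 + u[1] - 3.0, u[0] + u[1] ** 2 - 5.0]

def jacobian_vector_product(F, u, v, eps=1e-7):
    """Approximate J(u) v by a forward difference; J is never formed."""
    Fu = F(u)
    Fp = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(Fp, Fu)]

# Analytic Jacobian at u = (1, 2) is [[2, 1], [1, 4]], so J v for v = (1, -1) is (1, -3).
jv = jacobian_vector_product(residual, [1.0, 2.0], [1.0, -1.0])
```

    A Newton-Krylov solver hands this product to GMRES or a similar method at each Newton step; the preconditioner (LU-based in XTOR-2F, per the abstract) is applied separately.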

  6. A study of reconstruction artifacts in cone beam tomography using filtered backprojection and iterative EM algorithms

    International Nuclear Information System (INIS)

    Zeng, G.L.; Gullberg, G.T.

    1990-01-01

    Reconstruction artifacts in cone beam tomography are studied for filtered backprojection (Feldkamp) and iterative EM algorithms. The filtered backprojection algorithm uses a voxel-driven, interpolated backprojection to reconstruct the cone beam data, whereas the iterative EM algorithm performs ray-driven projection and backprojection operations in each iteration. Two weighting schemes for the projection and backprojection operations in the EM algorithm are studied. One weights each voxel by the length of the ray through the voxel; the other equates the value of a voxel to the functional value at the midpoint of the line segment intersecting the voxel, obtained by interpolating between the eight neighboring voxels. Cone beam reconstruction artifacts such as rings, bright vertical extremities, and slice-to-slice crosstalk are not found with parallel beam and fan beam geometries.
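    The EM update itself is short. Below is a hedged Python sketch of the standard MLEM iteration, x ← x / (Aᵀ1) · Aᵀ(p / Ax), where each system-matrix entry A[i][j] plays the role of the first weighting scheme above (length of ray i through voxel j). The tiny two-voxel, three-ray system is invented for illustration, and the function name `mlem` is an assumption.

```python
def mlem(A, p, iterations, x0=None):
    """MLEM: x <- x / (A^T 1) * A^T (p / (A x)), with a ray-driven matrix A."""
    m, n = len(A), len(A[0])
    x = list(x0) if x0 is not None else [1.0] * n
    sensitivity = [sum(A[i][j] for i in range(m)) for j in range(n)]      # A^T 1
    for _ in range(iterations):
        proj = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]   # forward projection
        ratio = [p[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(m)]
        back = [sum(A[i][j] * ratio[i] for i in range(m)) for j in range(n)]  # backprojection
        x = [x[j] * back[j] / sensitivity[j] if sensitivity[j] > 0 else 0.0
             for j in range(n)]
    return x

# Three rays through two voxels; entries are intersection lengths (illustrative).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p = [2.0, 3.0, 5.0]          # noise-free projections of x = [2, 3]
x_hat = mlem(A, p, iterations=200)
```

    The multiplicative update keeps the estimate non-negative, and with consistent noise-free data this small example settles on x = [2, 3].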

  7. Copper Mountain conference on iterative methods: Proceedings: Volume 1

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1996-10-01

    This volume (one of two) contains information presented during the first three days of the Copper Mountain Conference on Iterative Methods held April 9-13, 1996 at Copper Mountain, Colorado. Topics of the sessions held these three days included nonlinear systems, parallel processing, preconditioning, sparse matrix test collections, first-order system least squares, Arnoldi's method, integral equations, software, Navier-Stokes equations, Euler equations, Krylov methods, and eigenvalues. The top three papers from a student competition are also included. Selected papers indexed separately for the Energy Science and Technology Database.

  8. Diverse Power Iteration Embeddings and Its Applications

    Energy Technology Data Exchange (ETDEWEB)

    Huang H.; Yoo S.; Yu, D.; Qin, H.

    2014-12-14

    Abstract: Spectral embedding is one of the most effective dimension reduction algorithms in data mining. However, its computational complexity must be mitigated before it can be applied to real-world large-scale data analysis. Much research has focused on developing approximate spectral embeddings that are more efficient but far less effective. This paper proposes Diverse Power Iteration Embeddings (DPIE), which not only retains the efficiency of power iteration methods but also produces a series of diverse and more effective embedding vectors. We test this novel method by applying it to various data mining applications (e.g., clustering, anomaly detection and feature selection) and evaluating their performance improvements. The experimental results show that our proposed DPIE is more effective than popular spectral approximation methods and achieves quality similar to classic spectral embedding derived from eigendecompositions. Moreover, it is extremely fast on big data applications. For example, in terms of clustering results, DPIE achieves as much as 95% of the quality of classic spectral clustering on complex datasets while being more than 4000 times faster in a limited-memory environment.
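    DPIE itself is not specified in this abstract, so the sketch below shows only the building block it extends: a plain truncated power iteration embedding in the style of power iteration clustering, written in Python. The affinity matrix, starting vector and iteration count are illustrative assumptions. Stopping early is the point: within-cluster differences die out quickly while the between-cluster gap decays slowly, so the truncated vector already separates the clusters.

```python
def power_iteration_embedding(affinity, v0, iterations):
    """Truncated power iteration v <- D^-1 A v with L1 normalization."""
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    v = list(v0)
    for _ in range(iterations):
        v = [sum(affinity[i][j] * v[j] for j in range(n)) / degree[i] for i in range(n)]
        norm = sum(abs(x) for x in v)
        v = [x / norm for x in v]
    return v

# Two tight pairs {0, 1} and {2, 3}, weakly linked; self-loops keep the walk aperiodic.
A = [[1.0, 1.0, 0.01, 0.0],
     [1.0, 1.0, 0.0, 0.01],
     [0.01, 0.0, 1.0, 1.0],
     [0.0, 0.01, 1.0, 1.0]]
embedding = power_iteration_embedding(A, [1.0, 2.0, 3.0, 4.0], iterations=10)
```

    Run to convergence, the vector would become constant and lose all cluster information; the useful embedding exists only at intermediate iterations, which is why such methods stop early.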

  9. The Davidson Method as an alternative to power iterations for criticality calculations

    International Nuclear Information System (INIS)

    Subramanian, C.; Van Criekingen, S.; Heuveline, V.; Nataf, F.; Have, P.

    2011-01-01

    The Davidson method is implemented within the neutron transport core solver parafish to solve k-eigenvalue criticality transport problems. The parafish solver is based on domain decomposition, uses spherical harmonics (P_N method) for angular discretization, and nonconforming finite elements for spatial discretization. The Davidson method is compared to the traditional power iteration method in that context. Encouraging numerical results are obtained with both sequential and parallel calculations. (author)
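    The Davidson method needs a small projected eigenproblem at each step and is too long for a short sketch; below is only the power-iteration baseline it is compared against, for a hypothetical two-group problem written as M φ = (1/k) F φ. The matrices are invented for illustration, and M is taken lower triangular (downscatter only) so that the "transport solve" reduces to forward substitution.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def solve_lower(M, b):
    """Forward substitution for lower-triangular M (downscatter-only assumption)."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(M[i][j] * x[j] for j in range(i))) / M[i][i]
    return x

def power_iteration_k(M, F, iterations=200, tol=1e-12):
    """Estimate k for M phi = (1/k) F phi by source (power) iteration."""
    phi = [1.0] * len(M)
    k = 1.0
    source = matvec(F, phi)
    for _ in range(iterations):
        phi_new = [v / k for v in solve_lower(M, source)]   # "transport solve"
        source_new = matvec(F, phi_new)
        k_new = k * sum(source_new) / sum(source)          # ratio of fission sources
        phi, source = phi_new, source_new
        converged = abs(k_new - k) < tol
        k = k_new
        if converged:
            break
    return k, phi

M = [[2.0, 0.0], [-1.0, 3.0]]   # removal on the diagonal, downscatter below it
F = [[1.0, 2.0], [0.5, 0.0]]    # hypothetical fission production
k, phi = power_iteration_k(M, F)
```

    Here M⁻¹F has eigenvalues 1 and -1/6, so k converges to 1 quickly. The convergence rate is set by this dominance ratio; when it approaches 1, power iteration slows down, which is the usual motivation for alternatives such as the Davidson method.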

  10. Optimization under uncertainty of parallel nonlinear energy sinks

    Science.gov (United States)

    Boroson, Ethan; Missoum, Samy; Mattei, Pierre-Olivier; Vergez, Christophe

    2017-04-01

    Nonlinear Energy Sinks (NESs) are a promising technique for passively reducing the amplitude of vibrations. Through nonlinear stiffness properties, a NES is able to passively and irreversibly absorb energy. Unlike the traditional Tuned Mass Damper (TMD), NESs do not require a specific tuning and absorb energy over a wider range of frequencies. Nevertheless, they are still only efficient over a limited range of excitations. In order to mitigate this limitation and maximize the efficiency range, this work investigates the optimization of multiple NESs configured in parallel. It is well known that the efficiency of a NES is extremely sensitive to small perturbations in loading conditions or design parameters. In fact, the efficiency of a NES has been shown to be nearly discontinuous in the neighborhood of its activation threshold. For this reason, uncertainties must be taken into account in the design optimization of NESs. In addition, the discontinuities require a specific treatment during the optimization process. In this work, the objective of the optimization is to maximize the expected value of the efficiency of NESs in parallel. The optimization algorithm is able to tackle design variables with uncertainty (e.g., nonlinear stiffness coefficients) as well as aleatory variables such as the initial velocity of the main system. The optimal design of several parallel NES configurations for maximum mean efficiency is investigated. Specifically, NES nonlinear stiffness properties, considered random design variables, are optimized for cases with 1, 2, 3, 4, 5, and 10 NESs in parallel. The distributions of efficiency for the optimal parallel configurations are compared to distributions of efficiencies of non-optimized NESs. It is observed that the optimization enables a sharp increase in the mean value of efficiency while reducing the corresponding variance, thus leading to more robust NES designs.

  11. Efficient Parallel Sorting for Migrating Birds Optimization When Solving Machine-Part Cell Formation Problems

    Directory of Open Access Journals (Sweden)

    Ricardo Soto

    2016-01-01

    The Machine-Part Cell Formation Problem (MPCFP) is an NP-hard optimization problem that consists of grouping machines and parts into a set of cells, so that each cell can operate independently and intercell movements are minimized. This problem has been tackled extensively in the literature using techniques ranging from classic methods such as linear programming to more modern nature-inspired metaheuristics. In this paper, we present an efficient parallel version of the Migrating Birds Optimization metaheuristic for solving the MPCFP. Migrating Birds Optimization is a population metaheuristic based on the V-flight formation of migrating birds, which has been shown to be an effective formation for saving energy. This approach is enhanced by incorporating parallel procedures that notably improve the performance of the several sorting processes performed by the metaheuristic. We perform computational experiments on 1080 benchmarks resulting from the combination of 90 well-known MPCFP instances with 12 sorting configurations, with and without threads. The results are promising: the proposal reaches the global optimum in all instances, while the solving time is notably reduced with respect to a nonparallel approach.

  12. Megawatt Power Level 120 GHz Gyrotrons for ITER Start-Up

    Energy Technology Data Exchange (ETDEWEB)

    Choi, E M; Marchewka, C; Mastovsky, I; Shapiro, M A; Sirigiri, J R; Temkin, R J [MIT - Plasma Science and Fusion Center, NW16-186, 167 Albany Street, Cambridge, MA 02139 (United States)]

    2005-01-01

    We report operation of a 110 GHz gyrotron with 1.67 MW of output power measured in short pulses (3 μs) at an efficiency of 42% in the TE₂₂,₆ mode. We also present a preliminary design of a 1 MW, 120 GHz gyrotron for ITER start-up with an efficiency greater than 50%.

  13. Megawatt Power Level 120 GHz Gyrotrons for ITER Start-Up

    International Nuclear Information System (INIS)

    Choi, E M; Marchewka, C; Mastovsky, I; Shapiro, M A; Sirigiri, J R; Temkin, R J

    2005-01-01

    We report operation of a 110 GHz gyrotron with 1.67 MW of output power measured in short pulses (3 μs) at an efficiency of 42% in the TE₂₂,₆ mode. We also present a preliminary design of a 1 MW, 120 GHz gyrotron for ITER start-up with an efficiency greater than 50%.

  14. Parallel optoelectronic trinary signed-digit division

    Science.gov (United States)

    Alam, Mohammad S.

    1999-03-01

    The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of operands of arbitrary length in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator-based architecture is suggested for implementing the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of the space-bandwidth product of the spatial light modulators used in the optoelectronic implementation.
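    The constant-time property comes from the redundancy of the digit set. A minimal Python sketch of carry-free addition follows, assuming radix 3 with digits {-2, -1, 0, 1, 2}, the redundant digit set commonly used for TSD arithmetic (treat this as an assumption, since the abstract does not list it). Each position splits its digit sum into a transfer to the next position and an interim digit, so every output digit depends on only two input positions, regardless of operand length. This illustrates addition only, not the paper's division algorithm or its optical encoding.

```python
def tsd_value(digits):
    """Integer value of radix-3 signed digits, least significant first."""
    return sum(d * 3 ** i for i, d in enumerate(digits))

def tsd_add(a, b):
    """Carry-free addition with digits in {-2, ..., 2} (assumed digit set)."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    interim = [0] * n
    transfer = [0] * (n + 1)          # transfer[i] comes from position i - 1
    for i in range(n):                # positions are independent: parallelizable
        s = a[i] + b[i]               # s lies in [-4, 4]
        if s >= 2:
            transfer[i + 1], interim[i] = 1, s - 3
        elif s <= -2:
            transfer[i + 1], interim[i] = -1, s + 3
        else:
            interim[i] = s
    # interim and transfer both lie in [-1, 1], so each output digit is in [-2, 2]
    return [(interim[i] if i < n else 0) + transfer[i] for i in range(n + 1)]

z = tsd_add([2, 2, -1], [2, 1, 2])    # -1 + 23 = 22
```

    No carry ever propagates more than one position, which is what lets an optical or parallel implementation add all digits in a fixed number of steps.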

  15. ITER ITA newsletter. No. 24, July 2005

    International Nuclear Information System (INIS)

    2005-08-01

    stimulant for international co-operation on science and technology in the twenty-first century, and taking a broader view of the situation, Japan has decided to let the EU host the ITER site. Dr. J. Potocnik, European Commissioner for Science and Research, thanked Minister Nakayama for the highly constructive spirit in which he and his colleagues had conducted the bilateral discussions. He expressed his respect for the honourable manner in which the most sensitive stages were handled. He pointed out that the EU was well aware of the important task before it as the Host of ITER. The action taken had implications beyond establishing fusion energy. It was also an expression of mutual confidence to face the scientific, technical and political challenges that will arise in the course of this first-of-a-kind, truly international scientific cooperation among the leading nations of the world. ITER was establishing a model of global co-operation to address the increasingly global nature of the challenges confronting today's society. The Chinese Minister of Science and Technology, Mr. Xu Guanhua, expressed his pleasure that agreement on the site had been found within the six-Party framework. China considered that a sustainable solution to the world's energy problem required multilateral international collaboration on fusion, so that participants could complement each other's skills and pool resources in the shared challenge. Mr. S. Choi, Vice-Minister of Science and Technology, Republic of Korea, reminded the delegates that the eyes of the world were on ITER as one of the most significant projects of the century, with a view to it being a peaceful and affluent one. With the site decision now behind them, there was still more to be done, particularly concluding the ITER Joint Implementation Agreement as soon as possible. He quoted a Korean proverb, literally translated as 'After rain ground hardens', which parallels with the

  16. Temperature effect on hydrocarbon deposition on molybdenum mirrors under ITER-relevant long-term plasma operation

    NARCIS (Netherlands)

    Rapp, J.; van Rooij, G. J.; Litnovsky, A.; Marot, L.; De Temmerman, G.; Westerhout, J.; Zoethout, E.

    2009-01-01

    Optical diagnostics in ITER will rely on mirrors near the plasma, and deterioration of their reflectivity is a concern. The effect of temperature on the deposition efficiency of hydrocarbons under long-term operating conditions similar to those of ITER was investigated in the linear plasma generator

  17. Parallel CT image reconstruction based on GPUs

    International Nuclear Information System (INIS)

    Flores, Liubov A.; Vidal, Vicent; Mayo, Patricia; Rodenas, Francisco; Verdú, Gumersindo

    2014-01-01

    In X-ray computed tomography (CT), iterative methods are more suitable for reconstructing high-contrast, precise images under noisy conditions from a small number of projections. In practice, however, these methods are not widely used because of their high computational cost. Current technology makes it possible to reduce this drawback effectively. The goal of this work is to develop a fast GPU-based algorithm to reconstruct high-quality images from undersampled and noisy projection data. - Highlights: • We developed a GPU-based iterative algorithm to reconstruct images. • Iterative algorithms are capable of reconstructing images from an undersampled set of projections. • The computational cost of the developed algorithm is low. • The efficiency of the algorithm increases for large-scale problems

  18. An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

    Science.gov (United States)

    Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

    2018-02-01

    De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, computational efficiency is still the main problem in producing 3D, high-resolution images from real large-scale seismic data. In the current paper, we proposed a division method for large-scale 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPUs). Then, we designed an imaging-point parallel strategy to achieve optimal parallel computing performance. Afterward, we adopted an asynchronous double-buffering scheme with multiple streams to perform GPU/CPU parallel computing. Moreover, several key computation and storage optimization strategies based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the key optimization steps, including thread optimization, shared memory optimization, register optimization and use of special function units (SFU), greatly improved efficiency. A numerical example employing real large-scale 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging of large-scale seismic data.
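    The asynchronous double-buffering idea can be sketched in Python with a single background loader thread: while chunk i is being processed, chunk i+1 is already being loaded. This is a CPU-only analogue under stated assumptions, not the authors' CUDA multi-stream code; `process_double_buffered`, `load` and `compute` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def process_double_buffered(chunks, load, compute):
    """Overlap loading of chunk i + 1 with computation on chunk i.

    Hypothetical CPU analogue of asynchronous double buffering: one
    background thread performs `load` (standing in for a host-to-device
    copy) while the calling thread runs `compute` on the chunk already loaded.
    """
    results = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load, chunks[0])
        for i in range(len(chunks)):
            current = pending.result()                          # wait for buffer i
            if i + 1 < len(chunks):
                pending = loader.submit(load, chunks[i + 1])    # start filling the other buffer
            results.append(compute(current))                    # overlaps with that load
    return results

out = process_double_buffered([1, 2, 3], load=lambda c: [c, c], compute=sum)
```

    At most two chunks are in flight at any time (one being computed, one being loaded), which is what bounds memory use while hiding transfer latency.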

  19. An Algorithm for Parallel Sn Sweeps on Unstructured Meshes

    International Nuclear Information System (INIS)

    Pautz, Shawn D.

    2002-01-01

    A new algorithm for performing parallel Sn sweeps on unstructured meshes is developed. The algorithm uses a low-complexity list ordering heuristic to determine a sweep ordering on any partitioned mesh. For typical problems and with 'normal' mesh partitionings, nearly linear speedups on up to 126 processors are observed. This is an important and desirable result: although analyses of structured meshes indicate that parallel sweeps will not scale with normal partitioning approaches, no severe asymptotic degradation in parallel efficiency is observed at modest (≤100) levels of parallelism. This result is a fundamental step in the development of efficient parallel Sn methods.
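    The basic constraint behind any Sn sweep is that, for a given direction, a cell can be solved only after all of its upwind neighbours. A minimal sketch, assuming the upwind dependencies for one direction are already available as a list of directed cell pairs, groups cells into concurrent "wavefront" levels with Kahn's topological sort. This is a generic illustration, not Pautz's list-ordering heuristic, and it ignores mesh partitioning and inter-processor communication; the function name sweep_levels is hypothetical.

```python
from collections import defaultdict


def sweep_levels(num_cells, upwind_edges):
    """Group cells into sweep levels for a single discrete-ordinate direction.

    upwind_edges holds (u, v) pairs meaning cell u is upwind of cell v for
    this direction, so v cannot be solved until u is done. Cells that share
    a level do not depend on each other and could be swept concurrently.
    """
    downstream = defaultdict(list)
    indegree = [0] * num_cells
    for u, v in upwind_edges:
        downstream[u].append(v)
        indegree[v] += 1

    # Kahn's topological sort, advanced one wavefront at a time
    current = [c for c in range(num_cells) if indegree[c] == 0]
    levels = []
    while current:
        levels.append(current)
        nxt = []
        for u in current:
            for v in downstream[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    nxt.append(v)
        current = nxt

    if sum(len(level) for level in levels) != num_cells:
        # Unstructured meshes can produce cyclic dependencies, which must be
        # broken (for example by lagging some fluxes) before a sweep order exists.
        raise ValueError("dependency graph contains a cycle")
    return levels


# 2x2 mesh swept in the (+x, +y) direction: cell 0 feeds 1 and 2, both feed 3
print(sweep_levels(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # [[0], [1, 2], [3]]
```

    Cells in the same level are independent, which is where sweep parallelism comes from; how levels map onto a partitioned mesh is the part the paper's heuristic addresses.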

  20. Second meeting of the ITER Preparatory Committee

    International Nuclear Information System (INIS)

    Drew, M.

    2003-01-01

    The committee charged with overseeing the ITER Transitional Arrangements (ITA), the ITER Preparatory Committee, held its second meeting on 24 September at the JET facilities at Culham, UK. Dr. Umberto Finzi of the European Commission was chairman. This meeting was also the first since Dr. Yasuo Shimomura succeeded Dr. Robert Aymar as Interim Project Leader (IPL). Welcoming Dr. Shimomura in his new capacity, the Committee paid tribute to the outstanding contributions of his predecessor to the definition, design and promotion of ITER, and expressed the gratitude of all Participants to Dr. Aymar and its best wishes for future success in his new appointment. The technical activities of the ITA were the main focus of the Committee's discussions. The Committee took note of the IPL's Status Report on ITA Technical Activities and endorsed the IPL's proposals for the top-level structure of the International Team, including the designation of Dr. Pietro Barabaschi as Deputy to the IPL. The Committee took note of the IPL's proposals on Participants' contributions to the ITA and of the Participants' stated intentions and expectations in this regard. Several Delegations pointed out that access to the necessary resources would depend strongly on progress made towards the Agreement. All Participants were invited, in the shared interests of the project, to respond constructively in the specific technical areas where the IPL reported a lack of resources. Following a presentation from the International Team (IT) on Project Management Tools, the Committee expressed general support for the proposed strategy to provide the current team with the CAD and Data Management elements necessary to prepare for an efficient start of ITER construction, and asked the IT Leader to report on an estimate and time profile of expenditure for the period to mid-2004. The Committee supported the proposals to re-establish the ITER Test Blanket Working Group. The Committee agreed that the phasing of planned

  1. Toward Generalization of Iterative Small Molecule Synthesis.

    Science.gov (United States)

    Lehmann, Jonathan W; Blair, Daniel J; Burke, Martin D

    2018-02-01

    Small molecules have extensive untapped potential to benefit society, but access to this potential is too often restricted by limitations inherent to the customized approach currently used to synthesize this class of chemical matter. In contrast, the "building block approach", i.e., generalized iterative assembly of interchangeable parts, has now proven to be a highly efficient and flexible way to construct things ranging all the way from skyscrapers to macromolecules to artificial intelligence algorithms. The structural redundancy found in many small molecules suggests that they possess a similar capacity for generalized building block-based construction. It is also encouraging that many customized iterative synthesis methods have been developed that improve access to specific classes of small molecules. There has also been substantial recent progress toward the iterative assembly of many different types of small molecules, including complex natural products, pharmaceuticals, biological probes, and materials, using common building blocks and coupling chemistry. Collectively, these advances suggest that a generalized building block approach for small molecule synthesis may be within reach.

  2. Toward Generalization of Iterative Small Molecule Synthesis

    Science.gov (United States)

    Lehmann, Jonathan W.; Blair, Daniel J.; Burke, Martin D.

    2018-01-01

    Small molecules have extensive untapped potential to benefit society, but access to this potential is too often restricted by limitations inherent to the customized approach currently used to synthesize this class of chemical matter. In contrast, the “building block approach”, i.e., generalized iterative assembly of interchangeable parts, has now proven to be a highly efficient and flexible way to construct things ranging all the way from skyscrapers to macromolecules to artificial intelligence algorithms. The structural redundancy found in many small molecules suggests that they possess a similar capacity for generalized building block-based construction. It is also encouraging that many customized iterative synthesis methods have been developed that improve access to specific classes of small molecules. There has also been substantial recent progress toward the iterative assembly of many different types of small molecules, including complex natural products, pharmaceuticals, biological probes, and materials, using common building blocks and coupling chemistry. Collectively, these advances suggest that a generalized building block approach for small molecule synthesis may be within reach. PMID:29696152

  3. Application of remote handling compatibility on ITER plant

    International Nuclear Information System (INIS)

    Sanders, S.; Rolfe, A.; Mills, S.F.; Tesini, A.

    2011-01-01

    The ITER plant will require fully remote maintenance during its operational life. For this to be effective, safe and efficient, the plant will have to be developed in accordance with remote handling (RH) compatibility requirements. A system for ensuring RH compatibility of plant designed for tokamaks was successfully developed and applied, inter alia, by the authors when working at the JET project. The experience gained in assuring RH compatibility of plant at JET is now being applied to RH-relevant ITER plant. The methodologies required to ensure RH compatibility of plant include the standardization of common plant items, standardization of RH features, availability of common guidance on RH best practice, and a protocol for design and interface review and approval. The protocol in use at ITER, covered by the ITER Remote Maintenance Management System (IRMMS), defines the processes and the use of management controls, including Plant Definition Forms (PDFs), Task Definition Forms (TDFs), RH Compatibility Assessment Forms (RHCAs) and the ITER RH Code of Practice. This paper describes specific examples where the authors have applied the methodology proven at JET to ensure remote handling compatibility on ITER plant. The examples studied are: ELM coils (to be installed in-vessel behind the Blanket Modules), covering handling in-vessel, in Casks and at the Hot Cell, as well as fully remote installation and connection (mechanical and electrical) in-vessel; neutral beam systems (in-vessel and in the NB Cell), covering beam sources, the cesium oven, beam line components (accessed in the NB Cell) and the Duct Liner (remotely replaced from in-vessel); and the divertor (in-vessel), covering cooling pipework and a remotely operated electrical connector. The RH compatibility process can significantly affect plant design, so this paper should be of interest to all parties who develop ITER plant designs.

  4. Comparison of the deflated preconditioned conjugate gradient method and parallel direct solver for composite materials

    NARCIS (Netherlands)

    Jönsthövel, T.B.; Van Gijzen, M.B.; MacLachlan, S.; Vuik, C.; Scarpas, A.

    2011-01-01

    The demand for large FE meshes increases as parallel computing becomes the standard in FE simulations. Direct and iterative solution methods are used to solve the resulting linear systems. Many applications concern composite materials, which are characterized by large discontinuities in the material

  5. Improving the efficiency of molecular replacement by utilizing a new iterative transform phasing algorithm

    Energy Technology Data Exchange (ETDEWEB)

    He, Hongxing; Fang, Hengrui [Department of Physics and Texas Center for Superconductivity, University of Houston, Houston, Texas 77204 (United States); Miller, Mitchell D. [Department of BioSciences, Rice University, Houston, Texas 77005 (United States); Phillips, George N. Jr [Department of BioSciences, Rice University, Houston, Texas 77005 (United States); Department of Chemistry, Rice University, Houston, Texas 77005 (United States); Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706 (United States); Su, Wu-Pei, E-mail: wpsu@uh.edu [Department of Physics and Texas Center for Superconductivity, University of Houston, Houston, Texas 77204 (United States)

    2016-07-15

    An iterative transform algorithm is proposed to improve the conventional molecular-replacement method for solving the phase problem in X-ray crystallography. Several examples of successful trial calculations carried out with real diffraction data are presented. An iterative transform method proposed previously for direct phasing of high-solvent-content protein crystals is employed for enhancing the molecular-replacement (MR) algorithm in protein crystallography. Target structures that are resistant to conventional MR due to insufficient similarity between the template and target structures might be tractable with this modified phasing method. Trial calculations involving three different structures are described to test and illustrate the methodology. The relationship of the approach to PHENIX Phaser-MR and MR-Rosetta is discussed.

  6. Comparison of collective Thomson scattering signals due to fast ions in ITER scenarios with fusion and auxiliary heating

    DEFF Research Database (Denmark)

    Salewski, Mirko; Asunta, O.; Eriksson, L.-G.

    2009-01-01

    Auxiliary heating such as neutral beam injection (NBI) and ion cyclotron resonance heating (ICRH) will accelerate ions in ITER up to energies in the MeV range, i.e. energies which are also typical for alpha particles. Fast ions of any of these populations will elevate the collective Thomson [...] functions of fast ions generated by NBI and ICRH are calculated for a steady-state ITER burning plasma equilibrium with the ASCOT and PION codes, respectively. The parameters for the auxiliary heating systems correspond to the design currently foreseen for ITER. The geometry of the CTS system for ITER [...] is chosen such that near-perpendicular and near-parallel velocity components are resolved. In the investigated ICRH scenario, waves at 50 MHz resonate with tritium at the second harmonic off-axis on the low-field side. Effects of a minority heating scheme with He-3 are also considered. CTS scattering...

  7. Iterative Nonlinear Tikhonov Algorithm with Constraints for Electromagnetic Tomography

    Science.gov (United States)

    Xu, Feng; Deshpande, Manohar

    2012-01-01

    Low-frequency electromagnetic tomography, such as electrical capacitance tomography (ECT), has been proposed for monitoring and mass-gauging of gas-liquid two-phase systems under microgravity conditions in NASA's future long-term space missions. Because the ECT inverse problem is ill-posed, images reconstructed using conventional linear algorithms often suffer from low resolution and blurred edges. Hence, new efficient, high-resolution nonlinear imaging algorithms are needed for accurate two-phase imaging. The proposed Iterative Nonlinear Tikhonov Regularized Algorithm with Constraints (INTAC) is based on an efficient finite element method (FEM) forward model of the quasi-static electromagnetic problem. It iteratively minimizes the discrepancy between FEM-simulated and actually measured capacitances by adjusting the reconstructed image using the Tikhonov regularized method. More importantly, in each iteration it assigns the known permittivities of the two phases to unknown pixels whose values exceed the reasonable permittivity range. This strategy not only stabilizes the convergence process but also produces sharper images. Simulations show that INTAC can improve resolution by more than a factor of two with respect to conventional approaches. Strategies to further improve spatial imaging resolution are suggested, as well as techniques to accelerate the nonlinear forward model and thus increase the temporal resolution.
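    The two ingredients named above, a Tikhonov-regularized update that reduces the capacitance misfit and a per-iteration constraint that pushes out-of-range pixels back to known phase permittivities, can be sketched as follows. This is a minimal illustration under loud assumptions: the forward model and its Jacobian are passed in as placeholders (the demo uses a toy linear model, not an FEM solver), the constraint is approximated by clipping to [eps_low, eps_high], and the function name tikhonov_box_iteration is hypothetical rather than the authors' implementation.

```python
import numpy as np


def tikhonov_box_iteration(forward, jacobian, measured, x0, alpha,
                           eps_low, eps_high, iters=30):
    """Iterative Tikhonov-regularized reconstruction with a permittivity box.

    forward(x) returns simulated capacitances and jacobian(x) the sensitivity
    matrix; both are placeholders for a real forward model such as an FEM solver.
    """
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    for _ in range(iters):
        J = jacobian(x)
        r = measured - forward(x)                 # data misfit
        # Tikhonov-regularized Gauss-Newton update
        dx = np.linalg.solve(J.T @ J + alpha * np.eye(n), J.T @ r)
        # Pixels pushed outside the physical range are reset to the
        # nearest known phase permittivity
        x = np.clip(x + dx, eps_low, eps_high)
    return x


# Toy linear "forward model": two phases with permittivities 1.0 and 3.0
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = np.where(rng.random(10) < 0.5, 1.0, 3.0)
x_rec = tikhonov_box_iteration(lambda x: A @ x, lambda x: A, A @ x_true,
                               np.full(10, 2.0), alpha=0.1,
                               eps_low=1.0, eps_high=3.0)
print(np.round(x_rec, 3))
```

    Clipping is a projection onto a convex box that contains the true image, so in this toy setting it cannot move the iterate further from the solution; in a real nonlinear ECT problem the regularization weight and constraint rule would need tuning.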

  8. ITER council proceedings: 1998

    International Nuclear Information System (INIS)

    1999-01-01

    This volume contains documents of the 13th and 14th ITER Council meetings, as well as of the 1st extraordinary ITER Council meeting. Documents of the ITER meetings held in Vienna and Yokohama during 1998 are also included. The contents include an outline of the ITER objectives, the ITER parameters and design overview, as well as operating scenarios and plasma performance. Furthermore, design features and safety and environmental characteristics are given.

  9. Static and dynamic load-balancing strategies for parallel reservoir simulation

    International Nuclear Information System (INIS)

    Anguille, L.; Killough, J.E.; Li, T.M.C.; Toepfer, J.L.

    1995-01-01

    Accurate simulation of the complex phenomena that occur in flow in porous media can tax even the most powerful serial computers. New parallel computer architectures may overcome this difficulty and become an efficient tool for reservoir simulation. Unfortunately, major problems remain to be solved before parallel computers can be used commercially: production serial programs must be rewritten to be efficient in parallel environments, and load-balancing methods must be explored to distribute the workload evenly across processors during the simulation. This study implements both a static load-balancing algorithm and a receiver-initiated dynamic load-sharing algorithm to achieve high parallel efficiencies on both the IBM SP2 and Intel iPSC/860 parallel computers. Significant speedup improvements were recorded for both methods. Further optimization of these algorithms yielded a technique with efficiencies as high as 90% and 70% on 8 and 32 nodes, respectively. The increased performance resulted from minimizing message-passing overhead.
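    The abstract does not specify the static load-balancing algorithm, so the following is only a generic example of the idea: assign work units with estimated costs to processors so that the most heavily loaded processor finishes as early as possible. It uses the well-known longest-processing-time-first (LPT) greedy heuristic; the per-unit cost estimates and the function name greedy_static_balance are assumptions for illustration.

```python
import heapq


def greedy_static_balance(work, num_procs):
    """Longest-processing-time-first assignment of work units to processors.

    work[i] is an estimated cost for unit i (for example a block of grid cells).
    Returns the unit indices assigned to each processor and the resulting loads.
    """
    heap = [(0.0, p) for p in range(num_procs)]   # (current load, processor)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_procs)]
    # Largest units first, each to the currently least-loaded processor
    for i in sorted(range(len(work)), key=lambda i: work[i], reverse=True):
        load, p = heapq.heappop(heap)
        assignment[p].append(i)
        heapq.heappush(heap, (load + work[i], p))
    loads = [sum(work[i] for i in units) for units in assignment]
    return assignment, loads


assignment, loads = greedy_static_balance([7, 5, 4, 3, 3, 2], num_procs=2)
print(loads)                                     # [12, 12]
print(sum(loads) / (len(loads) * max(loads)))    # load-balance ratio: 1.0
```

    A static scheme like this relies on cost estimates made before the run; a receiver-initiated dynamic scheme instead lets idle processors request work during the simulation, which helps when per-cell costs drift over time.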

  10. Using SharePoint to manage and disseminate fusion project information: An ITER case study

    International Nuclear Information System (INIS)

    Prescott, Barry; Downing, James; Di Maio, Marco; How, John

    2010-01-01

    The ITER Organization, like many other fusion laboratories, has an authenticated-access website devoted to communicating information to all its staff and remote collaborators. In 2007 and 2008, the number of registered users of this site increased by more than a factor of ten, to over 3000 at present, with approximately 900 unique users visiting the website per month. In parallel, the project management of the organisation has been put in place. A decision was taken to move the web platform from simple HTML to Microsoft SharePoint and to web-enable the many applications and databases used for ITER management. This decision has been well justified by the power and flexibility of SharePoint: for example, it permits different groups to publish their own information and to collaborate, and it consolidates disparate spreadsheet data in linked SharePoint lists to improve quality and maintainability. This paper examines the use of SharePoint at ITER: why it was selected and what benefits it brings to both the local and remote ITER community. Some active case studies are presented. The paper also looks ahead at the future benefits this platform offers ITER and reviews the type of information that the site can profitably publish. It also highlights some of the limitations of the platform and the problems of integration with other ITER systems, and discusses its potential for adaptation in other scientific organisations.

  11. The numerical parallel computing of photon transport

    International Nuclear Information System (INIS)

    Huang Qingnan; Liang Xiaoguang; Zhang Lifa

    1998-12-01

    The parallel computing of photon transport is investigated, and the parallel algorithm and the parallelization of programs on parallel computers with both shared and distributed memory are discussed. By analyzing the inherent mathematical and physical structure of the photon transport model in relation to the architecture of parallel computers, applying a 'divide and conquer' strategy, restructuring the program's algorithm, removing data dependencies, identifying parallelizable components and creating large-grain parallel subtasks, the sequential computation of photon transport is efficiently transformed into parallel and vector computation. The program was run on various high-performance parallel computers, such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP), and very good parallel speedup was obtained.

  12. Iter

    Science.gov (United States)

    Iotti, Robert

    2015-04-01

    ITER is an international experimental facility being built by seven Parties to demonstrate the long term potential of fusion energy. The ITER Joint Implementation Agreement (JIA) defines the structure and governance model of such cooperation. There are a number of necessary conditions for such international projects to be successful: a complete design, strong systems engineering working with an agreed set of requirements, an experienced organization with systems and plans in place to manage the project, a cost estimate backed by industry, and someone in charge. Unfortunately for ITER many of these conditions were not present. The paper discusses the priorities in the JIA which led to setting up the project with a Central Integrating Organization (IO) in Cadarache, France as the ITER HQ, and seven Domestic Agencies (DAs) located in the countries of the Parties, responsible for delivering 90%+ of the project hardware as Contributions-in-Kind and also financial contributions to the IO, as "Contributions-in-Cash." Theoretically the Director General (DG) is responsible for everything. In practice the DG does not have the power to control the work of the DAs, and there is not an effective management structure enabling the IO and the DAs to arbitrate disputes, so the project is not really managed, but is a loose collaboration of competing interests. Any DA can effectively block a decision reached by the DG. Inefficiencies in completing design while setting up a competent organization from scratch contributed to the delays and cost increases during the initial few years. So did the fact that the original estimate was not developed from industry input. Unforeseen inflation and market demand on certain commodities/materials further exacerbated the cost increases. Since then, improvements are debatable. Does this mean that the governance model of ITER is a wrong model for international scientific cooperation? I do not believe so. Had the necessary conditions for success

  13. Efficient Simulation of Population Overflow in Parallel Queues

    NARCIS (Netherlands)

    Nicola, V.F.; Zaburnenko, T.S.

    2006-01-01

    In this paper we propose a state-dependent importance sampling heuristic to estimate the probability of population overflow in networks of parallel queues. This heuristic approximates the "optimal" state-dependent change of measure without the need for difficult mathematical analysis or costly
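    To make "change of measure" concrete, here is a deliberately simplified sketch for a single M/M/1 queue rather than the networks of parallel queues the paper treats. It estimates the rare probability that the queue, starting with one customer, reaches N before emptying, by simulating under swapped arrival and service probabilities and reweighting each path by its likelihood ratio. The rate-swapping measure is a classical choice for a single queue and is an assumption here, not the state-dependent heuristic proposed in the paper; function names are hypothetical.

```python
import random


def overflow_is(lam, mu, N, n_samples, seed=0):
    """Importance-sampling estimate of P(queue hits N before emptying),
    starting from one customer, for the embedded jump chain of an M/M/1
    queue with arrival rate lam < service rate mu.

    Change of measure: swap the arrival and service probabilities, which
    makes the rare overflow event typical under simulation.
    """
    rng = random.Random(seed)
    p = lam / (lam + mu)      # original probability that the next event is an arrival
    q = 1.0 - p
    total = 0.0
    for _ in range(n_samples):
        x = 1
        while 0 < x < N:
            x += 1 if rng.random() < q else -1   # simulate under swapped probabilities
        if x == N:
            # Likelihood ratio p^a q^d / (q^a p^d) = (p/q)^(a - d), where
            # a - d = N - 1 for every path that reaches N from 1
            total += (p / q) ** (N - 1)
    return total / n_samples


def overflow_exact(lam, mu, N):
    """Gambler's-ruin formula for the same probability."""
    r = mu / lam
    return (1 - r) / (1 - r ** N)


est = overflow_is(0.3, 0.7, 20, 10_000)
print(est, overflow_exact(0.3, 0.7, 20))
```

    Plain simulation would almost never observe an overflow at this level (the probability is of order 1e-7), whereas under the swapped measure more than half of the paths reach N, so a few thousand samples already give a small relative error.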

  14. Current Status on the Korean Test Blanket Module Development for testing in the ITER

    International Nuclear Information System (INIS)

    Lee, Dong Won; Kim, Suk Kwon; Bae, Young Dug; Yoon, Jae Sung; Jung, Ki Sok

    2010-01-01

    Korea has proposed and designed a Helium Cooled Molten Lithium (HCML) Test Blanket Module (TBM) to be tested in the International Thermonuclear Experimental Reactor (ITER). Ferritic-martensitic (FM) steel is used as the structural material, and helium (He) is used as a coolant for the first wall (FW) and the breeding zone. Liquid lithium (Li) is circulated for tritium breeding, not for cooling. The main purpose of developing the TBM is to develop design technology for DEMO and fusion reactors, which must be proven through experiments with the TBM in ITER. We have therefore developed the design scheme and related codes, including the safety analysis needed to obtain a license for testing in ITER. To develop the TBM and install it in ITER, several technologies were developed in parallel: fabrication, breeder, He cooling, tritium extraction and so on. Figure 1 shows the overall TBM development scheme. In Korea, the official strategy for developing the TBM is to participate in other parties' concepts, such as the US and EU ones, in which PbLi (lead-lithium eutectic), He and FM steel are used as the liquid breeder, coolant and structural material, respectively.

  15. An efficient iterative model reduction method for aeroviscoelastic panel flutter analysis in the supersonic regime

    Science.gov (United States)

    Cunha-Filho, A. G.; Briend, Y. P. J.; de Lima, A. M. G.; Donadon, M. V.

    2018-05-01

    Predicting the flutter boundary of complex aeroelastic systems is not an easy task. In some cases, these analyses may become prohibitive because of the high computational cost and time associated with the large number of degrees of freedom of the aeroelastic models, particularly when the model incorporates a control strategy aimed at suppressing flutter, such as viscoelastic treatments. In this situation, a model reduction method is essential. However, constructing a modal reduction basis for aeroviscoelastic systems remains a challenge, owing to the inherent frequency- and temperature-dependent behavior of viscoelastic materials. Thus, the main contribution of the present study is an efficient and accurate iterative enriched Ritz basis for aeroviscoelastic systems. The main features and capabilities of the proposed model reduction method are illustrated by predicting the flutter boundary of a thin three-layer sandwich flat panel and a typical aeronautical stiffened panel, both under supersonic flow.
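    The general idea of reusing a reduction basis when material properties change, and enriching it when it becomes inaccurate, can be illustrated with plain Rayleigh-Ritz projection. The sketch below is an assumption-laden toy, not the authors' aeroviscoelastic procedure: it uses a spring chain with a unit mass matrix, takes the lowest modes of a nominal stiffness K0 as the basis, applies it to a modified stiffness K1 (standing in for a frequency- or temperature-dependent change), and enriches the basis with Ritz residual vectors. The names ritz and enrich are hypothetical.

```python
import numpy as np


def ritz(K, V):
    """Rayleigh-Ritz approximation of the eigenpairs of symmetric K
    (unit mass matrix assumed) restricted to the span of the columns of V."""
    Q, _ = np.linalg.qr(V)                   # orthonormal reduction basis
    theta, Y = np.linalg.eigh(Q.T @ K @ Q)   # small reduced eigenproblem
    return theta, Q @ Y                      # Ritz values and Ritz vectors


def enrich(K, V):
    """Append the residual vectors K x - theta x of the current Ritz pairs."""
    theta, X = ritz(K, V)
    return np.hstack([V, K @ X - X * theta])


n = 60
K0 = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # nominal spring chain
K1 = K0 + np.diag(np.linspace(0.0, 0.5, n))             # modified stiffness
V = np.linalg.eigh(K0)[1][:, :4]                        # nominal lowest modes

exact = np.linalg.eigvalsh(K1)[:4]
base = ritz(K1, V)[0]
rich = ritz(K1, enrich(K1, V))[0][:4]
print(exact, base, rich)
```

    Because Ritz values are upper bounds on the true eigenvalues and can only decrease as the subspace grows, the enriched estimates lie between the nominal-basis estimates and the exact values.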

  16. Verifying large modular systems using iterative abstraction refinement

    International Nuclear Information System (INIS)

    Lahtinen, Jussi; Kuismin, Tuomas; Heljanko, Keijo

    2015-01-01

    Digital instrumentation and control (I&C) systems are increasingly used in the nuclear engineering domain. The exhaustive verification of these systems is challenging, and the usual verification methods such as testing and simulation are typically insufficient. Model checking is a formal method that is able to exhaustively analyse the behaviour of a model against a formally written specification. If the model checking tool detects a violation of the specification, it produces a counterexample that demonstrates how the specification is violated in the system. Unfortunately, some real-life system designs are too large to be analysed directly by traditional model checking techniques. We have developed an iterative technique for model checking large modular systems. The technique uses abstraction-based over-approximations of the model behaviour, combined with iterative refinement. The main contribution of the work is the concrete abstraction refinement technique based on the modular structure of the model, the dependency graph of the model, and a refinement sampling heuristic similar to delta debugging. The technique is geared towards proving properties, and outperforms BDD-based model checking, the k-induction technique, and the property directed reachability algorithm (PDR) in our experiments. - Highlights: • We have developed an iterative technique for model checking large modular systems. • The technique uses BDD-based model checking, k-induction, and PDR in parallel. • We have tested our algorithm by verifying two models with it. • The technique outperforms classical model checking methods in our experiments.

  17. A survey of parallel multigrid algorithms

    Science.gov (United States)

    Chan, Tony F.; Tuminaro, Ray S.

    1987-01-01

    A typical multigrid algorithm applied to well-behaved linear elliptic partial differential equations (PDEs) is described. Criteria for designing and evaluating parallel algorithms are presented. Before evaluating the performance of some parallel multigrid algorithms, consideration is given to some theoretical complexity results for solving PDEs in parallel and for executing the multigrid algorithm. The effect of mapping and load imbalance on the parallel efficiency of the algorithm is studied.
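    For readers unfamiliar with the "typical multigrid algorithm" the survey starts from, here is a minimal serial V-cycle for the model problem -u'' = f on (0, 1) with zero boundary values. The component choices (weighted Jacobi smoothing with omega = 2/3, full-weighting restriction, linear interpolation) are standard textbook defaults assumed for illustration, not taken from the survey. Each component is a local stencil operation, which is what makes it a candidate for parallel execution; the coarse grids hold very few points, which is where mapping and load imbalance start to matter.

```python
import numpy as np


def residual(u, f, h):
    """r = f - A u with A = tridiag(-1, 2, -1) / h^2 and zero Dirichlet boundaries."""
    up = np.pad(u, 1)
    return f - (2 * u - up[:-2] - up[2:]) / h**2


def smooth(u, f, h, sweeps=3, omega=2 / 3):
    """Weighted Jacobi sweeps; each one is a purely local stencil update."""
    for _ in range(sweeps):
        up = np.pad(u, 1)
        u = (1 - omega) * u + omega * 0.5 * (up[:-2] + up[2:] + h**2 * f)
    return u


def restrict(r):
    """Full weighting from 2m+1 fine points to m coarse points."""
    return 0.25 * r[:-2:2] + 0.5 * r[1::2] + 0.25 * r[2::2]


def prolong(e):
    """Linear interpolation from m coarse points to 2m+1 fine points."""
    ep = np.pad(e, 1)
    fine = np.zeros(2 * e.size + 1)
    fine[1::2] = e                          # points shared with the coarse grid
    fine[0::2] = 0.5 * (ep[:-1] + ep[1:])   # midpoints
    return fine


def v_cycle(u, f, h):
    if u.size == 1:
        return f * h**2 / 2                 # exact solve with one unknown
    u = smooth(u, f, h)                     # pre-smoothing
    rc = restrict(residual(u, f, h))        # coarse-grid residual
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)
    u = u + prolong(ec)                     # coarse-grid correction
    return smooth(u, f, h)                  # post-smoothing


n = 2**7 - 1                                # 127 interior points
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
f = np.pi**2 * np.sin(np.pi * x)            # -u'' = f has solution sin(pi x)
u = np.zeros(n)
r0 = np.linalg.norm(residual(u, f, h))
for _ in range(10):
    u = v_cycle(u, f, h)
rel_res = np.linalg.norm(residual(u, f, h)) / r0
print(rel_res)
```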

  18. Iterative Sparse Channel Estimation and Decoding for Underwater MIMO-OFDM

    Directory of Open Access Journals (Sweden)

    Berger, Christian R.

    2010-01-01

    We propose a block-by-block iterative receiver for underwater MIMO-OFDM that couples channel estimation with multiple-input multiple-output (MIMO) detection and low-density parity-check (LDPC) channel decoding. In particular, the channel estimator is based on a compressive sensing technique to exploit the channel sparsity, the MIMO detector consists of a hybrid use of successive interference cancellation and soft minimum mean-square error (MMSE) equalization, and channel coding uses nonbinary LDPC codes. Various feedback strategies from the channel decoder to the channel estimator are studied, including full feedback of hard or soft symbol decisions, as well as their threshold-controlled versions. We study the receiver performance using numerical simulation and experimental data collected from the RACE08 and SPACE08 experiments. We find that iterative receiver processing including sparse channel estimation leads to impressive performance gains. These gains are more pronounced when the number of available pilots to estimate the channel is decreased, for example, when a fixed number of pilots is split between an increasing number of parallel data streams in MIMO transmission. For the various feedback strategies for iterative channel estimation, we observe that soft decision feedback slightly outperforms hard decision feedback.

  19. A fast method to emulate an iterative POCS image reconstruction algorithm.

    Science.gov (United States)

    Zeng, Gengsheng L

    2017-10-01

    Iterative image reconstruction algorithms are commonly used to optimize an objective function, especially when the objective function is nonquadratic. Generally speaking, iterative algorithms are computationally inefficient. This paper presents a fast algorithm that has one backprojection and no forward projection, and derives a new method to solve an optimization problem. The nonquadratic constraint, for example an edge-preserving denoising constraint, is implemented as a nonlinear filter. The algorithm is derived from the POCS (projections onto convex sets) approach. A windowed FBP (filtered backprojection) algorithm enforces data fidelity. An iterative procedure, divided into segments, enforces edge-enhancement denoising; each segment performs nonlinear filtering. The derived iterative algorithm is computationally efficient, containing only one backprojection and no forward projection. Low-dose CT data are used for algorithm feasibility studies. The nonlinearity is implemented as an edge-enhancing noise-smoothing filter. The results of the patient studies demonstrate its effectiveness in processing low-dose x-ray CT data. This fast algorithm can be used to replace many iterative algorithms. © 2017 American Association of Physicists in Medicine.
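    For contrast with the single-backprojection method described above, the conventional POCS loop looks roughly like this: the estimate is projected in turn onto each measurement hyperplane (the Kaczmarz/ART step, which costs one forward projection and one backprojection per ray) and then onto a convex constraint set. A minimal sketch, assuming a small hypothetical dense system matrix A and using nonnegativity as a stand-in for the paper's edge-preserving denoising constraint; pocs_reconstruct is a hypothetical name.

```python
import numpy as np


def pocs_reconstruct(A, b, n_sweeps=300):
    """Projections onto convex sets for A x = b with x >= 0.

    Each sweep projects the estimate onto every measurement hyperplane
    {x : A[i] @ x = b[i]} in turn (Kaczmarz/ART), then onto the
    nonnegative orthant.
    """
    x = np.zeros(A.shape[1])
    row_norms_sq = np.sum(A * A, axis=1)
    for _ in range(n_sweeps):
        for i in range(A.shape[0]):
            # forward-project one ray (A[i] @ x), then backproject the correction
            x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        np.maximum(x, 0.0, out=x)            # projection onto {x >= 0}
    return x


rng = np.random.default_rng(1)
A = rng.standard_normal((60, 20))            # hypothetical system matrix
x_true = rng.random(20)
b = A @ x_true                               # noise-free data for the demo
x_hat = pocs_reconstruct(A, b)
print(np.linalg.norm(x_hat - x_true))
```

    Every sweep touches the full system matrix twice, which is the per-iteration cost that a method with a single backprojection and no forward projection avoids.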

  20. Three-Dimensional Induced Polarization Parallel Inversion Using Nonlinear Conjugate Gradients Method

    Directory of Open Access Journals (Sweden)

    Huan Ma

    2015-01-01

    Four array configurations of induced polarization (IP) methods (surface, borehole-surface, surface-borehole, and borehole-borehole) are widely used in resource exploration. However, because of the large number of sources, the inversion takes a long time to complete. In this paper, a new parallel algorithm is described that uses the message passing interface (MPI) and graphics processing units (GPUs) to accelerate 3D inversion for these four methods. The forward finite-difference equation is solved with the conjugate gradient (CG) solver and an ILU0 preconditioner. The inverse problem is solved by nonlinear conjugate gradient (NLCG) iteration, which is used to calculate one forward and two "pseudo-forward" modelings and to update the direction, space, and model in turn. Because each source is independent in the forward and "pseudo-forward" modelings, multiple processes are launched by calling the MPI library. The iterative matrix solver within CULA is called in each process. Several tables and synthetic data examples illustrate that this parallel inversion algorithm is effective. Furthermore, we demonstrate that the joint inversion of surface and borehole data produces resistivity and chargeability results that are superior to those obtained from inversions of surface data alone.
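    Each forward and "pseudo-forward" modeling above reduces to a large sparse symmetric solve, and because sources are independent these solves can run in separate MPI processes. The core preconditioned conjugate gradient loop can be sketched as follows. Assumptions: a dense NumPy matrix replaces the sparse finite-difference operator, and a Jacobi (diagonal) preconditioner replaces the ILU0 preconditioner used in the paper, purely to keep the example short; pcg is a hypothetical name.

```python
import numpy as np


def pcg(A, b, m_inv, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for symmetric positive definite A.

    m_inv applies the inverse preconditioner to a vector; here a Jacobi
    (diagonal) preconditioner stands in for ILU0.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = m_inv(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)            # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = m_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p        # new A-conjugate search direction
        rz = rz_new
    return x, max_iter


n = 100
diag = np.linspace(2.5, 50.0, n)         # strongly varying coefficients
A = np.diag(diag) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = pcg(A, b, lambda r: r / diag)
print(iters)
```

    Passing the preconditioner as a function keeps the loop unchanged if a stronger preconditioner such as an incomplete factorization is substituted.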

  1. STEP: Self-supporting tailored k-space estimation for parallel imaging reconstruction.

    Science.gov (United States)

    Zhou, Zechen; Wang, Jinnan; Balu, Niranjan; Li, Rui; Yuan, Chun

    2016-02-01

    A new subspace-based iterative reconstruction method, termed Self-supporting Tailored k-space Estimation for Parallel imaging reconstruction (STEP), is presented and evaluated in comparison with the existing autocalibrating method SPIRiT and the calibrationless method SAKE. In STEP, two tailored schemes, k-space partition and basis selection, are proposed to promote a spatially variant signal subspace and are incorporated into a self-supporting structured low-rank model to enforce properties of locality, sparsity, and rank deficiency; this can be formulated as a constrained optimization problem and solved by an iterative algorithm. Simulated and in vivo datasets were used to investigate the performance of STEP in terms of overall image quality and preservation of detailed structures. The advantage of STEP in image quality is demonstrated on retrospectively undersampled multichannel Cartesian data with various patterns. Compared with SPIRiT and SAKE, STEP can provide more accurate reconstructed images with fewer residual aliasing artifacts and reduced noise amplification in simulation and in vivo experiments. In addition, STEP can combine compressed sensing with arbitrary sampling trajectories. Using k-space partition and basis selection can further improve the performance of parallel imaging reconstruction with or without calibration signals. © 2015 Wiley Periodicals, Inc.

  2. Positron emission tomographic images and expectation maximization: A VLSI architecture for multiple iterations per second

    International Nuclear Information System (INIS)

    Jones, W.F.; Byars, L.G.; Casey, M.E.

    1988-01-01

    A digital electronic architecture for parallel processing of the expectation maximization (EM) algorithm for positron emission tomography (PET) image reconstruction is proposed. Rapid (0.2 second) EM iterations on high-resolution (256 x 256) images are supported. Arrays of two very large scale integration (VLSI) chips perform the forward and back projection calculations. A description of the architecture is given, including data flow and partitioning relevant to EM and parallel processing. The EM images shown are produced with software simulating the proposed hardware reconstruction algorithm. The projected cost of the system is estimated to be small compared with the cost of current PET scanners.
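    The forward and back projections that the proposed VLSI arrays compute feed the standard maximum-likelihood EM (MLEM) update for Poisson data, x <- x / (A^T 1) * A^T (y / (A x)). A minimal software sketch of that update, assuming a small hypothetical dense system matrix A and noise-free data for the demo; it says nothing about the paper's hardware partitioning, and the name mlem is hypothetical.

```python
import numpy as np


def mlem(A, y, n_iter=50):
    """MLEM reconstruction for emission tomography, y ~ Poisson(A @ x).

    Assumes every pixel is seen by at least one detector bin, so the
    sensitivity image has no zeros.
    """
    x = np.ones(A.shape[1])                 # strictly positive starting image
    sens = A.T @ np.ones(A.shape[0])        # backprojection of ones
    for _ in range(n_iter):
        proj = A @ x                        # forward projection
        ratio = np.divide(y, proj, out=np.zeros_like(proj), where=proj > 0)
        x *= (A.T @ ratio) / sens           # backprojection and multiplicative update
    return x


rng = np.random.default_rng(2)
A = rng.random((40, 16))                    # hypothetical system matrix
y = A @ rng.random(16)                      # noise-free "counts" for the demo
x5, x50 = mlem(A, y, 5), mlem(A, y, 50)
```

    Each iteration costs exactly one forward and one back projection, which is why accelerating those two operations in hardware translates directly into faster EM iterations. The multiplicative form also keeps the image nonnegative, and the Poisson log-likelihood never decreases from one iteration to the next.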

  3. Comparative efficiencies of three parallel algorithms for nonlinear ...

    Indian Academy of Sciences (India)

    R. Narasimhan (Krishtel eMaging) 1461 1996 Oct 15 13:05:22

    This algorithm is better suited for large size problems on coarse ... and reliable time integration algorithms for solving the second-order dynamic equilibrium equations that arise due ... Programming models required to take advantage of the parallel and distributed ..... In addition, MPI added the concept of a 'virtual topology'.

  4. Parallel Computing Characteristics of Two-Phase Thermal-Hydraulics code, CUPID

    International Nuclear Information System (INIS)

    Lee, Jae Ryong; Yoon, Han Young

    2013-01-01

    The parallelized CUPID code has been shown to reproduce multi-dimensional thermal-hydraulic analyses through validation against various conceptual problems and experimental data. In this paper, the characteristics of the parallelized CUPID code were investigated. Both single-phase and two-phase simulations are considered. Since the scalability of a parallel simulation is known to be better for fine meshes, two types of mesh system are considered. In addition, the dependence on the preconditioner of the matrix solver was compared. Scalability for single-phase flow is better than for two-phase flow because fewer iterations are needed to solve the pressure matrix. The parallel performance of CUPID was assessed in terms of scalability. The CUPID code was parallelized with a domain decomposition method, and the MPI library was adopted to communicate information at the interface cells. Scalability improves as the number of mesh cells increases. For a given mesh, the single-phase flow simulation with the diagonal preconditioner shows the best speedup. However, for two-phase flow simulation the ILU preconditioner is recommended, since it reduces the overall simulation time.
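    The abstract reports the best single-phase speedup with a diagonal preconditioner. As a hedged illustration of what that preconditioner does, here is conjugate gradient with a diagonal (Jacobi) preconditioner in plain Python. It assumes a symmetric positive-definite matrix stored as a dense list of rows; the function name is hypothetical, and CUPID's actual pressure matrix, its MPI domain decomposition, and the ILU option are not modeled.

```python
def jacobi_pcg(a, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner M = diag(A).

    a: symmetric positive-definite matrix (list of rows); b: right-hand side.
    Returns (x, iterations).
    """
    n = len(b)
    matvec = lambda v: [sum(a[i][k] * v[k] for k in range(n)) for i in range(n)]
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return [0.0] * n, 0
    inv_diag = [1.0 / a[i][i] for i in range(n)]   # applying M^{-1} is a pointwise scaling
    x = [0.0] * n
    r = b[:]                                       # r = b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # z = M^{-1} r
    p = z[:]
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

The diagonal preconditioner needs no communication between subdomains, which is one plausible reason it parallelizes well; a stronger preconditioner such as ILU can cut the iteration count at the cost of more coupling.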

  5. Provably optimal parallel transport sweeps on regular grids

    International Nuclear Information System (INIS)

    Adams, M. P.; Adams, M. L.; Hawkins, W. D.; Smith, T.; Rauchwerger, L.; Amato, N. M.; Bailey, T. S.; Falgout, R. D.

    2013-01-01

    We have found provably optimal algorithms for full-domain discrete-ordinate transport sweeps on regular grids in 3D Cartesian geometry. We describe these algorithms and sketch a proof that they always execute the full eight-octant sweep in the minimum possible number of stages for a given $P_x \times P_y \times P_z$ partitioning. Computational results demonstrate that our optimal scheduling algorithms execute sweeps in the minimum possible stage count. Observed parallel efficiencies agree well with our performance model. An older version of our PDT transport code achieves almost 80% parallel efficiency on 131,072 cores, on a weak-scaling problem with only one energy group, 80 directions, and 4096 cells/core. A newer version is less efficient at present (we are still improving its implementation) but achieves almost 60% parallel efficiency on 393,216 cores. These results conclusively demonstrate that sweeps can perform with high efficiency on core counts approaching $10^6$. (authors)
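    The optimal eight-octant schedules from the abstract are not reproduced here. As a hedged sketch of where stage counts come from, the code below computes the pipeline-fill cost of a single octant under a simplified dependency model (an assumption): each process in a $P_x \times P_y \times P_z$ grid owns one task and can run only after its upstream neighbours in x, y and z have finished. In that model process (i, j, k) runs at stage i + j + k + 1, so one octant needs $P_x + P_y + P_z - 2$ stages. The function name is hypothetical.

```python
from itertools import product

def octant_stage_count(px, py, pz):
    """Stages for a single-octant sweep on a px x py x pz process grid.

    Simplified model (an assumption): each process owns one task and may run
    only after its upstream neighbours in x, y and z have finished.
    """
    stage = {}
    # product() varies the last index fastest, so every upstream neighbour
    # (i-1, j, k), (i, j-1, k), (i, j, k-1) is visited before (i, j, k).
    for i, j, k in product(range(px), range(py), range(pz)):
        upstream = [stage[nb] for nb in ((i - 1, j, k), (i, j - 1, k), (i, j, k - 1))
                    if nb in stage]
        stage[(i, j, k)] = 1 + max(upstream, default=0)
    return max(stage.values())
```

For example, `octant_stage_count(4, 4, 4)` gives 10. Real sweep schedulers overlap several octants and pipeline many tasks per process, which is what the optimality result in the abstract is about.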

  6. Provably optimal parallel transport sweeps on regular grids

    Energy Technology Data Exchange (ETDEWEB)

    Adams, M. P.; Adams, M. L.; Hawkins, W. D. [Dept. of Nuclear Engineering, Texas A and M University, 3133 TAMU, College Station, TX 77843-3133 (United States); Smith, T.; Rauchwerger, L.; Amato, N. M. [Dept. of Computer Science and Engineering, Texas A and M University, 3133 TAMU, College Station, TX 77843-3133 (United States); Bailey, T. S.; Falgout, R. D. [Lawrence Livermore National Laboratory (United States)]

    2013-07-01

    We have found provably optimal algorithms for full-domain discrete-ordinate transport sweeps on regular grids in 3D Cartesian geometry. We describe these algorithms and sketch a proof that they always execute the full eight-octant sweep in the minimum possible number of stages for a given $P_x \times P_y \times P_z$ partitioning. Computational results demonstrate that our optimal scheduling algorithms execute sweeps in the minimum possible stage count. Observed parallel efficiencies agree well with our performance model. An older version of our PDT transport code achieves almost 80% parallel efficiency on 131,072 cores, on a weak-scaling problem with only one energy group, 80 directions, and 4096 cells/core. A newer version is less efficient at present (we are still improving its implementation) but achieves almost 60% parallel efficiency on 393,216 cores. These results conclusively demonstrate that sweeps can perform with high efficiency on core counts approaching $10^6$. (authors)

  7. Intelligent controller of a flexible hybrid robot machine for ITER assembly and maintenance

    International Nuclear Information System (INIS)

    Al-saedi, Mazin I.; Wu, Huapeng; Handroos, Heikki

    2014-01-01

    Highlights: • Studying the flexible multibody dynamics of a hybrid parallel robot. • Investigating a fuzzy-PD controller to control a hybrid flexible hydraulically driven robot. • Investigating an ANFIS-PD controller to control a hybrid flexible robot; compared with traditional PID, this method gives better performance. • Using the equilibrium of reaction forces between the parallel and serial parts of the hybrid robot to control the hydraulically driven serial part. - Abstract: The assembly and maintenance of the International Thermonuclear Experimental Reactor (ITER) vacuum vessel (VV) is highly challenging, since the tasks performed by the robot involve welding, material handling, and machine cutting from inside the VV. To fulfill these tasks in the ITER application, this paper presents a hybrid redundant manipulator with four DOFs provided by serial kinematic axes and six DOFs provided by a parallel mechanism. In machining, achieving greater end-effector trajectory-tracking accuracy for surface quality requires robust control of the actuators for the flexible link. In this paper, the intelligent control of the hydraulically driven parallel part of the robot, based on its dynamic model, is investigated with two control schemes: (1) a fuzzy-PID self-tuning controller combining conventional PID control with fuzzy logic; (2) adaptive neuro-fuzzy inference system-PID (ANFIS-PID) self-tuning of the PID controller gains. Both are implemented independently to control each hydraulic cylinder of the parallel robot based on rod position predictions. The fuzzy-PID and ANFIS-PID self-tuning controllers reduce tracking errors more than the conventional PID controller. Subsequently, the serial component of the hybrid robot can be analyzed using the equilibrium of reaction forces at the universal joint connections of the hexa-element. To achieve precise positional control of the end effector for maximum precision machining, the hydraulic cylinder should
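    Both proposed controllers are compared against a conventional PID baseline. A minimal sketch of that baseline follows, assuming a hypothetical first-order plant $\tau \dot{y} = -y + u$ integrated with explicit Euler steps; the plant, gains, and function name are illustrative assumptions and not the hydraulic robot model from the paper. The fuzzy and ANFIS schemes would adjust `kp`, `ki`, `kd` online instead of keeping them fixed.

```python
def simulate_pid(kp, ki, kd, setpoint=1.0, tau=0.5, dt=0.01, t_end=30.0):
    """Fixed-gain discrete PID loop around a hypothetical first-order plant
    tau * dy/dt = -y + u. Returns the plant output at t_end."""
    y, integral = 0.0, 0.0
    prev_error = setpoint - y  # start from e(0) so the first derivative term is zero
    for _ in range(int(round(t_end / dt))):
        error = setpoint - y
        integral += error * dt                  # rectangle-rule integral of the error
        derivative = (error - prev_error) / dt  # backward-difference derivative
        u = kp * error + ki * integral + kd * derivative
        prev_error = error
        y += dt * (-y + u) / tau                # explicit Euler step of the plant
    return y
```

With these assumptions, proportional control alone settles at `kp / (1 + kp)` of the setpoint, while adding integral action removes that steady-state offset.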

  8. Intelligent controller of a flexible hybrid robot machine for ITER assembly and maintenance

    Energy Technology Data Exchange (ETDEWEB)

    Al-saedi, Mazin I., E-mail: mazin.al-saedi@lut.fi; Wu, Huapeng; Handroos, Heikki

    2014-10-15

    Highlights: • Studying the flexible multibody dynamics of a hybrid parallel robot. • Investigating a fuzzy-PD controller to control a hybrid flexible hydraulically driven robot. • Investigating an ANFIS-PD controller to control a hybrid flexible robot; compared with traditional PID, this method gives better performance. • Using the equilibrium of reaction forces between the parallel and serial parts of the hybrid robot to control the hydraulically driven serial part. - Abstract: The assembly and maintenance of the International Thermonuclear Experimental Reactor (ITER) vacuum vessel (VV) is highly challenging, since the tasks performed by the robot involve welding, material handling, and machine cutting from inside the VV. To fulfill these tasks in the ITER application, this paper presents a hybrid redundant manipulator with four DOFs provided by serial kinematic axes and six DOFs provided by a parallel mechanism. In machining, achieving greater end-effector trajectory-tracking accuracy for surface quality requires robust control of the actuators for the flexible link. In this paper, the intelligent control of the hydraulically driven parallel part of the robot, based on its dynamic model, is investigated with two control schemes: (1) a fuzzy-PID self-tuning controller combining conventional PID control with fuzzy logic; (2) adaptive neuro-fuzzy inference system-PID (ANFIS-PID) self-tuning of the PID controller gains. Both are implemented independently to control each hydraulic cylinder of the parallel robot based on rod position predictions. The fuzzy-PID and ANFIS-PID self-tuning controllers reduce tracking errors more than the conventional PID controller. Subsequently, the serial component of the hybrid robot can be analyzed using the equilibrium of reaction forces at the universal joint connections of the hexa-element. To achieve precise positional control of the end effector for maximum precision machining, the hydraulic cylinder should

  9. RAMA: A file system for massively parallel computers

    Science.gov (United States)

    Miller, Ethan L.; Katz, Randy H.

    1993-01-01

    This paper describes a file system design for massively parallel computers that makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
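    The abstract does not say how RAMA decides where a block lives. One way a parallel file system can avoid inter-node synchronization is to compute a block's disk from a hash of its (file, block) identity, so every node reaches the same answer without consulting or locking a shared allocation table. The sketch below shows that idea under this assumption; the function name and key format are hypothetical and are not RAMA's actual layout.

```python
import hashlib

def block_location(file_id: int, block_no: int, n_disks: int) -> int:
    """Map (file, block) to a disk index by hashing.

    Every node evaluates the same pure function, so no shared table or lock
    is needed to find a block (a hypothetical placement scheme).
    """
    key = f"{file_id}:{block_no}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_disks
```

A good hash also spreads the blocks of one file roughly evenly over the disks, which keeps per-disk load balanced without any coordination.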

  10. Iterative Equalization and Interference Alignment for Multiuser MIMO HetNets with Imperfect CSI

    Directory of Open Access Journals (Sweden)

    Daniel Castanheira

    2015-01-01

    In this paper we consider a scenario where several small cells operate within the same coverage area and spectrum as a macrocell. If not carefully dealt with, the signals from the small-cell (macrocell) users will generate harmful interference at the macrocell (small cells). To tackle this problem, interference alignment (IA) and iterative equalization techniques are considered. With IA, all interference generated by the small-cell (macrocell) users is aligned along a low-dimensional subspace at the macrocell (small cells). This considerably reduces the amount of resources that must be allocated to enable the coexistence of the two systems. However, perfect IA requires error-free channel state information (CSI) at the transmitters, and CSI errors can cause substantial performance degradation due to imperfect alignment. Since in this work the IA precoders are based on imperfect CSI, an efficient iterative space-frequency equalizer is designed at the receiver side to cope with the residual aligned interference. The results demonstrate that iterative equalization is robust to imperfect CSI and efficiently removes the interference that remains because of imperfect alignment. Performance close to the matched filter bound is achieved with very few iterations.
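    Why alignment helps can be seen in the smallest possible case. The sketch below assumes a hypothetical noise-free, flat-fading receiver with two antennas, a desired channel `h`, and all interference arriving along a single known direction `u`. A receive filter orthogonal to `u` then cancels the interference exactly and leaves one dimension for the desired symbol. This is a textbook zero-forcing projection, not the paper's iterative space-frequency equalizer; names are hypothetical.

```python
def aligned_interference_receiver(y, h, u):
    """Recover s from y = h*s + u*i_total at a 2-antenna receiver, where all
    interference (i_total) has been aligned along the single direction u.

    Uses the filter w with w^H u = 0, i.e. z = u[1]*y[0] - u[0]*y[1].
    """
    z = u[1] * y[0] - u[0] * y[1]      # interference component cancels exactly
    gain = u[1] * h[0] - u[0] * h[1]   # effective scalar channel seen by s
    if abs(gain) < 1e-12:
        raise ValueError("desired direction lies in the interference subspace")
    return z / gain
```

If the interference actually arrives along a slightly different direction than `u` (imperfect CSI), a residual interference term survives the projection; that residual is what the paper's iterative equalizer is designed to remove.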

  11. Parallel alternating direction preconditioner for isogeometric simulations of explicit dynamics

    KAUST Repository

    Łoś, Marcin

    2015-04-27

    In this paper we present a parallel implementation of the alternating direction preconditioner for isogeometric simulations of explicit dynamics. The Alternating Direction Implicit (ADI) algorithm, which belongs to the category of matrix-splitting iterative methods, was proposed almost six decades ago for solving parabolic and elliptic partial differential equations, see [1–4]. A new version of this algorithm has recently been developed for isogeometric simulations of two-dimensional explicit dynamics [5] and of steady-state diffusion equations with orthotropic heterogeneous coefficients [6]. In this paper we present a parallel version of the alternating direction implicit algorithm for three-dimensional simulations. The algorithm has been incorporated into PETIGA, an isogeometric framework [7] built on top of PETSc [8]. We show the scalability of the parallel algorithm on the STAMPEDE Linux cluster up to 10,000 processors, as well as the convergence rate of the PCG solver with the ADI algorithm as preconditioner.
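    A common ingredient of alternating-direction solvers on tensor-product grids is that a matrix of the form $A \otimes B$ can be inverted by 1D solves along one direction followed by 1D solves along the other. The sketch below shows this for 2D with tridiagonal 1D factors, solving $A X B^{T} = F$. It assumes (not confirmed by the abstract) that the operator being preconditioned has this Kronecker structure; tridiagonal matrices stand in for the wider B-spline bands, and the function names are hypothetical.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are unused."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def kron_solve(tri_a, tri_b, f):
    """Solve (A kron B) vec(X) = vec(F), i.e. A X B^T = F, with row-major vec.

    tri_a, tri_b: (lower, diag, upper) triples; f: list of rows (len(A) x len(B)).
    """
    na, nb = len(tri_a[1]), len(tri_b[1])
    # Direction 1: Z = A^{-1} F, one independent 1D solve per column of F.
    cols = [thomas(*tri_a, [f[i][j] for i in range(na)]) for j in range(nb)]
    z = [[cols[j][i] for j in range(nb)] for i in range(na)]
    # Direction 2: X B^T = Z is equivalent to B x_i = z_i for every row i.
    return [thomas(*tri_b, z[i]) for i in range(na)]
```

Each direction consists of many independent banded 1D solves, which is what makes this kind of solver attractive to distribute across processors.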

  12. Experiences in Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Terry W. Clark

    1997-01-01

    Efficiently parallelizing a scientific application with a data-parallel compiler requires certain structural properties in the source program and, conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we encountered a number of difficulties that are probably not limited to this particular application and that seem unlikely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some point in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.

  13. A PARALLEL NONOVERLAPPING DOMAIN DECOMPOSITION METHOD FOR STOKES PROBLEMS

    Institute of Scientific and Technical Information of China (English)

    Mei-qun Jiang; Pei-liang Dai

    2006-01-01

    A nonoverlapping domain decomposition iterative procedure is developed and analyzed for generalized Stokes problems and their finite element approximations in $\mathbb{R}^N$ ($N = 2, 3$). The method is based on a mixed-type consistency condition with two parameters as the transmission condition, together with a derivative-free technique for updating the transmission data on the artificial interfaces. The method applies to general multi-subdomain decompositions and can be implemented naturally on parallel machines with simple, local communication.
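    To make "mixed-type transmission condition with a derivative-free update" concrete, the sketch below applies the idea to a swapped-in model problem, not the paper's Stokes setting: $-u'' = 2$ on $(0, 1)$ with $u(0) = u(1) = 0$, split at $x = \gamma$ into two subdomains. Each subdomain imposes a Robin condition with its own parameter ($u_1' + \alpha_1 u_1 = g_1$ and $-u_2' + \alpha_2 u_2 = g_2$ at $\gamma$). Because $g_1 + g_2 = (\alpha_1 + \alpha_2)\,u(\gamma)$ at convergence, each new $g$ can be formed from the neighbour's interface value alone, without differentiating. The local solves are closed-form here; parameter names and the function name are assumptions.

```python
def robin_dd(alpha1=1.0, alpha2=1.0, gamma=0.5, n_iter=30):
    """Two-subdomain nonoverlapping iteration for -u'' = 2 on (0, 1), u(0) = u(1) = 0.

    Subdomain 1 (0, gamma):   u1 = -x^2 + a1*x,           u1' + alpha1*u1 = g1 at gamma.
    Subdomain 2 (gamma, 1):   u2 = (1 - x)(1 + x - a2),  -u2' + alpha2*u2 = g2 at gamma.
    The exact solution u = x(1 - x) corresponds to a1 = a2 = 1.
    """
    g1 = g2 = 0.0
    a1 = a2 = 0.0
    for _ in range(n_iter):
        # Independent local solves (these could run on two processors).
        a1 = (g1 + 2 * gamma + alpha1 * gamma**2) / (1 + alpha1 * gamma)
        a2 = (2 * gamma + alpha2 * (1 - gamma**2) - g2) / (1 + alpha2 * (1 - gamma))
        u1 = -gamma**2 + a1 * gamma            # interface value from subdomain 1
        u2 = (1 - gamma) * (1 + gamma - a2)    # interface value from subdomain 2
        # Derivative-free exchange: only interface values cross the interface.
        g1, g2 = (alpha1 + alpha2) * u2 - g2, (alpha1 + alpha2) * u1 - g1
    return a1, a2
```

For this model problem the choice of the two parameters controls the convergence rate; with $\gamma = 0.5$ and $\alpha_1 = \alpha_2 = 2$ the iteration is exact after two steps.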

  14. Vector-Parallel processing of the successive overrelaxation method

    International Nuclear Information System (INIS)

    Yokokawa, Mitsuo

    1988-02-01

    The successive overrelaxation (SOR) method is an iterative method for solving linear systems of equations, and in many nuclear codes it has been computed serially with the natural ordering. Since the appearance of vector processors, this natural SOR method has been replaced by parallel algorithms such as the hyperplane or red-black methods, in which the calculation order is modified. These methods are suitable for vector processors and give faster calculation on them than the natural SOR method. In this report, a new scheme named the 4-colors SOR method is proposed. We find that the 4-colors SOR method can be executed on vector-parallel processors and gives the fastest calculation among all the SOR methods, according to the results of vector-parallel execution on the Alliant FX/8 multiprocessor system. It is also shown that the theoretical optimal acceleration parameters are equal for five different orderings of the SOR method, and the differences between the convergence rates of these SOR methods are examined. (author)
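    The red-black ordering mentioned above is the two-colour relative of the report's 4-colors scheme (which is not reproduced here). The sketch below is a plain-Python red-black SOR for the 5-point discretization of $-(u_{xx} + u_{yy}) = f$ on a uniform square grid with Dirichlet data, using the classical optimal relaxation factor for this model problem. Function names are hypothetical.

```python
import math

def optimal_omega(h):
    """Classical optimal relaxation factor for the 5-point Laplacian on the unit square."""
    return 2.0 / (1.0 + math.sin(math.pi * h))

def red_black_sor(u, f, h, omega, tol=1e-12, max_iter=10_000):
    """Red-black SOR for -(u_xx + u_yy) = f on an (n+1) x (n+1) grid.

    u holds the Dirichlet boundary values on its border and is updated in place.
    A point is 'red' if (i + j) is even and 'black' otherwise; each point of one
    colour depends only on points of the other colour, so every half-sweep has
    no internal dependencies and can be vectorized or split across processors.
    Returns the number of sweeps performed.
    """
    n = len(u) - 1
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for colour in (0, 1):
            for i in range(1, n):
                start = 1 + (i + 1 + colour) % 2  # first j >= 1 with (i + j) % 2 == colour
                for j in range(start, n, 2):
                    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                                           + u[i][j + 1] + h * h * f[i][j])
                    delta = omega * (gauss_seidel - u[i][j])
                    u[i][j] += delta
                    max_change = max(max_change, abs(delta))
        if max_change < tol:
            return sweep
    return max_iter
```

Because the 5-point stencil is exact for quadratics, boundary data taken from $x^2 - y^2$ gives a discrete solution equal to $x^2 - y^2$ at every grid point, which makes a convenient correctness check.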

  15. PIXIE3D: An efficient, fully implicit, parallel, 3D extended MHD code for fusion plasma modeling

    International Nuclear Information System (INIS)

    Chacon, L.

    2007-01-01

    PIXIE3D is a modern, parallel, state-of-the-art extended MHD code that employs fully implicit methods for efficiency and accuracy. It features a general geometry formulation and is therefore suitable for the study of many magnetic fusion configurations of interest. PIXIE3D advances the state of the art in extended MHD modeling in two fundamental ways. First, it employs a novel conservative finite volume scheme that is remarkably robust and stable and requires very little physical and/or numerical dissipation. This is a fundamental requirement when one wants to study fusion plasmas with realistic conductivities. Second, PIXIE3D features fully implicit time stepping, employing Newton-Krylov methods to invert the associated nonlinear systems. These methods have been shown to be scalable and efficient when properly preconditioned. Novel preconditioning ideas (so-called physics-based preconditioning), which were prototyped in the context of reduced MHD, have been adapted for 3D primitive-variable resistive MHD in PIXIE3D and are currently being extended to Hall MHD. PIXIE3D is fully parallel, employing PETSc for parallelism. PIXIE3D has been thoroughly benchmarked against linear theory and against other available extended MHD codes on nonlinear test problems (such as the GEM reconnection challenge). We are currently extending such comparisons to fusion-relevant problems in realistic geometries. In this talk, we will describe both the spatial discretization approach and the preconditioning strategy employed for extended MHD in PIXIE3D. We will report on recent benchmarking studies between PIXIE3D and other 3D extended MHD codes, and will demonstrate its usefulness in a variety of fusion-relevant configurations such as tokamaks and reversed field pinches. (Author)

  16. Structural analysis of the ITER Vacuum Vessel regarding 2012 ITER Project-Level Loads

    Energy Technology Data Exchange (ETDEWEB)

    Martinez, J.-M., E-mail: jean-marc.martinez@live.fr [ITER Organization, Route de Vinon sur Verdon, 13115 St Paul lez Durance (France); Jun, C.H.; Portafaix, C.; Choi, C.-H.; Ioki, K.; Sannazzaro, G.; Sborchia, C. [ITER Organization, Route de Vinon sur Verdon, 13115 St Paul lez Durance (France); Cambazar, M.; Corti, Ph.; Pinori, K.; Sfarni, S.; Tailhardat, O. [Assystem EOS, 117 rue Jacquard, L' Atrium, 84120 Pertuis (France); Borrelly, S. [Sogeti High Tech, RE2, 180 rue René Descartes, Le Millenium – Bat C, 13857 Aix en Provence (France); Albin, V.; Pelletier, N. [SOM Calcul – Groupe ORTEC, 121 ancien Chemin de Cassis – Immeuble Grand Pré, 13009 Marseille (France)

    2014-10-15

    Highlights: • The ITER Vacuum Vessel is part of the first barrier confining the plasma. • As Nuclear Pressure Equipment (NPE), the ITER Vacuum Vessel requires a third-party organization authorized by the French nuclear regulator, i.e. an Agreed Notified Body (ANB), to assure design, fabrication, conformance testing and quality assurance. • A revision of the ITER Project-Level Load Specification was implemented in April 2012. • The ITER Vacuum Vessel loads (seismic, pressure, thermal and electromagnetic) were summarized. • The ITER Vacuum Vessel structural margins with regard to the RCC-MR code were summarized. - Abstract: A revision of the ITER Project-Level Load Specification (to be used for all systems of the ITER machine) was implemented in April 2012. This revision supports ITER's licensing by accommodating requests from the French regulator to maintain consistency with the plasma physics database and our present understanding of plasma transients and electromagnetic (EM) loads, to investigate the possibility of removing unnecessary conservatism in the load requirements, and to review the list and definition of incidental cases. The purpose of this paper is to present the impact of this 2012 revision of the ITER Project-Level Load Specification (LS) on the ITER Vacuum Vessel (VV) loads and on the main structural margins required by the applicable French code, RCC-MR.

  17. ITER test programme

    International Nuclear Information System (INIS)

    Abdou, M.; Baker, C.; Casini, G.

    1991-01-01

    ITER has been designed to operate in two phases. The first phase, which lasts 6 years, is devoted to machine checkout and physics testing. The second phase lasts 8 years and is devoted primarily to technology testing. This report describes the development of the technology test program for ITER, the ancillary equipment outside the torus needed to support the test modules, the international collaboration aspects of conducting the test program on ITER, the requirements on the machine's major parameters, and the R and D program required to develop the test modules for testing in ITER. 15 refs, figs and tabs

  18. A Novel Algorithm for Solving the Multidimensional Neutron Transport Equation on Massively Parallel Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Azmy, Yousry

    2014-06-10

    We employ the Integral Transport Matrix Method (ITMM) as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iteration (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells' fluxes and between the cells' and boundary surfaces' fluxes. The main goals of this work are to develop the algorithms that construct these operators and employ them in the solution process, to determine the most suitable way to parallelize the entire procedure, and to evaluate the behavior and parallel performance of the developed methods with an increasing number of processes, P. The fastest observed parallel solution method, Parallel Gauss-Seidel (PGS), was used in a weak-scaling comparison with the PARTISN transport code, which uses the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. Compared with the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), the new method, even without acceleration or preconditioning, is competitive for optically thick problems as P is increased into the tens of thousands. For the most optically thick cells tested, PGS reduced execution time by a factor of approximately three for problems with more than 130 million computational cells on P = 32,768. Moreover, the SI-DSA execution time generally rises more steeply with increasing P than the PGS execution time does. Furthermore, the PGS method outperforms SI for the periodic heterogeneous layers (PHL) configuration problems. The PGS method outperforms SI and SI-DSA on as few as P = 16 for PHL problems and reduces execution time by a factor of ten or more for all problems considered with more than 2 million computational cells on P = 4,096.

  19. ITER-FEAT outline design report

    International Nuclear Information System (INIS)

    2001-01-01

    In July 1998 the ITER Parties were unable, for financial reasons, to proceed with construction of the ITER design proposed at that time to meet the detailed technical objectives and target cost set in 1992. It was therefore decided to investigate options for the design of ITER with reduced technical objectives and possibly decreased technical margins, whose target construction cost was one half that of the 1998 ITER design, while maintaining the overall programmatic objective. To identify designs that might meet the revised objectives, task forces involving the JCT and Home Teams met during 1998 and 1999 to analyse and compare a range of options for the design of such a device. This led, at the end of 1999, to a single configuration for the ITER design with parameters considered to be the most credible consistent with technical limitations and the financial target, yet fully meeting the objectives with appropriate margins. This new design of ITER, called "ITER-FEAT", was submitted by the ITER Director to the ITER Parties as the "ITER-FEAT Outline Design Report" (ODR) in January 2000, at their meeting in Tokyo. The Parties subsequently conducted their domestic assessments of this report and fed the resulting comments back into the progressing design. The progress on the developing design was reported to the ITER Technical Advisory Committee (TAC) in June 2000 in the report "Progress in Resolving Open Design Issues from the ODR", alongside a report on Progress in Technology R and D for ITER. In addition, the progress in the ITER-FEAT Design and Validating R and D was reported to the ITER Parties. The ITER-FEAT design was subsequently approved by the governing body of ITER in Moscow in June 2000 as the basis for the preparation of the Final Design Report, recognising it as a single mature design for ITER consistent with its revised objectives. This volume contains the documents pertinent to the process described above. More detailed technical information

  20. Effect of aspect ratio and number of meshes on convergence of steady-state flow calculation using Newton-Raphson iterative procedure

    International Nuclear Information System (INIS)

    Shimizu, Takeshi

    1997-01-01

    In this paper, we discuss the stability of the convergence of a nonlinear iteration procedure, which may be affected in a complicated way by a large number of numerical factors. A numerical parallel-channel flow problem is solved using the finite element method and the Newton-Raphson iteration procedure. The numerical factors on which we focus in this study are the aspect ratio of the channel and the number of mesh divisions. We propose a nondimensional value obtained from the Reynolds number, the aspect ratio, and the number of meshes. The results of the numerical experiments show that this nondimensional value clearly indicates the threshold for divergence of the iteration. (author)

  1. Efficient implementation of a multidimensional fast Fourier transform on a distributed-memory parallel multi-node computer

    Science.gov (United States)

    Bhanot, Gyan V [Princeton, NJ]; Chen, Dong [Croton-On-Hudson, NY]; Gara, Alan G [Mount Kisco, NY]; Giampapa, Mark E [Irvington, NY]; Heidelberger, Philip [Cortlandt Manor, NY]; Steinmacher-Burow, Burkhard D [Mount Kisco, NY]; Vranas, Pavlos M [Bedford Hills, NY]

    2008-01-01

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network, thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements can also be applied efficiently in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
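    The claim describes the classic row-column decomposition: 1D transforms along the locally held dimension, an all-to-all redistribution, then 1D transforms along the other dimension. A single-process sketch of that data flow follows, assuming a naive 1D DFT in place of an FFT for brevity and a plain transpose in place of the network all-to-all. The randomized exchange order that the invention relies on is not modeled, and the function names are hypothetical.

```python
import cmath

def dft(x):
    """Naive 1D discrete Fourier transform (a stand-in for a real FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def transpose(a):
    """Stand-in for the all-to-all redistribution between dimensions."""
    return [list(col) for col in zip(*a)]

def fft2_row_column(a):
    """2D transform as: 1D transforms of rows, redistribute, 1D transforms again."""
    stage1 = [dft(row) for row in a]                   # each 'node' transforms its rows
    stage2 = [dft(row) for row in transpose(stage1)]   # rows now hold former columns
    return transpose(stage2)                           # restore the original layout
```

On a distributed machine each `dft(row)` call is local to a node, so the only communication is the redistribution step, which is why the efficiency of the all-to-all dominates the cost.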

  2. A transport synthetic acceleration method for transport iterations

    International Nuclear Information System (INIS)

    Ramone, G.L.; Adams, M.L.

    1997-01-01

    A family of transport synthetic acceleration (TSA) methods for iteratively solving within-group scattering problems is presented. A single iteration in these schemes consists of a transport sweep followed by a low-order calculation, which is itself a simplified transport problem. The method for isotropic-scattering problems in X-Y geometry is described. Fourier analysis of a model problem for equations with no spatial discretization shows that a previously proposed TSA method is unstable in two dimensions but that the modifications make it stable and rapidly convergent. The same procedure applied to the discretized transport equations, using the step characteristic and two bilinear discontinuous methods, shows that discretization enhances TSA performance. A conjugate gradient algorithm for the low-order problem is described, a crude quadrature set for the low-order problem is proposed, and the number of low-order iterations per high-order sweep is limited to a relatively small value. These features lead to simple and efficient improvements to the method. TSA is tested on a series of problems, and a set of parameters is proposed for which the method behaves especially well. TSA achieves a substantial reduction in computational cost over source iteration, regardless of discretization parameters or material properties, and this reduction increases with the difficulty of the problem.
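    TSA is measured against unaccelerated source iteration (SI). The sketch below shows that baseline, not TSA itself, for a homogeneous 1D slab with vacuum boundaries, S2 Gauss-Legendre quadrature ($\mu = \pm 1/\sqrt{3}$, weights 1) and the step characteristic spatial scheme named in the abstract. The slab size, source, and function name are illustrative assumptions. The iteration count grows sharply as the scattering ratio $c$ approaches 1, which is the regime where acceleration pays off.

```python
import math

def source_iteration(c, sigma_t=1.0, width=10.0, n_cells=100, q_ext=1.0,
                     tol=1e-8, max_iter=10_000):
    """Unaccelerated source iteration: 1D homogeneous slab, vacuum boundaries,
    S2 quadrature, step characteristic scheme. c = sigma_s / sigma_t.
    Returns (cell-average scalar flux, iterations)."""
    sigma_s = c * sigma_t
    dx = width / n_cells
    mu = 1.0 / math.sqrt(3.0)
    tau = sigma_t * dx / mu          # optical thickness of a cell along the ray
    att = math.exp(-tau)
    phi = [0.0] * n_cells
    for it in range(1, max_iter + 1):
        # Isotropic emission density per unit mu (quadrature weights sum to 2).
        q = [(sigma_s * p + q_ext) / 2.0 for p in phi]
        new_phi = [0.0] * n_cells
        for cells in (range(n_cells), reversed(range(n_cells))):  # mu > 0, then mu < 0
            psi_in = 0.0                                         # vacuum boundary
            for i in cells:
                psi_out = psi_in * att + (q[i] / sigma_t) * (1.0 - att)
                psi_avg = q[i] / sigma_t + (psi_in - psi_out) / tau  # cell balance
                new_phi[i] += psi_avg                            # weight 1 per direction
                psi_in = psi_out
        change = max(abs(a - b) for a, b in zip(new_phi, phi))
        phi = new_phi
        if change < tol * max(phi):
            return phi, it
    return phi, max_iter
```

Comparing `source_iteration(0.5)` with `source_iteration(0.99)` shows the slowdown directly; a synthetic acceleration step would be inserted after each sweep to correct the scalar flux.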

  3. Efficient Serial and Parallel Algorithms for Selection of Unique Oligos in EST Databases.

    Science.gov (United States)

    Mata-Montero, Manrique; Shalaby, Nabil; Sheppard, Bradley

    2013-01-01

    Obtaining unique oligos from an EST database is a problem of great importance in bioinformatics, particularly in the discovery of new genes and the mapping of the human genome. Many algorithms have been developed to find unique oligos, many of which are much less time consuming than the traditional brute force approach. An algorithm was presented by Zheng et al. (2004) which finds the solution of the unique oligos search problem efficiently. We implement this algorithm as well as several new algorithms based on some theorems included in this paper. We demonstrate how, with these new algorithms, we can obtain unique oligos much faster than with previous ones. We parallelize these new algorithms to further improve the time of finding unique oligos. All algorithms are run on ESTs obtained from a Barley EST database.
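    As a hedged reference point for the brute-force baseline the abstract mentions, the sketch below finds exact-match unique k-mers: substrings of length k that occur in exactly one sequence. This is a simplification (an assumption): the unique-oligo problem addressed by Zheng et al. also rules out near matches with a few mismatches, which this code ignores. The function name is hypothetical.

```python
from collections import defaultdict

def unique_kmers(sequences, k):
    """Return {sequence index: sorted k-mers found in that sequence and no other}.

    Exact-match simplification of the unique-oligo problem (mismatches ignored).
    """
    owners = defaultdict(set)  # k-mer -> indices of the sequences containing it
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            owners[seq[start:start + k]].add(idx)
    result = defaultdict(list)
    for kmer, idxs in owners.items():
        if len(idxs) == 1:
            result[next(iter(idxs))].append(kmer)
    return {idx: sorted(kmers) for idx, kmers in result.items()}
```

The counting pass parallelizes naturally: each worker can build owner sets for a slice of the sequences, and the sets are merged before the uniqueness test.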

  4. Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

    Science.gov (United States)

    Murni; Bustamam, A.; Ernastuti; Handhika, T.; Kerami, D.

    2017-07-01

    Matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size, so parallelization is needed to speed up a calculation that usually takes a long time. The graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because they assume a square, symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming of graph partitioning. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented on its GPUs (graphics processing units).
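    In the column-net hypergraph model, each column of the matrix is a net that connects the rows with a nonzero in that column, and for a row-wise partition the words of x that must be communicated follow directly from those nets. The sketch below counts that communication volume for a given partition, allowing rectangular matrices by taking the ownership of x as a separate input. It is an evaluation of a partition, not a partitioner, and not the paper's CUDA implementation; the function name is hypothetical.

```python
def spmv_comm_volume(rows, row_part, x_owner):
    """Words of x sent when computing y = A x under a row-wise partition.

    rows[i]:     column indices of the nonzeros in row i (A may be rectangular).
    row_part[i]: part that owns row i (and computes y[i]).
    x_owner[j]:  part that initially holds x[j].
    Column j is a net; its connectivity set is the set of parts owning a row
    with a nonzero in column j. Every part in that set other than x_owner[j]
    must receive x[j] once.
    """
    nets = {}
    for i, cols in enumerate(rows):
        for j in cols:
            nets.setdefault(j, set()).add(row_part[i])
    return sum(len(parts - {x_owner[j]}) for j, parts in nets.items())
```

When each `x_owner[j]` lies inside its net's connectivity set, the total equals the sum of (connectivity - 1) over all nets, which is the cut metric a hypergraph partitioner minimizes.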

  5. New Parallel Algorithms for Structural Analysis and Design of Aerospace Structures

    Science.gov (United States)

    Nguyen, Duc T.

    1998-01-01

    Subspace and Lanczos iterations have been developed, well documented, and widely accepted as efficient methods for obtaining the p lowest eigenpairs of large-scale, practical engineering problems. The focus of this paper is to incorporate recent developments in vectorized sparse technologies into the Subspace and Lanczos iterative algorithms for computational enhancement. The numerical performance of the proposed sparse strategies for the Subspace and Lanczos algorithms, in terms of accuracy and efficiency, is demonstrated by solving for the lowest frequencies and mode shapes of structural problems on IBM-R6000/590 and SunSparc 20 workstations.
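    For readers unfamiliar with the Lanczos side of the paper, the sketch below shows the basic iteration for a symmetric operator given only as a matrix-vector product: it builds a small tridiagonal matrix T whose lowest eigenvalue approximates the operator's lowest eigenvalue, which is then found by Sturm-sequence bisection. It omits reorthogonalization, shifting, and any sparse storage, so it is only an illustration for small problems, not the paper's vectorized sparse implementation; function names are hypothetical.

```python
import math

def lanczos_tridiagonal(matvec, v0, m):
    """Run up to m Lanczos steps (no reorthogonalization).
    Returns the diagonal alpha and off-diagonal beta of T."""
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v0)
    alpha, beta, b = [], [], 0.0
    for _ in range(m):
        w = matvec(v)
        a = sum(wi * vi for wi, vi in zip(w, v))
        w = [wi - a * vi - b * pi for wi, vi, pi in zip(w, v, v_prev)]
        alpha.append(a)
        b = math.sqrt(sum(x * x for x in w))
        if b < 1e-12:                      # invariant subspace reached
            break
        beta.append(b)
        v_prev, v = v, [x / b for x in w]
    return alpha, beta[: len(alpha) - 1]

def smallest_eigenvalue(alpha, beta, tol=1e-12):
    """Lowest eigenvalue of the symmetric tridiagonal T by Sturm-sequence bisection."""
    def count_below(x):                    # number of eigenvalues of T less than x
        count, d = 0, 1.0
        for i, a in enumerate(alpha):
            d = a - x - (beta[i - 1] ** 2 / d if i > 0 else 0.0)
            if d == 0.0:
                d = 1e-300
            if d < 0.0:
                count += 1
        return count
    n = len(alpha)
    radius = [(abs(beta[i - 1]) if i > 0 else 0.0) + (abs(beta[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(a - r for a, r in zip(alpha, radius))   # Gershgorin bounds
    hi = max(a + r for a, r in zip(alpha, radius))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For a 1D Laplacian `tridiag(-1, 2, -1)` of size 10, the result agrees with the analytic lowest eigenvalue $2 - 2\cos(\pi/11)$; for the p lowest eigenpairs of a large structural model, the same idea is combined with sparse factorizations and shift-invert.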

  6. Project management techniques used in the European Vacuum Vessel sectors procurement for ITER

    Energy Technology Data Exchange (ETDEWEB)

    Losasso, Marcello, E-mail: marcello.losasso@f4e.europa.eu [Fusion for Energy (F4E), Barcelona (Spain); Ortiz de Zuniga, Maria; Jones, Lawrence; Bayon, Angel; Arbogast, Jean-Francois; Caixas, Joan; Fernandez, Jose; Galvan, Stefano; Jover, Teresa [Fusion for Energy (F4E), Barcelona (Spain); Ioki, Kimihiro [ITER Organisation, Route de Vinon sur Verdon, 13115 Saint Paul Lez Durance (France); Lewczanin, Michal; Mico, Gonzalo; Pacheco, Jose Miguel [Fusion for Energy (F4E), Barcelona (Spain); Preble, Joseph [ITER Organisation, Route de Vinon sur Verdon, 13115 Saint Paul Lez Durance (France); Stamos, Vassilis; Trentea, Alexandru [Fusion for Energy (F4E), Barcelona (Spain)

    2012-08-15

    Highlights:
    - File names encode the directory tree structure as a string of three-letter acronyms, which makes it possible to locate the parent directory of an orphan file.
    - The procurement procedure was managed efficiently and on time, achieving exactly the contract placement date foreseen at the start of the process.
    - The contract start-up has been implemented effectively, and a flexible project management system has been put in place for efficient monitoring of the contract.
    Abstract: The contract for the seven European Sectors of the ITER Vacuum Vessel (VV) was placed at the end of 2010 with a consortium of three Italian companies. The placing and initial start-up of this large and complex contract, one of the largest placed by F4E, the European Domestic Agency for ITER, are described. A stringent quality-controlled system with a bespoke Vacuum Vessel Project Lifecycle Management system based on ENOVIA SmarTeam was developed to control the information flow and to handle the storage and approval of documentation, including links to the F4E Vacuum Vessel system and to the ITER International Organization system interfaces. The VV Sector design and manufacturing schedule is based on Primavera software and is cost-loaded, allowing F4E to measure performance with respect to its payments and commitments. This schedule is then integrated into the overall Vacuum Vessel schedule, which includes ancillary activities such as instruments, preliminary design and analysis. VV Sector risk management included three separate risk analyses from F4E and the bidders, using two different methodologies. These efforts will lead to an efficient and effective implementation of this contract, vital to the success of the ITER machine, since the Vacuum Vessel is the biggest single work package of Europe's contribution to ITER and

  7. Infrared laser diagnostics for ITER

    International Nuclear Information System (INIS)

    Hutchinson, D.P.; Richards, R.K.; Ma, C.H.

    1995-01-01

    Two infrared laser-based diagnostics are under development at ORNL for measurements on burning plasmas such as ITER. The primary effort is the development of a CO2 laser Thomson scattering diagnostic for measuring the velocity distribution of confined fusion-product alpha particles. Key components of the system include a high-power, single-mode, pulsed CO2 laser, an efficient optics system for beam transport, and a multichannel low-noise infrared heterodyne receiver. A successful proof-of-principle experiment has been performed on the Advanced Toroidal Facility (ATF) stellarator at ORNL, using scattering from electron plasma frequency satellites. The diagnostic system is currently being installed on Alcator C-Mod at MIT for measurements of the fast-ion tail produced by ICRH. The second diagnostic under development at ORNL is an infrared polarimeter for Faraday rotation measurements in future fusion experiments. A preliminary feasibility study of a CO2 laser tangential-viewing polarimeter for measuring electron density profiles in ITER has been completed. For ITER plasma parameters and a polarimeter wavelength of 10.6 µm, a Faraday rotation of up to 26° is predicted. An electro-optic polarization modulation technique has been developed at ORNL, and laboratory tests of this polarimeter demonstrated a sensitivity of ≤ 0.01°. Because of the similarity of the expected Faraday rotation in ITER and Alcator C-Mod, ORNL and the MIT Plasma Fusion Center have begun a collaboration to test this polarimeter system on Alcator C-Mod. A 10.6 µm polarimeter for this measurement has been built and integrated into the existing C-Mod multichannel two-color interferometer. For present C-Mod experimental parameters, the predicted Faraday rotation was on the order of 0.1°. Significant output signals were observed during preliminary tests. Further experiments and detailed analyses are under way.
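
    The wavelength dependence behind these numbers follows from the standard cold-plasma Faraday rotation formula, θ = (e³ / (8π² ε₀ mₑ² c³)) λ² ∫ nₑ B∥ dl, with the prefactor about 2.63 × 10⁻¹³ rad/T in SI units. The sketch below evaluates it for a uniform density and parallel field along a chord; the uniform-profile assumption, the function name and the example inputs are illustrative only and are not ITER or C-Mod design values.

```python
import math

# CODATA SI constants
E_CHARGE = 1.602176634e-19      # elementary charge, C
EPS0 = 8.8541878128e-12         # vacuum permittivity, F/m
M_E = 9.1093837015e-31          # electron mass, kg
C_LIGHT = 299792458.0           # speed of light, m/s

# theta [rad] = K_FARADAY * lambda^2 * integral(n_e * B_par dl), SI units
K_FARADAY = E_CHARGE**3 / (8 * math.pi**2 * EPS0 * M_E**2 * C_LIGHT**3)


def faraday_rotation_deg(wavelength_m, n_e_m3, b_par_t, path_m):
    """Faraday rotation in degrees, assuming uniform n_e and B_parallel on the chord."""
    theta_rad = K_FARADAY * wavelength_m**2 * n_e_m3 * b_par_t * path_m
    return math.degrees(theta_rad)


# Hypothetical chord: n_e = 1e20 m^-3, B_par = 1 T, length 10 m
print(K_FARADAY)
print(faraday_rotation_deg(10.6e-6, 1e20, 1.0, 10.0))
```

    Because θ scales with λ², a long-wavelength CO2 laser (10.6 µm) gives a much larger, easier-to-measure rotation than a shorter-wavelength source at the same density and field.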

  8. ITER council proceedings: 1995

    International Nuclear Information System (INIS)

    1996-01-01

    Records of the 8th ITER Council Meeting (IC-8), held on 26-27 July 1995 in San Diego, USA, and of the 9th ITER Council Meeting (IC-9), held on 12-13 December 1995 in Garching, Germany, are presented, giving essential information on the evolution of the ITER Engineering Design Activities (EDA), the ITER Interim Design Report Package and relevant documents. Figs, tabs

  9. Tests on the integration of the ITER divertor dummy armour prototype on a simplified model of cassette body

    International Nuclear Information System (INIS)

    Dell'Orco, G.; Canneta, A.; Cattadori, G.; Gaspari, G.P.; Merola, M.; Polazzi, G.; Vieider, G.; Zito, D.

    2001-01-01

    In 1998, as part of the European R&D on ITER high-heat-flux components, the fabrication of a full-scale ITER Divertor Outboard mock-up was launched. It comprised a Cassette Body, designed with some mechanical and hydraulic simplifications with respect to the reference body, and the actively cooled Dummy Armour Prototype (DAP). The DAP consists of the Vertical Target, the Wing and the Dump Target, manufactured by European industry, which are integrated with the Gas Box Liner supplied by the Russian Federation Home Team. To simplify manufacturing, the DAP was covered with a CuCrZr layer of equivalent thickness simulating the real armour (CFC or W tiles). In parallel with the manufacturing activity, the ITER European Home Team (HT) decided to assign to ENEA the Task EU-DV1, 'Component Integration and Thermal-Hydraulic Testing of the ITER Divertor Targets and Wing Dummy Prototypes and Cassette Body'.

  10. Tests on the integration of the ITER divertor dummy armour prototype on a simplified model of cassette body

    Energy Technology Data Exchange (ETDEWEB)

    Dell' Orco, G. E-mail: dellorco@brasimone.enea.it; Canneta, A.; Cattadori, G.; Gaspari, G.P.; Merola, M.; Polazzi, G.; Vieider, G.; Zito, D

    2001-10-01

    In 1998, as part of the European R&D on ITER high-heat-flux components, the fabrication of a full-scale ITER Divertor Outboard mock-up was launched. It comprised a Cassette Body, designed with some mechanical and hydraulic simplifications with respect to the reference body, and the actively cooled Dummy Armour Prototype (DAP). The DAP consists of the Vertical Target, the Wing and the Dump Target, manufactured by European industry, which are integrated with the Gas Box Liner supplied by the Russian Federation Home Team. To simplify manufacturing, the DAP was covered with a CuCrZr layer of equivalent thickness simulating the real armour (CFC or W tiles). In parallel with the manufacturing activity, the ITER European Home Team (HT) decided to assign to ENEA the Task EU-DV1, 'Component Integration and Thermal-Hydraulic Testing of the ITER Divertor Targets and Wing Dummy Prototypes and Cassette Body'.

  11. Parallel computation of nondeterministic algorithms in VLSI

    Energy Technology Data Exchange (ETDEWEB)

    Hortensius, P D

    1987-01-01

    This work examines parallel VLSI implementations of nondeterministic algorithms. It is demonstrated that conventional pseudorandom number generators are unsuitable for highly parallel applications. Efficient parallel pseudorandom sequence generation can be accomplished using certain classes of elementary one-dimensional cellular automata, in which new pseudorandom numbers appear in parallel on every clock cycle. The properties of these new pseudorandom number generators are studied extensively using standard empirical random number tests, cycle length tests, and implementation considerations. Furthermore, it is shown that these cellular automata can form the basis of efficient VLSI architectures for the computations involved in Monte Carlo simulation of both the percolation and Ising models from statistical mechanics. Finally, a variation on a Built-In Self-Test technique based on cellular automata is presented. These Cellular Automata-Logic-Block-Observation (CALBO) circuits improve upon conventional design-for-testability circuitry.
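
    The abstract does not specify which cellular automata are used; a hybrid rule-90/150 automaton, a class widely used for CA-based pseudorandom generation, is assumed here. In the sketch below every cell's next value is computed from the previous state, which models all cells switching simultaneously on one clock cycle in hardware. The helper names are hypothetical. The rule vector [150, 90, 150, 90] was checked by hand: its characteristic polynomial over GF(2) is x⁴ + x + 1, which is primitive, so every nonzero 4-cell state lies on one cycle of length 2⁴ - 1 = 15.

```python
def ca_step(state, rules):
    """One synchronous step of a null-boundary hybrid rule-90/150 CA.

    state: list of 0/1 cell values; rules: list of 90 or 150, one per cell.
    Rule 90: new = left XOR right. Rule 150: new = left XOR self XOR right.
    Cells beyond either end are held at 0 (null boundary).
    """
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        bit = left ^ right
        if rules[i] == 150:
            bit ^= state[i]
        nxt.append(bit)
    return nxt


def bitstream(state, rules, steps, tap=0):
    """Pseudorandom bits read from cell `tap` after each clock cycle."""
    out = []
    for _ in range(steps):
        state = ca_step(state, rules)
        out.append(state[tap])
    return out


def cycle_length(state, rules, max_steps=None):
    """Steps until the CA first returns to `state`, or None within max_steps."""
    if max_steps is None:
        max_steps = 2 ** len(state)
    start = list(state)
    s = list(state)
    for k in range(1, max_steps + 1):
        s = ca_step(s, rules)
        if s == start:
            return k
    return None


print(cycle_length([1, 0, 0, 0], [150, 90, 150, 90]))
```

    Over one full period the tapped cell yields 8 ones and 7 zeros, the balance expected of a maximal-length sequence; in a real generator every cell would be sampled in parallel, not just one.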

  12. MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

    Science.gov (United States)

    Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

    2018-02-01

    We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in C++ using the Message Passing Interface (MPI) protocol, the conventional standard for parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR using MPI_XSTAR against a serial execution of XSTAR, in terms of parallelization speedup and computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors used in the parallel computation increases.
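
    One simple way to see why efficiency can fall as processors are added: when independent XSTAR runs are farmed out, the wall time is set by the most heavily loaded process. The sketch below assumes a round-robin assignment of runs to ranks (the actual scheduling in MPI_XSTAR is not described in this abstract) and ignores MPI start-up and communication overheads; `round_robin_speedup` is a hypothetical name. It reports speedup S = T_serial / T_parallel and efficiency E = S / p.

```python
def round_robin_speedup(run_times, p):
    """Speedup and efficiency for independent runs dealt round-robin to p ranks.

    Run k goes to rank k % p, and each rank executes its runs serially.
    Overheads (process start-up, messages) are deliberately ignored.
    """
    loads = [0.0] * p
    for k, t in enumerate(run_times):
        loads[k % p] += t
    serial = sum(run_times)
    parallel = max(loads)          # wall time is set by the busiest rank
    speedup = serial / parallel
    return speedup, speedup / p    # efficiency = speedup / number of ranks


runs = [1.0] * 8                   # hypothetical: eight runs of equal length
for p in (3, 4, 16):
    print(p, round_robin_speedup(runs, p))
```

    With eight equal runs, four ranks give E = 1.0, three ranks give E = 8/9 because one rank sits idle for a third of the time, and sixteen ranks give E = 0.5 because half of them receive no work at all.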

  13. A simple and efficient parallel FFT algorithm using the BSP model

    NARCIS (Netherlands)

    Bisseling, R.H.; Inda, M.A.

    2000-01-01

    In this paper we present a new parallel radix FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of three in the case
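
    To make the distribution family concrete, the sketch below assumes the definition of the group-cyclic distribution as it is commonly stated in Bisseling's BSP work: with cycle c, element x_j goes to processor (j div (cn/p))·c + j mod c, where c divides p and p divides n. The function name is hypothetical. The p processors form p/c groups; each group holds a contiguous block of cn/p elements and deals it cyclically over its c members, so c = 1 is the block distribution and c = p is the cyclic distribution.

```python
def group_cyclic_owner(j, n, p, c):
    """Processor that owns x[j] under the group-cyclic distribution with cycle c.

    Assumes p divides n and c divides p. Group g = j // block holds the
    contiguous block of block = c*n//p elements and deals it cyclically
    over its c processors g*c, ..., g*c + c - 1.
    """
    block = c * n // p
    return (j // block) * c + j % c


n, p = 16, 4
for c in (1, 2, 4):
    print(c, [group_cyclic_owner(j, n, p, c) for j in range(n)])
```

    Every choice of c still gives each processor n/p elements; only the pattern changes, which is what lets an FFT move between block-like and cyclic-like layouts as its butterfly stages progress.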

  14. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    Science.gov (United States)

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing data has resulted in a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types. Distributed and parallel computing is a crucial technique for accelerating ultra-large (e.g. files larger than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement HAlign-II, a highly cost-efficient and time-efficient tool for ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets, with files larger than 1 GB, showed that HAlign-II saves time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign.

  15. A Globally Convergent Parallel SSLE Algorithm for Inequality Constrained Optimization

    Directory of Open Access Journals (Sweden)

    Zhijun Luo

    2014-01-01

    A new parallel variable distribution algorithm, based on the interior-point sequential system of linear equations (SSLE) algorithm, is proposed for solving inequality-constrained optimization problems whose constraints are block-separable. Each iteration of this algorithm only needs to solve three systems of linear equations with the same coefficient matrix to obtain the descent direction. Furthermore, global convergence is achieved under certain conditions.
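
    The cost argument rests on the three systems sharing one coefficient matrix: the O(n³) factorization is paid once, and each additional right-hand side costs only O(n²) for two triangular solves. The minimal dense sketch below assumes plain LU factorization with partial pivoting (the paper's actual linear algebra is not stated); `lu_factor` and `lu_solve` are hypothetical pure-Python helpers, not a library API.

```python
def lu_factor(A):
    """In-place-style LU with partial pivoting: returns (LU, piv) with PA = LU.

    L (unit lower triangular, diagonal not stored) and U share one matrix.
    piv[i] is the original row index that ended up in row i.
    """
    n = len(A)
    LU = [list(row) for row in A]          # work on a copy of A
    piv = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular")
        LU[k], LU[p] = LU[p], LU[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]           # multiplier, stored in the L part
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, piv


def lu_solve(LU, piv, b):
    """Solve A x = b in O(n^2) by reusing an existing factorization."""
    n = len(LU)
    y = [b[piv[i]] for i in range(n)]      # apply the row permutation
    for i in range(n):                     # forward substitution with unit L
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    for i in reversed(range(n)):           # back substitution with U
        for j in range(i + 1, n):
            y[i] -= LU[i][j] * y[j]
        y[i] /= LU[i][i]
    return y


A = [[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [2.0, 0.0, 3.0]]
LU, piv = lu_factor(A)                     # factor once ...
for b in ([1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [-1.0, 4.0, 2.0]):
    print(lu_solve(LU, piv, b))            # ... then three cheap solves
```

    The zero in A[0][0] shows why pivoting is included: without a row swap, the very first elimination step would divide by zero.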

  16. EU Developments of the ITER ECRH System

    International Nuclear Information System (INIS)

    Henderson, M.

    2006-01-01

    The electron cyclotron (EC) heating and current drive (H&CD) system of ITER will deliver 20 MW/CW into the plasma at 170 GHz for H&CD, in addition to 2.5 MW/3 s at 120 GHz for plasma start-up. The EC system is composed of power supplies (PS), up to 24 H&CD gyrotrons (1 to 2 MW tubes), 3 start-up gyrotrons (1 MW tubes), 24 transmission lines and two sets of launching antennas: equatorial (EL) and upper (UL) launchers. Under the present ITER procurement package, the EU is responsible for one third of the 170 GHz H&CD gyrotrons, all PSs associated with the H&CD system, and the whole set (4) of upper launchers. In all areas of participation, the EU EC partnership (coordinated by the European Fusion Development Agreement, EFDA) aims to advance the technology of each of these subsystems. For example, procurement of a Pulse Step Modulator (PSM) HVPS is under consideration; it might cost about the same as the present ITER design (thyristor HVPS and HV series switch) while offering increased operational flexibility and variation of the EC power waveform. The EU is at the forefront of gyrotron research and is developing a 2 MW CW 170 GHz coaxial-cavity gyrotron, which increases output power while maintaining moderate power densities in the gyrotron cavity and collector. THALES, in collaboration with its EFDA partners (FZK, CRPP, TEKES), is manufacturing a series of prototype tubes in three phases of typically 1 s, 100 s and then CW pulse capacity (∼ 20 10 ). A 2 MW CW gyrotron test facility is being built at CRPP; it will be used to develop the 2 MW coaxial tube and to test various components required by the EC system. EFDA has undertaken a parallel development of two launcher options, front steering (FS) and remote steering (RS), with the aim of providing an optimum launcher for ITER that weighs EC physics aspects against operational reliability. The FS launcher (ITER reference design) offers a significant enhancement in physics

  17. ITER CTA newsletter. No. 3

    International Nuclear Information System (INIS)

    2001-11-01

    This ITER CTA newsletter contains reports by Dr. P. Barnard, ITER Canada Chairman and CEO, on the progress of the first formal ITER negotiations and on the presentation of details of Canada's bid at ITER workshops, and by Dr. V. Vlasenkov, Project Board Secretary, on the meeting of the ITER CTA Project Board.

  18. Progress in LHCD: a tool for advanced regimes on ITER

    Energy Technology Data Exchange (ETDEWEB)

    Tuccillo, A. A.

    2005-07-01

    Since the early 1980s, Lower Hybrid (LH) waves have driven plasma current non-inductively in tokamak experiments with an efficiency far higher than that of other auxiliary systems, especially at low plasma temperature. This makes LH the natural candidate for off-axis current drive (CD) in ITER, where current profile control will be required to maintain burning performance over long time scales. The difficulty of coupling LH waves into high-performance H-mode plasmas led the ITER JCT, in the 1990s, to decide not to support directly the development of an LH system, and LH was consequently excluded from ITER's initial H&CD systems. Nevertheless, LH activity continued in all areas: theory, experiments, modelling and R&D. Recently, in JET, the problem of coupling LH waves under severe edge conditions (ELMy plasmas and very low SOL densities) has been solved by localised gas injection. LH has since boosted research on advanced scenarios, allowing a variety of current profiles, from peaked to deeply hollow, to be used to generate Internal Transport Barriers (ITBs). In JET, long ITBs (11 s) have been sustained for times close to the resistive time scale in full CD conditions; in addition, pulses lasting minutes and relying on LHCD are routinely obtained in Tore Supra. Advanced scenarios have also been obtained under dominant electron heating (FTU, Tore Supra), and efficient CD has been demonstrated at ITER-relevant densities on FTU. In these experiments, an improvement in Electron Cyclotron (EC) CD efficiency (up to 4?) due to synergy with LH is also observed, both in normal operation and with EC resonating at a down-shifted frequency. Modelling all these experiments has been a good test bench for LH codes and has increased confidence in their use to predict future experiments. The use of LH as an actuator for real-time control of the plasma current profile has become a powerful tool for optimising ITB dynamics in JET and JT-60U. An LH system

  19. Architectural concept for the ITER Plasma Control System

    Energy Technology Data Exchange (ETDEWEB)

    Treutterer, W., E-mail: Wolfgang.Treutterer@ipp.mpg.de [Max-Planck Institute for Plasma Physics, EURATOM Association, Garching (Germany); Humphreys, D., E-mail: humphreys@fusion.gat.com [General Atomics, San Diego, CA (United States); Raupp, G., E-mail: Gerhard.Raupp@ipp.mpg.de [Max-Planck Institute for Plasma Physics, EURATOM Association, Garching (Germany); Schuster, E., E-mail: schuster@lehigh.edu [Lehigh University, Bethlehem, PA (United States); Snipes, J., E-mail: Joseph.Snipes@iter.org [ITER Organization, 13115 St. Paul-lez-Durance (France); De Tommasi, G., E-mail: detommas@unina.it [CREATE/Università di Napoli Federico II, Napoli (Italy); Walker, M., E-mail: walker@fusion.gat.com [General Atomics, San Diego, CA (United States); Winter, A., E-mail: Axel.Winter@iter.org [ITER Organization, 13115 St. Paul-lez-Durance (France)

    2014-05-15

    The plasma control system is a key instrument for successfully investigating the physics of burning plasma at ITER. Its task is to execute an experimental plan, known as the pulse schedule, in the presence of complex relationships between plasma parameters such as temperature, pressure, confinement and shape. The biggest challenge in designing the control system is to find an adequate breakdown of this task into a hierarchy of feedback control functions. It is also important to foresee structures that can handle unplanned exceptional situations in order to protect the machine. Managing the limited number of actuator systems for multiple targets also has a strong impact on the system architecture. Finally, the control system must be flexible and reconfigurable to cover the many facets of plasma behaviour and investigation goals. To prepare the development of a control system for ITER plasma operation, a conceptual design was proposed by a group of worldwide experts and reviewed by an ITER panel in 2012. In this paper we describe the fundamental principles of the proposed control system architecture and how they were derived from a systematic collection and analysis of use cases and requirements. The experience and best practices of many fusion devices and research laboratories, augmented by the envisaged ITER-specific tasks, form the foundation of this collection. In the next step, control functions were distilled from this input. An analysis of the relationships between the functions made it possible to identify sequential and parallel structures, alternate branches and conflicting requirements. Finally, a concept of selectable control layers consisting of nested “compact controllers” was synthesised. Each control layer represents a cascaded scheme from high-level to elementary controllers and implements a control hierarchy. The compact controllers are used to resolve conflicts when several control functions would use the same

  20. Architectural concept for the ITER Plasma Control System

    International Nuclear Information System (INIS)

    Treutterer, W.; Humphreys, D.; Raupp, G.; Schuster, E.; Snipes, J.; De Tommasi, G.; Walker, M.; Winter, A.

    2014-01-01

    The plasma control system is a key instrument for successfully investigating the physics of burning plasma at ITER. Its task is to execute an experimental plan, known as the pulse schedule, in the presence of complex relationships between plasma parameters such as temperature, pressure, confinement and shape. The biggest challenge in designing the control system is to find an adequate breakdown of this task into a hierarchy of feedback control functions. It is also important to foresee structures that can handle unplanned exceptional situations in order to protect the machine. Managing the limited number of actuator systems for multiple targets also has a strong impact on the system architecture. Finally, the control system must be flexible and reconfigurable to cover the many facets of plasma behaviour and investigation goals. To prepare the development of a control system for ITER plasma operation, a conceptual design was proposed by a group of worldwide experts and reviewed by an ITER panel in 2012. In this paper we describe the fundamental principles of the proposed control system architecture and how they were derived from a systematic collection and analysis of use cases and requirements. The experience and best practices of many fusion devices and research laboratories, augmented by the envisaged ITER-specific tasks, form the foundation of this collection. In the next step, control functions were distilled from this input. An analysis of the relationships between the functions made it possible to identify sequential and parallel structures, alternate branches and conflicting requirements. Finally, a concept of selectable control layers consisting of nested “compact controllers” was synthesised. Each control layer represents a cascaded scheme from high-level to elementary controllers and implements a control hierarchy. The compact controllers are used to resolve conflicts when several control functions would use the same

  1. Nonlinear and parallel algorithms for finite element discretizations of the incompressible Navier-Stokes equations

    Science.gov (United States)

    Arteaga, Santiago Egido

    1998-12-01

    The steady-state Navier-Stokes equations are of considerable interest because they are used to model numerous common physical phenomena. The applications encountered in practice often involve small viscosities and complicated domain geometries, and they result in challenging problems in spite of the vast attention that has been dedicated to them. In this thesis we examine methods for computing the numerical solution of the primitive-variable formulation of the incompressible equations on distributed-memory parallel computers. We use the Galerkin method to discretize the differential equations, although most results are stated so that they also apply to stabilized methods. We also reformulate some classical results in a single framework and discuss some issues frequently dismissed in the literature, such as the implementation of the pressure space basis and non-homogeneous boundary values. We consider three nonlinear methods: Newton's method, Oseen's (or Picard) iteration, and sequences of Stokes problems. All of these iterative nonlinear methods require solving a linear system at every step. Newton's method converges quadratically while the others converge only linearly; however, we obtain theoretical bounds showing that Oseen's iteration is more robust, and we confirm this experimentally. In addition, although Oseen's iteration usually requires more iterations than Newton's method, the linear systems it generates tend to be simpler and its overall cost (in CPU time) is lower. The Stokes problems result in linear systems that are easier to solve, but their convergence is much slower, so they are competitive only for large viscosities. Inexact versions of these methods are studied, and we explain why the best timings are obtained using relatively modest error tolerances in solving the corresponding linear systems. We also present a new damping optimization strategy based on the quadratic nature of the Navier-Stokes equations, which improves the robustness of all the
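
    The linear-versus-quadratic convergence contrast can be seen on a scalar caricature of a quadratic nonlinearity, ν u + u² = f, where ν plays the role of viscosity and u² stands in for the convective term. This toy is an illustrative assumption, not the thesis' finite element system: the Picard (Oseen-like) step lags one factor of u² and solves a linear problem, while Newton's method linearizes fully with Jacobian ν + 2u. The function names are hypothetical.

```python
def picard(nu, f, u0=0.0, tol=1e-12, maxit=500):
    """Picard/Oseen-style iteration for nu*u + u*u = f: lag one factor of u*u."""
    u = u0
    for k in range(1, maxit + 1):
        u_new = f / (nu + u)        # solve the linear problem (nu + u_k) u = f
        if abs(u_new - u) < tol:
            return u_new, k
        u = u_new
    raise RuntimeError("Picard iteration did not converge")


def newton(nu, f, u0=0.0, tol=1e-12, maxit=50):
    """Newton's method on the residual r(u) = nu*u + u*u - f."""
    u = u0
    for k in range(1, maxit + 1):
        du = -(nu * u + u * u - f) / (nu + 2.0 * u)   # Jacobian r'(u) = nu + 2u
        u += du
        if abs(du) < tol:
            return u, k
    raise RuntimeError("Newton iteration did not converge")


# nu = 1, f = 2 has the positive root u = 1
print(picard(1.0, 2.0))
print(newton(1.0, 2.0))
```

    Here the Picard error shrinks by the factor u*/(ν + u*) = 1/2 per step, while Newton needs only a handful of steps. In this toy the Picard factor approaches 1 as ν shrinks; the robustness result in the abstract concerns convergence behaviour that this scalar example does not test.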

  2. 3D, parallel fluid-structure interaction code

    CSIR Research Space (South Africa)

    Oxtoby, Oliver F

    2011-01-01

    The authors describe the development of a 3D parallel Fluid–Structure Interaction (FSI) solver and its application to benchmark problems. Fluid and solid domains are discretised using an edge-based finite-volume scheme for efficient parallel...

  3. How to use MPI communication in highly parallel climate simulations more easily and more efficiently.

    Science.gov (United States)

    Behrens, Jörg; Hanke, Moritz; Jahns, Thomas

    2014-05-01

    In this talk we present a way to help developers of climate models use MPI communication efficiently. Exploiting the performance potential of today's highly parallel supercomputers with real-world simulations is a complex task. This is partly caused by the low-level nature of the MPI communication library, which is the dominant communication tool, at least for inter-node communication. To manage the complexity of the task, climate simulations with non-trivial communication patterns often use an internal abstraction layer above MPI without exploiting the benefits of communication aggregation or MPI datatypes. The solution we propose for this complexity and performance problem is the communication library YAXT. This library is built on top of MPI; it takes high-level descriptions of arbitrary domain decompositions and automatically derives an efficient collective data exchange. Several exchanges can be aggregated to reduce latency costs. Examples are given that demonstrate the simplicity and the performance gains for selected climate applications.
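
    YAXT itself is a C library with Fortran bindings; the Python sketch below neither uses nor reproduces its API. It only illustrates, under simplifying assumptions, what deriving an exchange from two decompositions means: given which rank owns each global index in the source decomposition and which indices each rank needs in the destination decomposition, it groups all transfers into one aggregated message per (source, destination) pair, which is where the latency saving from aggregation comes from. `derive_exchange` is a hypothetical name.

```python
from collections import defaultdict


def derive_exchange(src_owner, dst_need):
    """Build an aggregated exchange plan between two domain decompositions.

    src_owner: dict global_index -> rank owning it in the source decomposition.
    dst_need: dict rank -> iterable of global indices needed in the destination.
    Returns dict (src_rank, dst_rank) -> sorted indices: one message per rank
    pair. Pairs with src_rank == dst_rank are local copies, not messages.
    """
    plan = defaultdict(list)
    for dst, indices in dst_need.items():
        for g in indices:
            plan[(src_owner[g], dst)].append(g)
    return {pair: sorted(idx) for pair, idx in plan.items()}


# 8 grid points: block-distributed in the source, cyclic in the destination.
src_owner = {g: (0 if g < 4 else 1) for g in range(8)}
dst_need = {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
plan = derive_exchange(src_owner, dst_need)
print(plan)
print(sum(1 for s, d in plan if s != d), "remote messages")
```

    A naive point-by-point exchange in this example would send four separate remote values; the aggregated plan sends two messages, one per direction, each carrying two values.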

  4. Non-iterative Voltage Stability

    Energy Technology Data Exchange (ETDEWEB)

    Makarov, Yuri V. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Vyakaranam, Bharat [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Hou, Zhangshuan [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Wu, Di [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Meng, Da [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Wang, Shaobu [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Elbert, Stephen T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Miller, Laurie E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Huang, Zhenyu [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2014-09-30

    This report demonstrates promising capabilities and performance characteristics of the proposed method using several power system models. The new method will help to develop a new generation of highly efficient tools suitable for real-time parallel implementation. The ultimate benefit will be early detection of system instability and prevention of system blackouts in real time.

  5. Parallelizing the spectral transform method: A comparison of alternative parallel algorithms

    International Nuclear Information System (INIS)

    Foster, I.; Worley, P.H.

    1993-01-01

    The spectral transform method is a standard numerical technique for solving partial differential equations on the sphere and is widely used in global climate modeling. In this paper, we outline different approaches to parallelizing the method and describe experiments that we are conducting to evaluate the efficiency of these approaches on parallel computers. The experiments are conducted using a testbed code that solves the nonlinear shallow water equations on a sphere, but are designed to permit evaluation in the context of a global model. They allow us to evaluate the relative merits of the approaches as a function of problem size and number of processors. The results of this study are guiding ongoing work on PCCM2, a parallel implementation of the Community Climate Model developed at the National Center for Atmospheric Research

  6. Subroutine MLTGRD: a multigrid algorithm based on multiplicative correction and implicit non-stationary iteration

    International Nuclear Information System (INIS)

    Barry, J.M.; Pollard, J.P.

    1986-11-01

    A FORTRAN subroutine, MLTGRD, is provided to efficiently solve the large systems of linear equations arising from a five-point finite difference discretisation of some elliptic partial differential equations. MLTGRD is a multigrid algorithm that provides multiplicative correction to iterative solution estimates from successively reduced systems of linear equations. It uses the method of implicit non-stationary iteration on all grid levels.
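
    MLTGRD's multiplicative correction and implicit non-stationary iteration are not reproduced here. As a point of reference only, the sketch below shows a standard multigrid V-cycle built from different, textbook components: weighted-Jacobi smoothing, full-weighting restriction and linear interpolation. For brevity it is applied to the 1D three-point analogue of the five-point 2D problem, -u'' = f on (0, 1) with zero boundary values and 2^k - 1 interior points. All names are hypothetical.

```python
import numpy as np


def residual(u, f, h):
    """r = f - A u for the 3-point discretisation of -u'' with zero boundary values."""
    up = np.concatenate(([0.0], u, [0.0]))
    return f - (2.0 * up[1:-1] - up[:-2] - up[2:]) / h**2


def smooth(u, f, h, sweeps=3, w=2.0 / 3.0):
    """Weighted Jacobi sweeps (damps the high-frequency error components)."""
    for _ in range(sweeps):
        up = np.concatenate(([0.0], u, [0.0]))
        u = (1.0 - w) * u + w * 0.5 * (up[:-2] + up[2:] + h * h * f)
    return u


def restrict(r):
    """Full weighting: 2m+1 fine interior points -> m coarse interior points."""
    return 0.25 * r[:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]


def prolong(e, n_fine):
    """Linear interpolation: m coarse points -> 2m+1 fine points."""
    ef = np.zeros(n_fine)
    ef[1::2] = e                               # coincident points copy directly
    ep = np.concatenate(([0.0], e, [0.0]))
    ef[0::2] = 0.5 * (ep[:-1] + ep[1:])        # in-between points are averaged
    return ef


def v_cycle(u, f, h):
    if len(u) == 1:                            # coarsest grid: 2u/h^2 = f exactly
        return np.array([0.5 * h * h * f[0]])
    u = smooth(u, f, h)                        # pre-smoothing
    rc = restrict(residual(u, f, h))
    ec = v_cycle(np.zeros(len(rc)), rc, 2.0 * h)   # coarse-grid correction
    u = u + prolong(ec, len(u))
    return smooth(u, f, h)                     # post-smoothing


n, h = 63, 1.0 / 64
x = np.linspace(h, 1.0 - h, n)
f = np.pi**2 * np.sin(np.pi * x)               # exact solution u = sin(pi x)
u = np.zeros(n)
for _ in range(15):
    u = v_cycle(u, f, h)
print(np.max(np.abs(residual(u, f, h))), np.max(np.abs(u - np.sin(np.pi * x))))
```

    Each V-cycle typically reduces the algebraic residual by a roughly grid-independent factor, which is the property that makes multigrid efficient for large systems; the remaining error against sin(πx) is the O(h²) discretisation error, not solver error.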

  7. Iterative and iterative-noniterative integral solutions in 3-loop massive QCD calculations

    International Nuclear Information System (INIS)

    Ablinger, J.; Radu, C.S.; Schneider, C.; Behring, A.; Imamoglu, E.; Van Hoeij, M.; Von Manteuffel, A.; Raab, C.G.

    2017-11-01

    Various single-scale quantities in massless and massive QCD up to 3-loop order can be expressed by iterative integrals over certain classes of alphabets, from the harmonic polylogarithms to root-valued alphabets. Examples are the anomalous dimensions to 3-loop order, the massless Wilson coefficients and also different massive operator matrix elements. Starting at 3-loop order, however, other letters also appear in the case of massive operator matrix elements, the so-called iterative non-iterative integrals, which are related to solutions based on complete elliptic integrals or any other special function with an integral representation that is definite but not a Volterra-type integral. After outlining the formalism leading to iterative non-iterative integrals, we present examples for both of these cases with the 3-loop anomalous dimension γ_qg^(2) and the structure of the principal solution in the iterative non-iterative case of the 3-loop QCD corrections to the ρ-parameter.
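
    The simplest alphabet mentioned, that of the harmonic polylogarithms, can be written down explicitly. Following the standard definition (Remiddi and Vermaseren), with letters a ∈ {0, 1, -1}, the letters, the weight-one functions and the recursive iterated integral are:

```latex
f_0(x) = \frac{1}{x}, \qquad f_1(x) = \frac{1}{1-x}, \qquad f_{-1}(x) = \frac{1}{1+x},

H(0;x) = \ln x, \qquad H(1;x) = -\ln(1-x), \qquad H(-1;x) = \ln(1+x),

H(a,\vec{b};x) = \int_0^x dy\, f_a(y)\, H(\vec{b};y) \quad \text{for } (a,\vec{b}) \neq \vec{0},
\qquad H(\vec{0}_w;x) = \frac{1}{w!}\ln^w x,

H(0,1;x) = \int_0^x \frac{dy}{y}\,\bigl(-\ln(1-y)\bigr) = \mathrm{Li}_2(x).
```

    Every integration here runs up to the variable x, which is what makes these Volterra-type iterated integrals; the iterative non-iterative case described above adds letters defined by definite integrals instead.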

  8. Iterative and iterative-noniterative integral solutions in 3-loop massive QCD calculations

    Energy Technology Data Exchange (ETDEWEB)

    Ablinger, J.; Radu, C.S.; Schneider, C. [Johannes Kepler Univ., Linz (Austria). Research Inst. for Symbolic Computation (RISC); Behring, A. [RWTH Aachen Univ. (Germany). Inst. fuer Theoretische Teilchenphysik und Kosmologie; Bluemlein, J.; Freitas, A. de [Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany); Imamoglu, E.; Van Hoeij, M. [Florida State Univ., Tallahassee, FL (United States). Dept. of Mathematics; Von Manteuffel, A. [Michigan State Univ., East Lansing, MI (United States). Dept. of Physics and Astronomy; Raab, C.G. [Johannes Kepler Univ., Linz (Austria). Inst. for Algebra

    2017-11-15

    Various single-scale quantities in massless and massive QCD up to 3-loop order can be expressed by iterative integrals over certain classes of alphabets, from the harmonic polylogarithms to root-valued alphabets. Examples are the anomalous dimensions to 3-loop order, the massless Wilson coefficients and also different massive operator matrix elements. Starting at 3-loop order, however, other letters also appear in the case of massive operator matrix elements, the so-called iterative non-iterative integrals, which are related to solutions based on complete elliptic integrals or any other special function with an integral representation that is definite but not a Volterra-type integral. After outlining the formalism leading to iterative non-iterative integrals, we present examples for both of these cases with the 3-loop anomalous dimension γ_qg^(2) and the structure of the principal solution in the iterative non-iterative case of the 3-loop QCD corrections to the ρ-parameter.

  9. Towards the procurement of the ITER divertor

    International Nuclear Information System (INIS)

    Merola, M.; Tivey, R.; Martin, A.; Pick, M.

    2006-01-01

    The procurement of the ITER divertor is planned to start in 2009. Based on the present common understanding of how the ITER components will be shared, the Japanese Participating Team (JAPT) will supply the outer vertical target; the Russian Federation (RF) PT will supply the dome liner and perform the high heat flux testing; and the EU PT will supply the inner vertical targets and the cassette bodies, including final assembly of the divertor plasma-facing components (PFCs). Manufacturing the PFCs of the ITER divertor is a challenging endeavor because of the advanced technologies involved and the unprecedented series production. To mitigate the associated risks, special arrangements need to be put in place before and during procurement to ensure quality and to keep to the schedule. Before procurement can start, ITER plans to review the qualification and production capability of each candidate PT. Well in advance of the assumed start of procurement, each PT wishing to contribute to the divertor PFC procurement should first demonstrate its technical qualification to carry out the procurement with the required quality, efficiently and on time. Appropriate precautions, such as subdividing the procurement into stages, are also to be adopted during the procurement phase to mitigate the consequences of possible unexpected manufacturing problems. In preparation for writing the procurement specification for the vertical targets, the setting of acceptance criteria is also being addressed, with the objective of defining workable acceptance criteria for the PFC armour joints. A complete set of analyses is also in progress to assess the latest design modifications against the design requirements, including neutronic, shielding, thermo-mechanical and electromagnetic analyses. More than half of the ITER plasma parameters that must be measured and the related diagnostics are located in the

  10. Two-phase flow steam generator simulations on parallel computers using domain decomposition method

    International Nuclear Information System (INIS)

    Belliard, M.

    2003-01-01

    Within the framework of the Domain Decomposition Method (DDM), we present industrial steady-state two-phase flow simulations of PWR Steam Generators (SG) using iteration-by-sub-domain methods: the standard and the Adaptive Dirichlet/Neumann (ADN) methods. The averaged mixture balance equations are solved by a Fractional-Step algorithm, combined with the Crank-Nicolson scheme and the Finite Element Method. The algorithm works with overlapping or non-overlapping sub-domains and with conforming or non-conforming meshes. Computations are run on PC networks or on massively parallel mainframe computers, using a CEA code-linker and the PVM package in a master-slave configuration. SG mock-up simulations involving up to 32 sub-domains highlight the efficiency (speed-up, scalability) and robustness of the chosen approach. With the DDM, the computational problem size is easily increased to about 1,000,000 cells and the CPU time is significantly reduced. The difficulties related to industrial use are also discussed. (author)
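To make the iteration-by-sub-domain idea concrete, here is a minimal sketch of a relaxed Dirichlet/Neumann iteration; it is not the author's solver. It assumes a 1D model problem, -u'' = 1 on (0, 1) with u(0) = u(1) = 0, split into two non-overlapping sub-domains at x = gamma. Each sub-problem is solved in closed form instead of by finite elements, and the relaxation parameter theta is held fixed rather than adapted as in ADN. The function name `dirichlet_neumann` is hypothetical.

```python
def dirichlet_neumann(gamma=0.5, theta=0.3, lam=0.0, tol=1e-12, max_iter=200):
    """Relaxed Dirichlet/Neumann iteration for -u'' = 1 on (0, 1), u(0) = u(1) = 0.

    The domain is split at x = gamma into (0, gamma) and (gamma, 1). On each
    sub-domain the exact local solution has the form u = -x**2/2 + c*x + d.
    Returns the converged interface value u(gamma) and the iteration count.
    """
    for k in range(1, max_iter + 1):
        # Sub-domain 1: Dirichlet data u(0) = 0 and u(gamma) = lam.
        c1 = (lam + gamma**2 / 2) / gamma
        flux = -gamma + c1                      # u1'(gamma), handed to sub-domain 2
        # Sub-domain 2: Neumann data u2'(gamma) = flux and Dirichlet data u2(1) = 0.
        c2 = flux + gamma
        d2 = 0.5 - c2
        u2_at_gamma = -gamma**2 / 2 + c2 * gamma + d2
        # Relaxed update of the interface value (fixed theta, not adaptive).
        new_lam = theta * u2_at_gamma + (1 - theta) * lam
        if abs(new_lam - lam) < tol:
            return new_lam, k
        lam = new_lam
    return lam, max_iter

value, iterations = dirichlet_neumann(gamma=0.5, theta=0.3)
```

The interface value converges to the exact u(gamma) = gamma*(1 - gamma)/2. For this model problem the interface error is multiplied by 1 - theta/gamma at each iteration, so the scheme converges only for 0 < theta < 2*gamma; an adaptive variant would tune theta during the iterations instead of fixing it in advance.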

  11. A Hybrid Genetic Algorithm to Minimize Total Tardiness for Unrelated Parallel Machine Scheduling with Precedence Constraints

    Directory of Open Access Journals (Sweden)

    Chunfeng Liu

    2013-01-01

    The paper presents a novel hybrid genetic algorithm (HGA) for a deterministic scheduling problem in which multiple jobs with arbitrary precedence constraints are processed on multiple unrelated parallel machines. The objective is to minimize total tardiness, since in many situations job delays may lead to penalty costs or order cancellations by clients. A priority-rule-based heuristic, which at each iteration schedules the highest-priority job on the highest-priority machine according to the priority rule, is proposed and embedded in the HGA to generate initial feasible schedules that can be improved in later stages. Computational experiments show that the proposed HGA performs well in terms of solution accuracy and efficiency for small problems and obtains better results than the conventional genetic algorithm within the same runtime for large problems.
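The constructive step described above (a priority-rule heuristic that yields initial feasible schedules) can be sketched as greedy list scheduling. The record does not state its priority rules, so this sketch assumes earliest-due-date for choosing the job and earliest-completion-time for choosing the machine; the genetic operators of the HGA are not shown. The function `priority_schedule` is hypothetical and assumes the precedence graph is acyclic.

```python
def priority_schedule(proc, due, preds):
    """Greedy priority-rule list scheduling on unrelated parallel machines.

    proc[j][m] -- processing time of job j on machine m (machine-dependent)
    due[j]     -- due date of job j
    preds[j]   -- set of jobs that must finish before job j may start
    Returns (assignment, completion times, total tardiness).
    """
    n_jobs, n_machines = len(proc), len(proc[0])
    machine_free = [0.0] * n_machines   # time at which each machine becomes idle
    finish = {}                          # job -> completion time
    assignment = {}                      # job -> machine index
    while len(finish) < n_jobs:
        # Eligible jobs: not yet scheduled, all predecessors already scheduled.
        ready = [j for j in range(n_jobs)
                 if j not in finish and all(p in finish for p in preds[j])]
        # Assumed job rule: earliest due date first (ties broken by job index).
        job = min(ready, key=lambda j: (due[j], j))
        # A job cannot start before its last predecessor completes.
        release = max((finish[p] for p in preds[job]), default=0.0)
        # Assumed machine rule: the machine that gives the earliest completion.
        machine = min(range(n_machines),
                      key=lambda m: max(machine_free[m], release) + proc[job][m])
        start = max(machine_free[machine], release)
        finish[job] = start + proc[job][machine]
        machine_free[machine] = finish[job]
        assignment[job] = machine
    total_tardiness = sum(max(0.0, finish[j] - due[j]) for j in range(n_jobs))
    return assignment, finish, total_tardiness

# Three jobs, two machines; job 1 may only start after job 0 has finished.
proc = [[3, 5], [2, 1], [4, 2]]
due = [3, 4, 5]
preds = [set(), {0}, set()]
assignment, finish, tardiness = priority_schedule(proc, due, preds)
print(assignment, tardiness)   # -> {0: 0, 1: 1, 2: 1} 1.0
```

In an HGA, schedules like this would seed the initial population, and crossover and mutation would then try to lower the total tardiness further.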

  12. Physics research needs for ITER

    International Nuclear Information System (INIS)

    Sauthoff, N.R.

    1995-01-01

    Design of ITER entails the application of physics design tools that have been validated against the worldwide database of fusion research. In many cases, these tools do not yet exist and must be developed as part of the ITER physics program. ITER's considerable increases in power and size demand significant extrapolations from the current database; in several cases, new physical effects are projected to dominate the behavior of the ITER plasma. This paper focuses on the design tools and data that have been identified by the ITER team but are not yet available; these needs serve as the basis for the ITER Physics Research Needs, which have been developed jointly by the ITER Physics Expert Groups and the ITER design team. Development of the tools and the supporting database is an ongoing activity that constitutes a significant opportunity for contributions to the ITER program by fusion research programs worldwide.

  13. R and D on support to ITER safety assessment

    International Nuclear Information System (INIS)

    Van Dorsselaere, J.P.; Perrault, D.; Barrachin, M.; Bentaib, A.; Bez, J.; Cortes, P.; Seropian, C.; Tregoures, N.; Vendel, J.

    2009-01-01

    After performing its first ITER safety assessment in 2002 on behalf of the French 'Autorite de Surete Nucleaire (ASN)', the French 'Institut de Radioprotection et de Surete Nucleaire (IRSN)' is now analysing the new safety file for the ITER fusion facility. The operator delivered this file to the ASN as part of its request for a creation decree, which is legally required before construction work can begin on the site. IRSN's first task in following ITER throughout its lifetime is to study the safety approach adopted by the operator and the associated issues. Such a challenging new technology calls for further in-house expertise, so an R and D program has been set up in parallel to support this safety assessment process, now and in the coming years. Its main objectives are to identify the key parameters for mastering certain risks (where the operator's justification might be insufficient) and to perform verifications with methods and codes independent of the operator's. Priority has been given to four technical issues (others, such as the behaviour of activated corrosion products, could be investigated in the future). The first issue concerns the simulation of accident sequences with the ASTEC European system code, developed by IRSN (jointly with its German counterpart, GRS) for severe accidents in Pressurised Water Reactors. A preliminary analysis showed that most of its physical models are already applicable, e.g., for thermal-hydraulics in accidents caused by water or air ingress into the vacuum vessel (VV), or for dust transport. Work started in 2008 on some model adaptations, for instance oxidation of VV first-wall materials by steam or air, and on validation against the ITER-specific ICE and LOVA experiments. Other model improvements are planned for the coming years, based on feedback from the work on the other technical issues and from code validation. The second issue concerns the risk of gas explosion due to concentrations of hydrogen and carbon

  14. Parallel Monte Carlo simulation of aerosol dynamics

    KAUST Repository

    Zhou, K.

    2014-01-01

    A highly efficient Monte Carlo (MC) algorithm is developed for the numerical simulation of aerosol dynamics, that is, nucleation, surface growth, and coagulation. Nucleation and surface growth are handled deterministically, while coagulation is simulated with a stochastic method (the Marcus-Lushnikov stochastic process). Operator splitting techniques are used to combine the deterministic and stochastic parts of the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI), and its parallel computing efficiency is investigated through numerical examples. Nearly 60% parallel efficiency is achieved for the largest test case, with 3.7 million MC particles running on 93 parallel computing nodes. The algorithm is verified by simulating various test cases and comparing the results with available analytical and/or other numerical solutions. Generally, only a small number (hundreds or thousands) of MC particles is found to be necessary to accurately predict the aerosol particle number density, volume fraction, and so forth, that is, the low-order moments of the Particle Size Distribution (PSD) function. Accurately predicting the high-order moments of the PSD requires a dramatically larger number of MC particles. © 2014 Kun Zhou et al.
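The stochastic coagulation part can be illustrated with a direct, event-by-event simulation of the Marcus-Lushnikov process. This minimal sketch assumes a constant coagulation kernel and omits nucleation, surface growth, operator splitting, and MPI parallelism; the function name is hypothetical and this is not the record's implementation.

```python
import random

def marcus_lushnikov_constant_kernel(volumes, kernel=1.0, sim_volume=1.0,
                                     t_end=1.0, seed=0):
    """Event-driven Marcus-Lushnikov coagulation with a constant kernel.

    Each of the N*(N-1)/2 particle pairs in the simulation volume coagulates
    at rate kernel / sim_volume; the waiting time to the next event is
    exponentially distributed with the total rate.
    """
    rng = random.Random(seed)
    particles = list(volumes)
    t = 0.0
    while len(particles) > 1:
        n = len(particles)
        total_rate = kernel / sim_volume * n * (n - 1) / 2
        t += rng.expovariate(total_rate)       # time to the next coagulation event
        if t > t_end:
            break
        # Constant kernel: every pair is equally likely to coagulate.
        i, j = rng.sample(range(n), 2)
        particles[i] += particles[j]            # merged particle conserves volume
        particles[j] = particles[-1]            # O(1) removal of particle j
        particles.pop()
    return particles

# 2000 unit-volume particles at number concentration 1 (sim_volume = 2000).
result = marcus_lushnikov_constant_kernel([1.0] * 2000, sim_volume=2000.0)
```

For a constant kernel K, the mean-field (Smoluchowski) number concentration is n(t) = n0 / (1 + K*n0*t/2). Here that gives about 2000 / 1.5 ≈ 1333 particles at t = 1, which the particle count should approach. Total particle volume is conserved exactly by construction.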

  15. Shared Variable Oriented Parallel Precompiler for SPMD Model

    Institute of Scientific and Technical Information of China (English)

    1995-01-01

    At present, commercial parallel computer systems with a distributed-memory architecture are usually provided with parallel FORTRAN or parallel C compilers, which are simply traditional sequential FORTRAN or C compilers extended with communication statements. Writing parallel programs with explicit communication statements is burdensome for programmers. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper automatically generates appropriate communication statements based on shared variables for the SPMD (Single Program Multiple Data) computation model, greatly easing parallel programming while maintaining high communication efficiency. The core function of the parallel C precompiler has been successfully verified on a transputer-based parallel computer. Its strong performance indicates that SVOPP is probably a breakthrough in parallel programming technique.

  16. Development of in situ cleaning techniques for diagnostic mirrors in ITER

    International Nuclear Information System (INIS)

    Litnovsky, A.; Laengner, M.; Matveeva, M.; Schulz, Ch.; Marot, L.; Voitsenya, V.S.; Philipps, V.; Biel, W.; Samm, U.

    2011-01-01

    Mirrors will be used in all optical and laser-based diagnostic systems of ITER. In ITER's severe environment, the optical characteristics of the mirrors will degrade, hampering the overall performance of the respective diagnostics. A deposit of as little as 20 nm of carbon on a mirror is sufficient to decrease its reflectivity by tens of percent, underscoring the need for mirror cleaning in ITER. The results of R and D on plasma cleaning of molybdenum diagnostic mirrors are reported. Mirrors contaminated with amorphous carbon films, both under laboratory conditions and in tokamaks, were cleaned in steady-state hydrogenic plasmas. The maximum cleaning efficiency of 4.2 nm/min was reached for the laboratory and soft tokamak hydrocarbon films, whereas for the hard tokamak films, carbidization of the mirrors drastically decreased the cleaning efficiency, down to 0.016 nm/min. This implies that sputtering is the only reliable way to remove such deposits from contaminated mirrors by plasma cleaning. An overview of the R and D program on mirror cleaning is provided, along with plans for further studies and recommendations for ITER mirror-based diagnostics.
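The two cleaning rates quoted above differ by a factor of roughly 260, which is easier to appreciate as a cleaning time. This back-of-the-envelope sketch assumes a constant removal rate and uses the 20 nm deposit mentioned in the abstract; the helper name is hypothetical.

```python
def cleaning_time_min(thickness_nm, rate_nm_per_min):
    """Time (minutes) to remove a deposit of the given thickness at a constant rate."""
    return thickness_nm / rate_nm_per_min

DEPOSIT_NM = 20.0  # deposit thickness the abstract cites as already harmful
for label, rate in [("lab / soft tokamak film", 4.2),
                    ("hard (carbidized) tokamak film", 0.016)]:
    minutes = cleaning_time_min(DEPOSIT_NM, rate)
    print(f"{label}: {minutes:.1f} min = {minutes / 60:.1f} h")
```

Under these assumptions, a 20 nm soft film is removed in about 5 minutes, while the same thickness of a hard, carbidized film would take about 1250 minutes (roughly 21 hours) of plasma exposure.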

  17. Non-iterative distance constraints enforcement for cloth drapes simulation

    Science.gov (United States)

    Hidajat, R. L. L. G.; Wibowo, Arifin, Z.; Suyitno

    2016-03-01

    Cloth simulation, which reproduces the behavior of cloth objects such as flags, tablecloths, or even garments, has applications in clothing animation for games and virtual shops. Elastically deformable models have been widely used to provide realistic and efficient simulation; however, they suffer from overstretching. We introduce a new cloth simulation algorithm that replaces the iterative distance-constraint enforcement steps with non-iterative ones to prevent overstretching in a spring-mass cloth model. Our method is based on a simple position-correction procedure applied at one end of a spring. In our experiments, we developed a rectangular cloth model that is initially horizontal with one point fixed and is allowed to drape under its own weight. Our simulation achieves plausible, realistic cloth drapes. This paper aims to demonstrate the reliability of our approach in overcoming overstretching while decreasing the computational cost of the constraint enforcement process by eliminating the iterative procedure.
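A single-pass position correction applied at one end of each spring can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the 10% allowed stretch (`max_ratio = 1.1`), the 2D positions, and the rule that springs are listed outward from the fixed point (so the second index is always the end that moves) are all assumptions, and the function name is hypothetical.

```python
import math

def enforce_max_stretch(pos, springs, rest_lengths, max_ratio=1.1):
    """One non-iterative pass of distance-constraint enforcement.

    pos          -- list of (x, y) particle positions, modified in place
    springs      -- list of (a, b) index pairs; b is the end that gets moved
    rest_lengths -- rest length of each spring
    If a spring is longer than max_ratio * rest length, its end b is pulled
    back along the spring direction so that the spring has exactly that length.
    """
    for (a, b), rest in zip(springs, rest_lengths):
        (ax, ay), (bx, by) = pos[a], pos[b]
        dx, dy = bx - ax, by - ay
        length = math.hypot(dx, dy)
        limit = max_ratio * rest
        if length > limit:
            scale = limit / length
            pos[b] = (ax + dx * scale, ay + dy * scale)
    return pos

# A hanging chain whose top point 0 is fixed; springs are listed outward from it.
chain = [(0.0, 0.0), (0.0, -1.5), (0.0, -3.0), (0.0, -4.5)]
springs = [(0, 1), (1, 2), (2, 3)]
enforce_max_stretch(chain, springs, [1.0, 1.0, 1.0])
```

For a chain ordered this way, each point is moved at most once and only after its parent is final, so one pass leaves every spring within the limit (here the points end up at y ≈ -1.1, -2.2, -3.3). On a full 2D cloth mesh, a single pass need not satisfy every constraint simultaneously. That is a limitation of this sketch, not a claim about the paper's method.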

  18. Optimisation of a parallel ocean general circulation model

    OpenAIRE

    M. I. Beare; D. P. Stevens

    1997-01-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by...

  19. The fusion code XGC: Enabling kinetic study of multi-scale edge turbulent transport in ITER

    Energy Technology Data Exchange (ETDEWEB)

    D'Azevedo, Eduardo [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Abbott, Stephen [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Koskela, Tuomas [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Worley, Patrick [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Ku, Seung-Hoe [Princeton Plasma Physics Lab. (PPPL), Princeton, NJ (United States); Ethier, Stephane [Princeton Plasma Physics Lab. (PPPL), Princeton, NJ (United States); Yoon, Eisung [Rensselaer Polytechnic Inst., Troy, NY (United States); Shephard, Mark [Rensselaer Polytechnic Inst., Troy, NY (United States); Hager, Robert [Princeton Plasma Physics Lab. (PPPL), Princeton, NJ (United States); Lang, Jianying [Princeton Plasma Physics Lab. (PPPL), Princeton, NJ (United States); Intel Corporation, Santa Clara, CA (United States); Choi, Jong [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Podhorszki, Norbert [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Klasky, Scott [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Parashar, Manish [Rutgers Univ., Piscataway, NJ (United States); Chang, Choong-Seock [Princeton Plasma Physics Lab. (PPPL), Princeton, NJ (United States)

    2017-01-01

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in the ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations were not possible before because the time-to-solution fell short, by a factor of more than 10, of the target of less than 5 days of wall-clock time for one physics case. Frontier techniques employed include nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous application interactions, and dynamic repartitioning.

  20. Regularization iteration imaging algorithm for electrical capacitance tomography

    Science.gov (United States)

    Tong, Guowei; Liu, Shi; Chen, Hongyan; Wang, Xueyao

    2018-03-01

    The image reconstruction method plays a crucial role in real-world applications of electrical capacitance tomography (ECT). In this study, a new cost function that simultaneously considers the sparsity and low-rank properties of the imaging targets is proposed to improve the quality of the reconstructed images, converting the image reconstruction task into an optimization problem. Within the framework of the split Bregman algorithm, an iterative scheme that splits the complicated optimization problem into several simpler sub-tasks is developed to solve the proposed cost function efficiently, and the fast iterative shrinkage-thresholding algorithm (FISTA) is introduced to accelerate convergence. Numerical experiments verify the effectiveness of the proposed algorithm in improving reconstruction precision and robustness.
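The acceleration component named above, FISTA, can be shown on its own. This minimal sketch solves a small L1-regularized least-squares problem, min_x 0.5*||Ax - b||^2 + lam*||x||_1, in pure Python. It does not include the record's low-rank term, the split Bregman outer loop, or an ECT sensitivity matrix. The matrix `A`, vector `b`, `lam`, and all function names are illustrative assumptions.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (component-wise shrinkage)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def fista(A, b, lam, step, n_iter=2000):
    """FISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1.

    step should not exceed 1/L, where L is the largest eigenvalue of A^T A.
    """
    x = [0.0] * len(A[0])
    y = x[:]
    t = 1.0
    for _ in range(n_iter):
        residual = [r - bi for r, bi in zip(matvec(A, y), b)]
        grad = rmatvec(A, residual)               # gradient of the data term at y
        x_new = soft_threshold([yi - step * g for yi, g in zip(y, grad)], step * lam)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum (extrapolation) step that gives FISTA its acceleration.
        y = [xn + (t - 1.0) / t_new * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

A = [[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]]
b = [1.0, 2.0, 0.5]
x = fista(A, b, lam=0.5, step=1.0 / 14.0)   # L = 8 + sqrt(34) ≈ 13.83 for this A
```

For this data, the optimality conditions give x ≈ (7/60, 7/12) ≈ (0.117, 0.583). The iterate approaches that value, with the small oscillations typical of FISTA's momentum.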