Xu, Zheng; Wang, Sheng; Li, Yeqing; Zhu, Feiyun; Huang, Junzhou
2018-02-08
The recent history of parallel Magnetic Resonance Imaging (pMRI) has in large part been devoted to finding ways to reduce acquisition time. The joint total variation (JTV) regularized model has been demonstrated to be a powerful tool for increasing sampling speed in pMRI; its major bottleneck, however, is the inefficiency of the optimization method. All present state-of-the-art optimization methods for the JTV model reach only a sublinear convergence rate. In this paper, we improve on this by proposing a linearly convergent optimization method for the JTV model. The proposed method is based on the Iterative Reweighted Least Squares algorithm. Because of the complexity of the tangled JTV objective, we design a novel preconditioner to further accelerate the proposed method. Extensive experiments demonstrate the superior performance of the proposed algorithm for pMRI in both accuracy and efficiency compared with state-of-the-art methods.
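The reweighting idea named in the abstract can be illustrated on a much simpler problem. The sketch below applies Iterative Reweighted Least Squares to single-channel, one-dimensional total variation denoising; it is not the authors' JTV solver for pMRI and it omits their preconditioner. The function names, the smoothing constant `eps` and the tridiagonal solver are assumptions made for this illustration.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tv_objective(x, y, lam):
    """0.5 * ||x - y||^2 + lam * sum_i |x[i+1] - x[i]|."""
    fit = 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))
    tv = sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))
    return fit + lam * tv


def irls_tv_denoise(y, lam=1.0, iters=30, eps=1e-8):
    """Hypothetical helper: IRLS for 1D TV denoising.

    Each pass replaces |d| by the quadratic surrogate w * d^2 with
    w = 1 / sqrt(d^2 + eps), then solves the weighted least-squares
    problem (I + lam * D^T W D) x = y, which is tridiagonal here.
    """
    n = len(y)
    x = list(y)
    for _ in range(iters):
        # One weight per edge between x[i] and x[i + 1].
        w = [1.0 / math.sqrt((x[i + 1] - x[i]) ** 2 + eps) for i in range(n - 1)]
        lower = [0.0] * n
        diag = [1.0] * n
        upper = [0.0] * n
        for i in range(n - 1):
            diag[i] += lam * w[i]
            diag[i + 1] += lam * w[i]
            upper[i] = -lam * w[i]
            lower[i + 1] = -lam * w[i]
        x = solve_tridiagonal(lower, diag, upper, y)
    return x
```

Calling `irls_tv_denoise` on a noisy step signal returns a flatter, piecewise-constant estimate. In a large multi-coil problem the inner linear solve would be done iteratively, which is where a preconditioner of the kind the abstract mentions would enter.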
Parallel S_n iteration schemes
International Nuclear Information System (INIS)
Wienke, B.R.; Hiromoto, R.E.
1986-01-01
The iterative, multigroup, discrete ordinates (S_n) technique for solving the linear transport equation enjoys widespread usage and appeal. Serial iteration schemes and numerical algorithms developed over the years provide a timely framework for parallel extension. On the Denelcor HEP, the authors investigate three parallel iteration schemes for solving the one-dimensional S_n transport equation. The multigroup representation and serial iteration methods are also reviewed. This analysis represents a first attempt to extend serial S_n algorithms to parallel environments and provides good baseline estimates on ease of parallel implementation, relative algorithm efficiency, comparative speedup, and some future directions. The authors examine ordered and chaotic versions of these strategies, with and without concurrent rebalance and diffusion acceleration. Two strategies efficiently support high degrees of parallelization and appear to be robust parallel iteration techniques. The third strategy is a weaker parallel algorithm. Chaotic iteration, difficult to simulate on serial machines, holds promise and converges faster than ordered versions of the schemes. Actual parallel speedup and efficiency are high and payoff appears substantial.
International Nuclear Information System (INIS)
Alleon, G.; Carpentieri, B.; Duff, I.S.; Giraud, L.; Langou, J.; Martin, E.
2003-01-01
The boundary element method has become a popular tool for the solution of Maxwell's equations in electromagnetism. It discretizes only the surface of the radiating object and gives rise to linear systems that are smaller in size compared to those arising from finite element or finite difference discretizations. However, these systems are prohibitively demanding in terms of memory for direct methods and challenging to solve by iterative methods. In this paper we address the iterative solution via preconditioned Krylov methods of electromagnetic scattering problems expressed in an integral formulation, with main focus on the design of the pre-conditioner. We consider an approximate inverse method based on the Frobenius-norm minimization with a pattern prescribed in advance. The pre-conditioner is constructed from a sparse approximation of the dense coefficient matrix, and the patterns both for the pre-conditioner and for the coefficient matrix are computed a priori using geometric information from the mesh. We describe the implementation of the approximate inverse in an out-of-core parallel code that uses multipole techniques for the matrix-vector products, and show results on the numerical scalability of our method on systems of size up to one million unknowns. We propose an embedded iterative scheme based on the GMRES method and combined with multipole techniques, aimed at improving the robustness of the approximate inverse for large problems. We prove by numerical experiments that the proposed scheme enables the solution of very large and difficult problems efficiently at reduced computational and memory cost. Finally we perform a preliminary study on a spectral two-level pre-conditioner to enhance the robustness of our method. This numerical technique exploits spectral information of the preconditioned systems to build a low rank-update of the pre-conditioner. (authors)
Sparse BLIP: BLind Iterative Parallel imaging reconstruction using compressed sensing.
She, Huajun; Chen, Rong-Rong; Liang, Dong; DiBella, Edward V R; Ying, Leslie
2014-02-01
To develop a sensitivity-based parallel imaging reconstruction method to reconstruct iteratively both the coil sensitivities and MR image simultaneously based on their prior information. Parallel magnetic resonance imaging reconstruction problem can be formulated as a multichannel sampling problem where solutions are sought analytically. However, the channel functions given by the coil sensitivities in parallel imaging are not known exactly and the estimation error usually leads to artifacts. In this study, we propose a new reconstruction algorithm, termed Sparse BLind Iterative Parallel, for blind iterative parallel imaging reconstruction using compressed sensing. The proposed algorithm reconstructs both the sensitivity functions and the image simultaneously from undersampled data. It enforces the sparseness constraint in the image as done in compressed sensing, but is different from compressed sensing in that the sensing matrix is unknown and additional constraint is enforced on the sensitivities as well. Both phantom and in vivo imaging experiments were carried out with retrospective undersampling to evaluate the performance of the proposed method. Experiments show improvement in Sparse BLind Iterative Parallel reconstruction when compared with Sparse SENSE, JSENSE, IRGN-TV, and L1-SPIRiT reconstructions with the same number of measurements. The proposed Sparse BLind Iterative Parallel algorithm reduces the reconstruction errors when compared to the state-of-the-art parallel imaging methods. Copyright © 2013 Wiley Periodicals, Inc.
Iterative algorithms for large sparse linear systems on parallel computers
Adams, L. M.
1982-01-01
Algorithms are developed for assembling in parallel the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
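For readers unfamiliar with the preconditioned conjugate gradient method mentioned above, the following serial sketch shows its structure with the simplest (Jacobi, i.e. diagonal) preconditioner. It is not the parallel algorithm of the cited work; the dense list-of-lists matrix storage and the function names are assumptions chosen to keep the example self-contained.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product; a sparse format would be used in practice."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients with a diagonal preconditioner.

    A must be symmetric positive definite. Returns (x, iterations).
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                  # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

The matrix-vector product, the dot products and the diagonal scaling are the operations that a parallel implementation would distribute across processors.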
Iteration schemes for parallelizing models of superconductivity
Energy Technology Data Exchange (ETDEWEB)
Gray, P.A. [Michigan State Univ., East Lansing, MI (United States)
1996-12-31
The time dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-T_c superconductors. When solving these equations numerically, the added degrees of complexity due to the coupling and nonlinearity of the model often warrant the use of high-performance computers for their solution. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given which contrast properties of convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.
Parallel GPU implementation of iterative PCA algorithms.
Andrecut, M
2009-11-01
Principal component analysis (PCA) is a key statistical technique for multivariate data analysis. For large data sets, the common approach to PCA computation is based on the standard NIPALS-PCA algorithm, which unfortunately suffers from loss of orthogonality, and therefore its applicability is usually limited to the estimation of the first few components. Here we present an algorithm based on Gram-Schmidt orthogonalization (called GS-PCA), which eliminates this shortcoming of NIPALS-PCA. Also, we discuss the GPU (Graphics Processing Unit) parallel implementation of both NIPALS-PCA and GS-PCA algorithms. The numerical results show that the GPU parallel optimized versions, based on CUBLAS (NVIDIA), are substantially faster (up to 12 times) than the CPU optimized versions based on CBLAS (GNU Scientific Library).
Image segmentation by iterative parallel region growing and splitting
Tilton, James C.
1989-01-01
The spatially constrained clustering (SCC) iterative parallel region-growing technique is applied to image analysis. The SCC algorithm is implemented on the massively parallel processor at NASA Goddard. Most previous region-growing approaches have the drawback that the segmentation produced depends on the order in which portions of the image are processed. The ideal solution to this problem (merging only the single most similar pair of spatially adjacent regions in the image in each iteration) becomes impractical except for very small images, even on a massively parallel computer. The SCC algorithm overcomes these problems by performing, in parallel, the best merge within each of a set of local, possibly overlapping, subimages. A region-splitting stage is also incorporated into the algorithm, but experiments show that region splitting generally does not improve segmentation results. The SCC algorithm has been tested on various imagery data, and test results for a Landsat TM image are summarized.
AZTEC: A parallel iterative package for the solving linear systems
Energy Technology Data Exchange (ETDEWEB)
Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S. [Sandia National Labs., Albuquerque, NM (United States)
1996-12-31
We describe a parallel linear system package, AZTEC. The package incorporates a number of parallel iterative methods (e.g. GMRES, biCGSTAB, CGS, TFQMR) and preconditioners (e.g. Jacobi, Gauss-Seidel, polynomial, domain decomposition with LU or ILU within subdomains). Additionally, AZTEC allows for the reuse of previous preconditioning factorizations within Newton schemes for nonlinear methods. Currently, a number of different users are using this package to solve a variety of PDE applications.
A Parallel Iterative Method for Computing Molecular Absorption Spectra.
Koval, Peter; Foerster, Dietrich; Coulaud, Olivier
2010-09-14
We describe a fast parallel iterative method for computing molecular absorption spectra within TDDFT linear response and using the LCAO method. We use a local basis of "dominant products" to parametrize the space of orbital products that occur in the LCAO approach. In this basis, the dynamic polarizability is computed iteratively within an appropriate Krylov subspace. The iterative procedure uses a matrix-free GMRES method to determine the (interacting) density response. The resulting code is about 1 order of magnitude faster than our previous full-matrix method. This acceleration makes the speed of our TDDFT code comparable with codes based on Casida's equation. The implementation of our method uses hybrid MPI and OpenMP parallelization in which load balancing and memory access are optimized. To validate our approach and to establish benchmarks, we compute spectra of large molecules on various types of parallel machines. The methods developed here are fairly general, and we believe they will find useful applications in molecular physics/chemistry, even for problems that are beyond TDDFT, such as organic semiconductors, particularly in photovoltaics.
Parallelization of the model-based iterative reconstruction algorithm DIRA
International Nuclear Information System (INIS)
Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.
2016-01-01
New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelized using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. (authors)
PARALLEL ITERATIVE RECONSTRUCTION OF PHANTOM CATPHAN ON EXPERIMENTAL DATA
Directory of Open Access Journals (Sweden)
M. A. Mirzavand
2016-01-01
The paper considers the principles of fast parallel iterative algorithms based on graphics accelerators and the OpenGL library. The proposed approach simultaneously minimizes the residuals of the desired solution and the total variation of the reconstructed three-dimensional image. The amount of required input data, i.e. conical X-ray projections, can be reduced several times, which makes it possible to reduce the patient's radiation exposure by a corresponding factor while maintaining the necessary contrast and spatial resolution of the three-dimensional image of the patient. The heuristic iterative algorithm can be used as an alternative to the well-known three-dimensional Feldkamp algorithm.
P-SPARSLIB: A parallel sparse iterative solution package
Energy Technology Data Exchange (ETDEWEB)
Saad, Y. [Univ. of Minnesota, Minneapolis, MN (United States)
1994-12-31
Iterative methods are gaining popularity in engineering and sciences at a time where the computational environment is changing rapidly. P-SPARSLIB is a project to build a software library for sparse matrix computations on parallel computers. The emphasis is on iterative methods and the use of distributed sparse matrices, an extension of the domain decomposition approach to general sparse matrices. One of the goals of this project is to develop a software package geared towards specific applications. For example, the author will test the performance and usefulness of P-SPARSLIB modules on linear systems arising from CFD applications. Equally important is the goal of portability. In the long run, the author wishes to ensure that this package is portable on a variety of platforms, including SIMD environments and shared memory environments.
Iterative schemes for parallel Sn algorithms in a shared-memory computing environment
International Nuclear Information System (INIS)
Haghighat, A.; Hunter, M.A.; Mattis, R.E.
1995-01-01
Several two-dimensional spatial domain partitioning Sn transport theory algorithms are developed on the basis of different iterative schemes. These algorithms are incorporated into TWOTRAN-II and tested on the shared-memory CRAY Y-MP C90 computer. For a series of fixed-source r-z geometry homogeneous problems, it is demonstrated that the concurrent red-black algorithms may result in large parallel efficiencies (>60%) on C90. It is also demonstrated that for a realistic shielding problem, the use of the negative flux fixup causes high load imbalance, which results in a significant loss of parallel efficiency.
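The red-black ordering named above is easiest to see on a model problem. The sketch below applies it to a finite-difference Poisson equation on a square grid rather than to the Sn transport equations or TWOTRAN-II, so it illustrates only the ordering idea; the function name, grid layout and relaxation parameter `omega` are assumptions. Every point of one colour depends only on points of the other colour, so all updates within a colour could run concurrently.

```python
def red_black_sweep(u, f, h, omega=1.0):
    """One red-black relaxation sweep for -Laplace(u) = f, updating u in place.

    u and f are square lists of lists; boundary values of u stay fixed.
    omega = 1.0 gives Gauss-Seidel, 1 < omega < 2 gives overrelaxation.
    """
    n = len(u)
    for colour in (0, 1):
        # All points with (i + j) % 2 == colour are mutually independent.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 != colour:
                    continue
                gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                             + u[i][j - 1] + u[i][j + 1]
                             + h * h * f[i][j])
                u[i][j] += omega * (gs - u[i][j])
```

In a parallel code the two inner loops for a given colour would be split across threads or subdomains, with a synchronization point between the two colours.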
Energy Technology Data Exchange (ETDEWEB)
Kim, S. [Purdue Univ., West Lafayette, IN (United States)
1994-12-31
Parallel iterative procedures based on domain decomposition techniques are defined and analyzed for the numerical solution of wave propagation by finite element and finite difference methods. For finite element methods, in a Lagrangian framework, an efficient way for choosing the algorithm parameter as well as the algorithm convergence are indicated. Some heuristic arguments for finding the algorithm parameter for finite difference schemes are addressed. Numerical results are presented to indicate the effectiveness of the methods.
Organizing Compression of Hyperspectral Imagery to Allow Efficient Parallel Decompression
Klimesh, Matthew A.; Kiely, Aaron B.
2014-01-01
A family of schemes has been devised for organizing the output of an algorithm for predictive data compression of hyperspectral imagery so as to allow efficient parallelization in both the compressor and decompressor. In these schemes, the compressor performs a number of iterations, during each of which a portion of the data is compressed via parallel threads operating on independent portions of the data. The general idea is that for each iteration it is predetermined how much compressed data will be produced from each thread.
Parallel iterative solution of the Hermite Collocation equations on GPUs II
International Nuclear Information System (INIS)
Vilanakis, N; Mathioudakis, E
2014-01-01
Hermite Collocation is a high order finite element method for Boundary Value Problems modelling applications in several fields of science and engineering. Application of this integration free numerical solver for the solution of linear BVPs results in a large and sparse general system of algebraic equations, suggesting the usage of an efficient iterative solver especially for realistic simulations. In part I of this work an efficient parallel algorithm of the Schur complement method coupled with Bi-Conjugate Gradient Stabilized (BiCGSTAB) iterative solver has been designed for multicore computing architectures with a Graphics Processing Unit (GPU). In the present work the proposed algorithm has been extended for high performance computing environments consisting of multiprocessor machines with multiple GPUs. Since this is a distributed GPU and shared CPU memory parallel architecture, a hybrid memory treatment is needed for the development of the parallel algorithm. The realization of the algorithm took place on a multiprocessor machine HP SL390 with Tesla M2070 GPUs using the OpenMP and OpenACC standards. Execution time measurements reveal the efficiency of the parallel implementation
Parallel iterative solvers and preconditioners using approximate hierarchical methods
Energy Technology Data Exchange (ETDEWEB)
Grama, A.; Kumar, V.; Sameh, A. [Univ. of Minnesota, Minneapolis, MN (United States)
1996-12-31
In this paper, we report results of the performance, convergence, and accuracy of a parallel GMRES solver for Boundary Element Methods. The solver uses a hierarchical approximate matrix-vector product based on a hybrid Barnes-Hut / Fast Multipole Method. We study the impact of various accuracy parameters on the convergence and show that with minimal loss in accuracy, our solver yields significant speedups. We demonstrate the excellent parallel efficiency and scalability of our solver. The combined speedups from approximation and parallelism represent an improvement of several orders in solution time. We also develop fast and parallelizable preconditioners for this problem. We report on the performance of an inner-outer scheme and a preconditioner based on truncated Green's function. Experimental results on a 256 processor Cray T3D are presented.
International Nuclear Information System (INIS)
Kirk, B.L.; Azmy, Y.Y.
1992-01-01
In this paper the one-group, steady-state neutron diffusion equation in two-dimensional Cartesian geometry is solved using the nodal integral method. The discrete variable equations comprise loosely coupled sets of equations representing the nodal balance of neutrons, as well as neutron current continuity along rows or columns of computational cells. An iterative algorithm that is more suitable for solving large problems concurrently is derived based on the decomposition of the spatial domain and is accelerated using successive overrelaxation. This algorithm is very well suited for parallel computers, especially since the spatial domain decomposition occurs naturally, so that the number of iterations required for convergence does not depend on the number of processors participating in the calculation. Implementation of the authors' algorithm on the Intel iPSC/2 hypercube and Sequent Balance 8000 parallel computer is presented, and measured speedup and efficiency for test problems are reported. The results suggest that the efficiency of the hypercube quickly deteriorates when many processors are used, while the Sequent Balance retains very high efficiency for a comparable number of participating processors. This leads to the conjecture that message-passing parallel computers are not as well suited for this algorithm as shared-memory machines
Accuracy analysis of hybrid parallel robot for the assembling of ITER
International Nuclear Information System (INIS)
Wang Yongbo; Pessi, Pekka; Wu Huapeng; Handroos, Heikki
2009-01-01
This paper presents a novel mobile parallel robot, which is able to carry out welding and machining processes from inside the international thermonuclear experimental reactor (ITER) vacuum vessel (VV). The kinematics design of the robot has been optimized for ITER access. To improve the accuracy of the parallel robot, the errors caused by the stiffness and manufacturing process have to be compensated or limited to a minimum value. In this paper kinematics errors and stiffness modeling are given. The simulation results are presented.
Structured Parallel Programming Patterns for Efficient Computation
McCool, Michael; Robison, Arch
2012-01-01
Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th
International Nuclear Information System (INIS)
Chen Jian-Lin; Li Lei; Wang Lin-Yuan; Cai Ai-Long; Xi Xiao-Qi; Zhang Han-Ming; Li Jian-Xin; Yan Bin
2015-01-01
The projection matrix model is used to describe the physical relationship between reconstructed object and projection. Such a model has a strong influence on projection and backprojection, two vital operations in iterative computed tomographic reconstruction. The distance-driven model (DDM) is a state-of-the-art technology that simulates forward and back projections. This model has a low computational complexity and a relatively high spatial resolution; however, it includes only a few methods in a parallel operation with a matched model scheme. This study introduces a fast and parallelizable algorithm to improve the traditional DDM for computing the parallel projection and backprojection operations. Our proposed model has been implemented on a GPU (graphic processing unit) platform and has achieved satisfactory computational efficiency with no approximation. The runtime for the projection and backprojection operations with our model is approximately 4.5 s and 10.5 s per loop, respectively, with an image size of 256×256×256 and 360 projections with a size of 512×512. We compare several general algorithms that have been proposed for maximizing GPU efficiency by using the unmatched projection/backprojection models in a parallel computation. The imaging resolution is not sacrificed and remains accurate during computed tomographic reconstruction. (paper)
Directory of Open Access Journals (Sweden)
Daniel Marcsa
2015-01-01
The analysis and design of electromechanical devices involve the solution of large sparse linear systems and therefore require high-performance algorithms. In this paper, the primal Domain Decomposition Method (DDM) with parallel forward-backward and with parallel Preconditioned Conjugate Gradient (PCG) solvers is introduced in a two-dimensional parallel time-stepping finite element formulation to analyze a rotating machine, considering the electromagnetic field, external circuit and rotor movement. The proposed parallel direct solver and the iterative solver with two preconditioners are analyzed with respect to computational efficiency and the number of iterations required with the different preconditioners. Simulation results for a rotating machine are also presented.
CERN. Geneva
2016-01-01
Large scale scientific computing raises questions on different levels ranging from the formulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities in these different topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tends to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, the object-oriented approach is nice to read, but may come with an inherent performance penalty. These lectures aim to present the basics of the Expression Template (ET) idiom which allows us to keep the object-oriented approach without sacrificing performance. We will in particular show how to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show how to apply these methods i...
International Nuclear Information System (INIS)
Rajagopalan, S.; Jethra, A.; Khare, A.N.; Ghodgaonkar, M.D.; Srivenkateshan, R.; Menon, S.V.G.
1990-01-01
Issues relating to implementing iterative procedures, for numerical solution of elliptic partial differential equations, on a distributed parallel computing system are discussed. Preliminary investigations show that a speed-up of about 3.85 is achievable on a four transputer pipeline network. (author). 2 figs., 3 appendixes., 7 refs
Efficient Parallel Algorithms for Unsteady Incompressible Flows
Guermond, Jean-Luc
2013-01-01
The objective of this paper is to give an overview of recent developments on splitting schemes for solving the time-dependent incompressible Navier–Stokes equations and to discuss possible extensions to the variable density/viscosity case. A particular attention is given to algorithms that can be implemented efficiently on large parallel clusters.
Huang, Kuo-Chan; Wu, Wei-Ya; Wang, Feng-Jian; Liu, Hsiao-Ching; Hung, Chun-Hao
2016-01-01
Parallel computation has been widely applied in a variety of large-scale scientific and engineering applications. Many studies indicate that exploiting both task and data parallelisms, i.e. mixed-parallel workflows, to solve large computational problems can get better efficacy compared with either pure task parallelism or pure data parallelism. Scheduling traditional workflows of pure task parallelism on parallel systems has long been known to be an NP-complete problem. Mixed-parallel workflow scheduling has to deal with an additional challenging issue of processor allocation. In this paper, we explore the processor allocation issue in scheduling mixed-parallel workflows of moldable tasks, called M-task, and propose an Iterative Allocation Expanding and Shrinking (IAES) approach. Compared to previous approaches, our IAES has two distinguishing features. The first is allocating more processors to the tasks on allocated critical paths for effectively reducing the makespan of workflow execution. The second is allowing the processor allocation of an M-task to shrink during the iterative procedure, resulting in a more flexible and effective process for finding better allocation. The proposed IAES approach has been evaluated with a series of simulation experiments and compared to several well-known previous methods, including CPR, CPA, MCPA, and MCPA2. The experimental results indicate that our IAES approach outperforms those previous methods significantly in most situations, especially when nodes of the same layer in a workflow might have unequal workloads.
International Nuclear Information System (INIS)
Pessi, P.; Huapeng Wu; Handroos, H.; Jones, L.
2006-01-01
ITER sectors require more stringent tolerances (± 5 mm) than normally expected for the size of structure involved. The walls of ITER sectors are made of 60 mm thick stainless steel and are joined together by high efficiency structural and leak tight welds. In addition to the initial vacuum vessel assembly, sectors may have to be replaced for repair. Since commercially available machines are too heavy for the required machining operations and the lifting of a possible e-beam gun column system, and conventional robots lack the stiffness and accuracy in such machining conditions, a new flexible, lightweight and mobile robotic machine is being considered. For the assembly of the ITER vacuum vessel sector, precise positioning of welding end-effectors, at some distance in a confined space from the available supports, will be required, which is not possible using conventional machines or robots. This paper presents a special robot, able to carry out welding and machining processes from inside the ITER vacuum vessel, consisting of a ten-degree-of-freedom parallel robot mounted on a carriage driven by electric motor/gearbox on a track. The robot consists of a Stewart platform based parallel mechanism. Water hydraulic cylinders are used as actuators to reach six degrees of freedom for parallel construction. Two linear and two rotational motions are used to enlarge the workspace of the manipulator. The robot carries both a welding gun, such as a TIG, hybrid laser or e-beam welding gun, to weld the inner and outer walls of the ITER vacuum vessel sectors, and machining tools to cut and mill the walls with the necessary accuracy; it can also carry other tools and material to a required position inside the vacuum vessel. For assembly, an on-line six-degrees-of-freedom seam-finding algorithm has been developed, which enables the robot to find the welding seam automatically in a very complex environment. In the machining multi flexible machining processes carried out automatically by
Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho
2014-11-01
The primary goal of this project is to implement an iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) as used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge of computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains, with the goal of eventually making it usable in a clinical setting.
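The MLEM update at the heart of this work is compact enough to sketch serially. The code below is a plain-Python illustration of the standard multiplicative MLEM iteration, not the Spark/GraphX implementation described in the abstract; the dense system matrix `A`, the function name and the uniform starting image are assumptions for the example.

```python
def mlem(A, y, iters=50):
    """Standard MLEM iteration x <- x / (A^T 1) * A^T (y / (A x)).

    A is an m-by-n list of lists with non-negative entries (detector bins
    by image pixels); y holds the m measured counts. Every column of A is
    assumed to have a positive sum.
    """
    m, n = len(A), len(A[0])
    # Sensitivity image A^T 1: total detection weight of each pixel.
    sens = [sum(A[i][j] for i in range(m)) for j in range(n)]
    x = [1.0] * n
    for _ in range(iters):
        # Forward projection of the current estimate.
        proj = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Ratio of measured to predicted counts, guarding empty bins.
        ratio = [y[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(m)]
        # Back projection of the ratios.
        back = [sum(A[i][j] * ratio[i] for i in range(m)) for j in range(n)]
        x = [x[j] * back[j] / sens[j] for j in range(n)]
    return x
```

The forward and back projections are sparse matrix-vector products, which is the part a graph or sparse linear algebra framework would distribute.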
Tilton, James C.
1988-01-01
Image segmentation can be a key step in data compression and image analysis. However, the segmentation results produced by most previous approaches to region growing are suspect because they depend on the order in which portions of the image are processed. An iterative parallel segmentation algorithm avoids this problem by performing globally best merges first. Such a segmentation approach, and two implementations of the approach on NASA's Massively Parallel Processor (MPP) are described. Application of the segmentation approach to data compression and image analysis is then described, and results of such application are given for a LANDSAT Thematic Mapper image.
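The idea of performing the globally best merge first can be illustrated on a one-dimensional signal. This is only a sketch under simplifying assumptions: the merge cost (absolute difference of region means), the threshold and the function name `best_merge_segmentation` are hypothetical and are not taken from the MPP implementations.

```python
def best_merge_segmentation(values, threshold):
    """Region growing that always merges the globally most similar neighbours."""
    # Each region is [sum, count]; neighbours are adjacent list entries.
    regions = [[float(v), 1] for v in values]
    while len(regions) > 1:
        # Scan every adjacent pair and pick the globally cheapest merge.
        costs = [
            abs(regions[i][0] / regions[i][1] - regions[i + 1][0] / regions[i + 1][1])
            for i in range(len(regions) - 1)
        ]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > threshold:
            break  # no remaining pair is similar enough
        s, c = regions.pop(best + 1)
        regions[best][0] += s
        regions[best][1] += c
    return [(s / c, c) for s, c in regions]


segments = best_merge_segmentation([10, 11, 10, 50, 52, 51, 11], threshold=5)
```

Because the merge chosen at each step depends on all candidate pairs rather than on a scan position, the outcome does not depend on the order in which the data are visited, apart from ties.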
An efficient parallel computing scheme for Monte Carlo criticality calculations
International Nuclear Information System (INIS)
Dufek, Jan; Gudowski, Waclaw
2009-01-01
The existing parallel computing schemes for Monte Carlo criticality calculations suffer from a low efficiency when applied on many processors. We suggest a new fission matrix based scheme for efficient parallel computing. The results are derived from the fission matrix that is combined from all parallel simulations. The scheme allows for a practically ideal parallel scaling as no communication among the parallel simulations is required, and inactive cycles are not needed.
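A rough sketch of how results might be derived from a combined fission matrix follows. It assumes, hypothetically, that each independent simulation returns a production tally (neutrons born in zone i per source neutron started in zone j) and a source tally, that tallies are combined by simple summation, and that the multiplication factor and fission source are the dominant eigenpair found by power iteration. None of these details are stated in the abstract, and the numbers are invented.

```python
def combine_tallies(tallies):
    """Sum per-simulation (production, source) tallies and normalise."""
    n = len(tallies[0][0])
    prod = [[0.0] * n for _ in range(n)]
    src = [0.0] * n
    for production, source in tallies:
        for i in range(n):
            src[i] += source[i]
            for j in range(n):
                prod[i][j] += production[i][j]
    # F[i][j]: fission neutrons born in zone i per source neutron started in zone j.
    return [[prod[i][j] / src[j] if src[j] else 0.0 for j in range(n)] for i in range(n)]


def dominant_eigenpair(matrix, iterations=200):
    """Power iteration; the source vector is kept normalised to sum to one."""
    n = len(matrix)
    s = [1.0 / n] * n
    k = 0.0
    for _ in range(iterations):
        t = [sum(matrix[i][j] * s[j] for j in range(n)) for i in range(n)]
        k = sum(t)  # eigenvalue estimate, valid because sum(s) == 1
        s = [v / k for v in t]
    return k, s


# Two hypothetical independent simulations; no communication is needed until this point.
sim_a = ([[60.0, 10.0], [20.0, 50.0]], [100.0, 100.0])
sim_b = ([[30.0, 5.0], [10.0, 25.0]], [50.0, 50.0])
F = combine_tallies([sim_a, sim_b])
k_eff, fission_source = dominant_eigenpair(F)
```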
Efficient Parallel Algorithms for Landscape Evolution Modelling
Moresi, L. N.; Mather, B.; Beucher, R.
2017-12-01
Landscape erosion and the deposition of sediments by river systems are strongly controlled by topography, rainfall patterns, and the susceptibility of the basement to the action of running water. It is well understood that each of these processes depends on the other, for example: topography results from active tectonic processes; deformation, metamorphosis and exhumation alter the competence of the basement; rainfall patterns depend on topography; uplift and subsidence in response to tectonic stress can be amplified by erosion and sediment deposition. We typically gain understanding of such coupled systems through forward models which capture the essential interactions of the various components and attempt to parameterise those parts of the individual system that are unresolvable at the scale of the interaction. Here we address the problem of predicting erosion and deposition rates at a continental scale with a resolution of tens to hundreds of metres in a dynamic, Lagrangian framework. This is a typical requirement for a code to interface with a mantle / lithosphere dynamics model and demands an efficient, unstructured, parallel implementation. We address this through a very general algorithm that treats all parts of the landscape evolution equations in sparse-matrix form, including those for stream-flow accumulation, dam-filling and catchment determination. This gives us considerable flexibility in developing unstructured, parallel code, and in creating a modular package that can be configured by users to work at different temporal and spatial scales, but it also has potential advantages in treating the non-linear parts of the problem in a general manner.
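Stream-flow accumulation in sparse-matrix form can be sketched as repeated products with a downhill matrix D that has one non-zero per donor node, so that the accumulated flow is r + D r + D^2 r + ... for a rainfall vector r. The code below is a small serial illustration under assumed conventions (single receiver per node, chosen as the lowest neighbour; the function name and mesh are hypothetical), not the authors' package.

```python
def flow_accumulation(heights, neighbours, rainfall):
    """Upstream-accumulated flow via repeated sparse matrix-vector products."""
    # Sparse downhill matrix stored as one entry per column: donor -> receiver.
    receiver = {}
    for node, h in enumerate(heights):
        lower = [m for m in neighbours[node] if heights[m] < h]
        if lower:
            # Lowest neighbour used as a simple stand-in for the steepest descent.
            receiver[node] = min(lower, key=lambda m: heights[m])
    total = list(rainfall)
    in_transit = list(rainfall)
    # total = r + D r + D^2 r + ...; the series ends because flow only moves downhill.
    while any(in_transit):
        moved = [0.0] * len(heights)
        for donor, amount in enumerate(in_transit):
            if amount and donor in receiver:
                moved[receiver[donor]] += amount
        in_transit = moved
        total = [t + m for t, m in zip(total, moved)]
    return total


# Hypothetical unstructured mesh: a main channel 0 -> 1 -> 2 -> 3 with a tributary 4 -> 2.
heights = [3.0, 2.0, 1.0, 0.0, 2.5]
neighbours = [[1], [0, 2], [1, 3, 4], [2], [2]]
accumulated = flow_accumulation(heights, neighbours, [1.0] * 5)
```

Each pass is a sparse matrix-vector product, which is the kind of operation that parallel sparse linear algebra libraries distribute across an unstructured mesh.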
Efficient Parallel Strategy Improvement for Parity Games
Fearnley, John
2017-01-01
We study strategy improvement algorithms for solving parity games. While these algorithms are known to solve parity games using a very small number of iterations, experimental studies have found that a high step complexity causes them to perform poorly in practice. In this paper we seek to address this situation. Every iteration of the algorithm must compute a best response, and while the standard way of doing this uses the Bellman-Ford algorithm, we give experimental results that show that o...
Variation in efficiency of parallel algorithms. [for study of stiffness matrices in planar trusses]
Hayashi, A.; Melosh, R. J.; Utku, S.; Salama, M.
1985-01-01
The present study has the objective to investigate some iterative parallel-processor linear equation solving algorithms with respect to efficiency for analyses of typical linear engineering systems. Attention is given to a set of n linear equations, Ku = p, where K = an n x n positive definite, sparsely populated, symmetric matrix, u = an n x 1 vector of unknown responses, and p = an n x 1 vector of prescribed constants. This study is concerned with a hybrid method in which iteration is used to solve the problem, while a direct method is used on the local processor level. Variations in the efficiency of parallel algorithms are explored. Measures of the efficiency are based on computer experiments regarding the algorithms. For all the algorithms, the wall clock time is found to decrease as the number of processors increases.
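One way to read the hybrid method described here is as a block iteration: each processor owns a block of unknowns, solves its own diagonal block of K directly, and exchanges boundary values between sweeps. The sketch below assumes a block-Jacobi outer iteration, which the abstract does not specify; the function names and the small test system are hypothetical.

```python
def solve_dense(a, b):
    """Direct solve by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def block_jacobi(K, p, blocks, sweeps=100):
    """Outer iteration across blocks, direct solve inside each block."""
    u = [0.0] * len(p)
    for _ in range(sweeps):
        new_u = u[:]
        for idx in blocks:  # each block could run on its own processor
            inside = set(idx)
            # Coupling to unknowns owned by other blocks moves to the right-hand side.
            rhs = [
                p[i] - sum(K[i][j] * u[j] for j in range(len(p)) if j not in inside)
                for i in idx
            ]
            local = [[K[i][j] for j in idx] for i in idx]
            for i, value in zip(idx, solve_dense(local, rhs)):
                new_u[i] = value
        u = new_u
    return u


# Hypothetical symmetric positive definite system whose exact solution is [1, 2, 3, 4].
K = [[4.0, -1.0, 0.0, 0.0], [-1.0, 4.0, -1.0, 0.0], [0.0, -1.0, 4.0, -1.0], [0.0, 0.0, -1.0, 4.0]]
p = [2.0, 4.0, 6.0, 13.0]
u = block_jacobi(K, p, blocks=[[0, 1], [2, 3]])
```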
Block iterative restoration of astronomical images with the massively parallel processor
International Nuclear Information System (INIS)
Heap, S.R.; Lindler, D.J.
1987-01-01
A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 linear systems, each of size 121 x 121. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.
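The figures quoted above can be checked with a few lines of arithmetic. Counting dense matrix entries and assuming the blocks are solved one after another are illustrative choices made here, not measures used in the abstract.

```python
image_side = 500
unknowns = image_side * image_side   # 250,000 pixels, hence a 250,000 x 250,000 system
blocks, block_unknowns = 4900, 121   # figures quoted in the abstract
seconds_per_block = 0.06             # approximate solve time per 121 x 121 system

dense_entries_direct = unknowns ** 2                 # 6.25e10 matrix entries
dense_entries_blocks = blocks * block_unknowns ** 2  # about 7.2e7 entries in total
reduction = dense_entries_direct / dense_entries_blocks  # about 870 times fewer entries
one_pass_seconds = blocks * seconds_per_block        # roughly 294 s for one pass over all blocks
```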
An efficient iterative method for the generalized Stokes problem
Energy Technology Data Exchange (ETDEWEB)
Sameh, A. [Univ. of Minnesota, Twin Cities, MN (United States); Sarin, V. [Univ. of Illinois, Urbana, IL (United States)
1996-12-31
This paper presents an efficient iterative scheme for the generalized Stokes problem, which arises frequently in the simulation of time-dependent Navier-Stokes equations for incompressible fluid flow. In the general form of the linear system, A = {alpha}M + {nu}T is an n x n symmetric positive definite matrix, in which M is the mass matrix, T is the discrete Laplace operator, {alpha} and {nu} are positive constants proportional to the inverses of the time-step {Delta}t and the Reynolds number Re respectively, and B is the discrete gradient operator of size n x k (k < n). Even though the matrix A is symmetric and positive definite, the system is indefinite due to the incompressibility constraint (B{sup T}u = 0). This causes difficulties both for iterative methods and commonly used preconditioners. Moreover, depending on the ratio {alpha}/{nu}, A behaves like the mass matrix M at one extreme and the Laplace operator T at the other, thus complicating the issue of preconditioning.
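The displayed system itself did not survive in this record. Assuming it has the usual saddle-point form for a constrained problem (an assumption, with p and f as hypothetical names for the pressure unknowns and the right-hand side), the indefiniteness mentioned in the abstract follows from a block factorization:

```latex
\begin{bmatrix} A & B \\ B^{T} & 0 \end{bmatrix}
\begin{bmatrix} u \\ p \end{bmatrix}
=
\begin{bmatrix} f \\ 0 \end{bmatrix},
\qquad A = \alpha M + \nu T,
\\[1ex]
\begin{bmatrix} A & B \\ B^{T} & 0 \end{bmatrix}
=
\begin{bmatrix} I & 0 \\ B^{T} A^{-1} & I \end{bmatrix}
\begin{bmatrix} A & 0 \\ 0 & -B^{T} A^{-1} B \end{bmatrix}
\begin{bmatrix} I & A^{-1} B \\ 0 & I \end{bmatrix}.
```

Since A is symmetric positive definite, the block -B^T A^{-1} B is negative definite whenever B has full column rank, so by Sylvester's law of inertia the coupled matrix has n positive and k negative eigenvalues.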
Energy Technology Data Exchange (ETDEWEB)
Joubert, W. [Los Alamos National Lab., NM (United States); Carey, G.F. [Univ. of Texas, Austin, TX (United States)
1994-12-31
A great need exists for high performance numerical software libraries transportable across parallel machines. This talk concerns the PCG package, which solves systems of linear equations by iterative methods on parallel computers. The features of the package are discussed, along with the techniques used to obtain both high performance and transportability across architectures. Representative numerical results are presented for several machines including the Connection Machine CM-5, Intel Paragon and Cray T3D parallel computers.
Iotti, Robert
2015-04-01
ITER is an international experimental facility being built by seven Parties to demonstrate the long term potential of fusion energy. The ITER Joint Implementation Agreement (JIA) defines the structure and governance model of such cooperation. There are a number of necessary conditions for such international projects to be successful: a complete design, strong systems engineering working with an agreed set of requirements, an experienced organization with systems and plans in place to manage the project, a cost estimate backed by industry, and someone in charge. Unfortunately for ITER many of these conditions were not present. The paper discusses the priorities in the JIA which led to setting up the project with a Central Integrating Organization (IO) in Cadarache, France as the ITER HQ, and seven Domestic Agencies (DAs) located in the countries of the Parties, responsible for delivering 90%+ of the project hardware as Contributions-in-Kind and also financial contributions to the IO, as "Contributions-in-Cash." Theoretically the Director General (DG) is responsible for everything. In practice the DG does not have the power to control the work of the DAs, and there is not an effective management structure enabling the IO and the DAs to arbitrate disputes, so the project is not really managed, but is a loose collaboration of competing interests. Any DA can effectively block a decision reached by the DG. Inefficiencies in completing design while setting up a competent organization from scratch contributed to the delays and cost increases during the initial few years. So did the fact that the original estimate was not developed from industry input. Unforeseen inflation and market demand on certain commodities/materials further exacerbated the cost increases. Since then, improvements are debatable. Does this mean that the governance model of ITER is a wrong model for international scientific cooperation? I do not believe so. Had the necessary conditions for success
Improved Iterative Parallel Interference Cancellation Receiver for Future Wireless DS-CDMA Systems
Directory of Open Access Journals (Sweden)
Andrea Bernacchioni
2005-04-01
We present a new turbo multiuser detector for turbo-coded direct sequence code division multiple access (DS-CDMA) systems. The proposed detector is based on the utilization of a parallel interference cancellation (PIC) and a bank of turbo decoders. The PIC is broken up in order to perform interference cancellation after each constituent decoder of the turbo decoding scheme. Moreover, in the paper we propose a new enhanced algorithm that provides a more accurate estimation of the signal-to-noise-plus-interference-ratio used in the tentative decision device and in the MAP decoding algorithm. The performance of the proposed receiver is evaluated by means of computer simulations for medium to very high system loads, in AWGN and multipath fading channel, and compared to recently proposed interference cancellation-based iterative MUD, by taking into account the number of iterations and the complexity involved. We will see that the proposed receiver outperforms the others especially for highly loaded systems.
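The basic parallel interference cancellation step can be sketched for an uncoded, synchronous system with hard decisions. This is a deliberate simplification: the receiver in the paper works with turbo decoders and soft information, which is not reproduced here, and the function names, correlation matrix and amplitudes are hypothetical.

```python
def sign(v):
    return 1 if v >= 0 else -1


def pic_detect(matched, corr, amps, stages=3):
    """Hard-decision parallel interference cancellation for synchronous CDMA."""
    users = range(len(matched))
    bits = [sign(y) for y in matched]  # stage 0: conventional matched-filter decisions
    for _ in range(stages):
        # Every user subtracts the interference rebuilt from the previous stage,
        # and all users are updated at once (hence "parallel").
        bits = [
            sign(matched[k] - sum(corr[k][j] * amps[j] * bits[j] for j in users if j != k))
            for k in users
        ]
    return bits


# Hypothetical near-far case: a weak user 0 next to a strong user 1, no noise.
corr = [[1.0, 0.5], [0.5, 1.0]]
amps = [1.0, 3.0]
sent = [1, -1]
matched = [sum(corr[k][j] * amps[j] * sent[j] for j in range(2)) for k in range(2)]
matched_filter_bits = [sign(y) for y in matched]
pic_bits = pic_detect(matched, corr, amps)
```

In this example the matched filter misdetects the weak user, and one cancellation stage recovers it.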
Lin, Jyh-Miin; Patterson, Andrew J; Chang, Hing-Chiu; Gillard, Jonathan H; Graves, Martin J
2015-10-01
To propose a new reduced field-of-view (rFOV) strategy for iterative reconstructions in a clinical environment. Iterative reconstructions can incorporate regularization terms to improve the image quality of periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) MRI. However, the large amount of calculations required for full FOV iterative reconstructions has posed a huge computational challenge for clinical usage. By subdividing the entire problem into smaller rFOVs, the iterative reconstruction can be accelerated on a desktop with a single graphic processing unit (GPU). This rFOV strategy divides the iterative reconstruction into blocks, based on the block-diagonal dominant structure. A near real-time reconstruction system was developed for the clinical MR unit, and parallel computing was implemented using the object-oriented model. In addition, the Toeplitz method was implemented on the GPU to reduce the time required for full interpolation. Using the data acquired from the PROPELLER MRI, the reconstructed images were then saved in the digital imaging and communications in medicine format. The proposed rFOV reconstruction reduced the gridding time by 97%, as the total iteration time was 3 s even with multiple processes running. A phantom study showed that the structure similarity index for rFOV reconstruction was statistically superior to conventional density compensation (p concept of rFOV reconstruction may potentially be applied to other kinds of iterative reconstructions for shortened reconstruction duration.
Efficient Parallel Engineering Computing on Linux Workstations
Lou, John Z.
2010-01-01
A C software module has been developed that creates lightweight processes (LWPs) dynamically to achieve parallel computing performance in a variety of engineering simulation and analysis applications to support NASA and DoD project tasks. The required interface between the module and the application it supports is simple, minimal and almost completely transparent to the user applications, and it can achieve nearly ideal computing speed-up on multi-CPU engineering workstations of all operating system platforms. The module can be integrated into an existing application (C, C++, Fortran and others) either as part of a compiled module or as a dynamically linked library (DLL).
International Nuclear Information System (INIS)
Rosa, Massimiliano; Warsa, James S.; Perks, Michael
2011-01-01
We have implemented a cell-wise, block-Gauss-Seidel (bGS) iterative algorithm, for the solution of the S_n transport equations on the Roadrunner hybrid, parallel computer architecture. A compute node of this massively parallel machine comprises AMD Opteron cores that are linked to a Cell Broadband Engine™ (Cell/B.E.). LAPACK routines have been ported to the Cell/B.E. in order to make use of its parallel Synergistic Processing Elements (SPEs). The bGS algorithm is based on the LU factorization and solution of a linear system that couples the fluxes for all S_n angles and energy groups on a mesh cell. For every cell of a mesh that has been parallel decomposed on the higher-level Opteron processors, a linear system is transferred to the Cell/B.E. and the parallel LAPACK routines are used to compute a solution, which is then transferred back to the Opteron, where the rest of the computations for the S_n transport problem take place. Compared to standard parallel machines, a hundred-fold speedup of the bGS was observed on the hybrid Roadrunner architecture. Numerical experiments with strong and weak parallel scaling demonstrate the bGS method is viable and compares favorably to full parallel sweeps (FPS) on two-dimensional, unstructured meshes when it is applied to optically thick, multi-material problems. As expected, however, it is not as efficient as FPS in optically thin problems. (author)
Efficient use of iterative solvers in nested topology optimization
DEFF Research Database (Denmark)
Amir, Oded; Stolpe, Mathias; Sigmund, Ole
2009-01-01
In the nested approach to structural optimization, most of the computational effort is invested in the solution of the finite element analysis equations. In this study, it is suggested to reduce this computational cost by using an approximation to the solution of the nested problem, generated by a Krylov subspace iterative solver. By choosing convergence criteria for the iterative solver that are strongly related to the optimization objective and to the design sensitivities, it is possible to terminate the iterative solution of the nested equations earlier compared to traditional convergence measures. The approximation is shown to be sufficiently accurate for the practical purpose of optimization even though the nested equation system is not solved accurately. The approach is tested on several medium-scale topology optimization problems, including three dimensional minimum compliance problems...
Efficient use of iterative solvers in nested topology optimization
DEFF Research Database (Denmark)
Amir, Oded; Stolpe, Mathias; Sigmund, Ole
2010-01-01
In the nested approach to structural optimization, most of the computational effort is invested in the solution of the analysis equations. In this study, it is suggested to reduce this computational cost by using an approximation to the solution of the analysis problem, generated by a Krylov subspace iterative solver. By choosing convergence criteria for the iterative solver that are strongly related to the optimization objective and to the design sensitivities, it is possible to terminate the iterative solution of the nested equations earlier compared to traditional convergence measures. The approximation is computationally shown to be sufficiently accurate for the purpose of optimization though the nested equation system is not necessarily solved accurately. The approach is tested on several large-scale topology optimization problems, including minimum compliance problems and compliant mechanism...
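The idea of stopping a Krylov solver on an objective-related measure rather than on the residual can be sketched with conjugate gradients. The criterion used below, the relative change of the compliance f^T u between iterations, is a hypothetical stand-in chosen for illustration; the abstract does not give the authors' actual criteria, and the stiffness matrix is a toy example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(K, v):
    return [dot(row, v) for row in K]


def cg_objective_stop(K, f, rel_change=1e-3, max_iter=200):
    """Conjugate gradients that stops once the compliance f^T u stagnates."""
    u = [0.0] * len(f)
    r = f[:]  # residual f - K u for the starting guess u = 0
    d = r[:]
    rr = dot(r, r)
    if rr == 0.0:
        return u, 0
    compliance = 0.0
    for it in range(1, max_iter + 1):
        Kd = matvec(K, d)
        step = rr / dot(d, Kd)
        u = [ui + step * di for ui, di in zip(u, d)]
        r = [ri - step * kdi for ri, kdi in zip(r, Kd)]
        new_compliance = dot(f, u)
        # Assumed criterion: relative change of the objective, not the residual norm.
        if abs(new_compliance - compliance) <= rel_change * abs(new_compliance):
            return u, it
        compliance = new_compliance
        rr_new = dot(r, r)
        if rr_new == 0.0:
            return u, it
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return u, max_iter


# Toy symmetric positive definite "stiffness" matrix and unit load.
n = 20
K = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
f = [1.0] * n
u_loose, it_loose = cg_objective_stop(K, f, rel_change=1e-3)
u_tight, it_tight = cg_objective_stop(K, f, rel_change=1e-12)
c_loose, c_tight = dot(f, u_loose), dot(f, u_tight)
```

The loose run stops after fewer iterations while its compliance stays close to the tightly converged value, which is the trade-off the abstract describes.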
International Nuclear Information System (INIS)
Wu Huapeng; Handroos, Heikki; Pessi, Pekka; Kilkki, Juha; Jones, Lawrence
2005-01-01
This paper presents a special robot, able to carry out welding and machining processes from inside the ITER vacuum vessel (VV), consisting of a five degree-of-freedom parallel mechanism, mounted on a carriage driven by two electric motors on a rack. The kinematic design of the robot has been optimised for ITER access and a hydraulically actuated pre-prototype built. A hybrid controller is designed for the robot, including position, speed and pressure feedback loops to achieve high accuracy and high dynamic performances. Finally, the experimental tests are given and discussed
Parallel state transfer and efficient quantum routing on quantum networks.
Chudzicki, Christopher; Strauch, Frederick W
2010-12-31
We study the routing of quantum information in parallel on multidimensional networks of tunable qubits and oscillators. These theoretical models are inspired by recent experiments in superconducting circuits. We show that perfect parallel state transfer is possible for certain networks of harmonic oscillator modes. We extend this to the distribution of entanglement between every pair of nodes in the network, finding that the routing efficiency of hypercube networks is optimal and robust in the presence of dissipation and finite bandwidth.
An Expert System for the Development of Efficient Parallel Code
Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit
2004-01-01
We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.
Discontinuous interleaving of parallel inverters for efficiency improvement
DEFF Research Database (Denmark)
Rannestad, Bjørn; Munk-Nielsen, Stig; Gadgaard, Kristian
2017-01-01
Interleaved switching of parallel inverters has previously been proposed for efficiency/size improvements of grid connected three-phase inverters. This paper proposes a novel interleaving method which practically eliminates insulated gate bipolar transistor (IGBT) turn-on losses and drastically reduces diode reverse recovery losses. The reduction in switching losses is obtained by interleaving two parallel inverter branches so that only one branch conducts the load current at a time. By placing saturable inductors between the parallel branches, soft switching may be obtained, and thereby...
International Nuclear Information System (INIS)
Doster, J.M.; Sills, E.D.
1986-01-01
Current efforts are under way to develop and evaluate numerical algorithms for the parallel solution of the large sparse matrix equations associated with the finite difference representation of the macroscopic Navier-Stokes equations. Previous work has shown that these equations can be cast into smaller coupled matrix equations suitable for solution utilizing multiple computer processors operating in parallel. The individual processors themselves may exhibit parallelism through the use of vector pipelines. This work has concentrated on the one-dimensional drift flux form of the Navier-Stokes equations. Direct and iterative algorithms that may be suitable for implementation on parallel computer architectures are evaluated in terms of accuracy and overall execution speed. This work has application to engineering and training simulations, on-line process control systems, and engineering workstations where increased computational speeds are required.
Efficient parallel CFD-DEM simulations using OpenMP
Amritkar, Amit; Deb, Surya; Tafti, Danesh
2014-01-01
The paper describes parallelization strategies for the Discrete Element Method (DEM) used for simulating dense particulate systems coupled to Computational Fluid Dynamics (CFD). While the field equations of CFD are best parallelized by spatial domain decomposition techniques, the N-body particulate phase is best parallelized over the number of particles. When the two are coupled together, both modes are needed for efficient parallelization. It is shown that under these requirements, OpenMP thread based parallelization has advantages over MPI processes. Two representative examples, fairly typical of dense fluid-particulate systems, are investigated, including the validation of the DEM-CFD and thermal-DEM implementation with experiments. Fluidized bed calculations are performed on beds with uniform particle loading, parallelized with MPI and OpenMP. It is shown that as the number of processing cores and the number of particles increase, the communication overhead of building ghost particle lists at processor boundaries dominates time to solution, and OpenMP, which does not require this step, is about twice as fast as MPI. In rotary kiln heat transfer calculations, which are characterized by spatially non-uniform particle distributions, the low overhead of switching the parallelization mode in OpenMP eliminates the load imbalances, but introduces increased overheads in fetching non-local data. In spite of this, it is shown that OpenMP is 50-90% faster than MPI.
DEFF Research Database (Denmark)
Dieterle, Mischa; Horstmeyer, Thomas; Berthold, Jost
2012-01-01
Skeleton-based programming is an area of increasing relevance with upcoming highly parallel hardware, since it substantially facilitates parallel programming and separates concerns. When parallel algorithms expressed by skeletons involve iterations – applying the same algorithm repeatedly...... a particular skeleton ad-hoc for repeated execution turns out to be considerably complicated, and raises general questions about introducing state into a stateless parallel computation. In addition, one would strongly prefer an approach which leaves the original skeleton intact, and only uses it as a building block inside a bigger structure. In this work, we present a general framework for skeleton iteration and discuss requirements and variations of iteration control and iteration body. Skeleton iteration is expressed by synchronising a parallel iteration body skeleton with a (likewise parallel) state...
An efficient data dependence analysis for parallelizing compilers
Li, Zhiyuan; Yew, Pen-Chung; Zhu, Chuan-Qi
1990-01-01
A novel algorithm, called the lambda test, is presented for an efficient and accurate data dependence analysis of multidimensional array references. It extends the numerical methods to allow all dimensions of array references to be tested simultaneously. Hence, it combines the efficiency and the accuracy of both approaches. This algorithm has been implemented in PARAFRASE, a FORTRAN program parallelization restructurer developed at the University of Illinois at Urbana-Champaign. Some experimental results are presented to show its effectiveness.
On the efficient parallel computation of Legendre transforms
Inda, M.A.; Bisseling, R.H.; Maslen, D.K.
2001-01-01
In this article, we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the
On the efficient parallel computation of Legendre transforms
Inda, M.A.; Bisseling, R.H.; Maslen, D.K.
1999-01-01
In this article we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the
Energy Technology Data Exchange (ETDEWEB)
Rosa, Massimiliano [Los Alamos National Laboratory; Warsa, James S [Los Alamos National Laboratory; Perks, Michael [Los Alamos National Laboratory
2010-12-14
We have implemented a cell-wise, block-Gauss-Seidel (bGS) iterative algorithm, for the solution of the S{sub n} transport equations on the Roadrunner hybrid, parallel computer architecture. A compute node of this massively parallel machine comprises AMD Opteron cores that are linked to a Cell Broadband Engine{trademark} (Cell/B.E.). LAPACK routines have been ported to the Cell/B.E. in order to make use of its parallel Synergistic Processing Elements (SPEs). The bGS algorithm is based on the LU factorization and solution of a linear system that couples the fluxes for all S{sub n} angles and energy groups on a mesh cell. For every cell of a mesh that has been parallel decomposed on the higher-level Opteron processors, a linear system is transferred to the Cell/B.E. and the parallel LAPACK routines are used to compute a solution, which is then transferred back to the Opteron, where the rest of the computations for the S{sub n} transport problem take place. Compared to standard parallel machines, a hundred-fold speedup of the bGS was observed on the hybrid Roadrunner architecture. Numerical experiments with strong and weak parallel scaling demonstrate the bGS method is viable and compares favorably to full parallel sweeps (FPS) on two-dimensional, unstructured meshes when it is applied to optically thick, multi-material problems. As expected, however, it is not as efficient as FPS in optically thin problems.
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows
Moitra, Stuti; Gatski, Thomas B.
1997-01-01
A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
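For context, the standard serial solver for such tridiagonal systems is the Thomas algorithm, sketched below. It is shown only to make the difficulty concrete: both sweeps are recurrences in which each step needs the previous one, which is why tridiagonal solves are awkward to parallelize directly. The abstract does not describe the authors' innovation in enough detail to reproduce it, and the example system is hypothetical.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored.

    No pivoting is performed, so the matrix is assumed diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward sweep: step i needs step i - 1
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution: step i needs step i + 1
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Hypothetical 4 x 4 system with exact solution [1, 2, 3, 4].
solution = thomas(
    lower=[0.0, -1.0, -1.0, -1.0],
    diag=[4.0, 4.0, 4.0, 4.0],
    upper=[-1.0, -1.0, -1.0, 0.0],
    rhs=[2.0, 4.0, 6.0, 13.0],
)
```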
Efficient parallel simulation of CO2 geologic sequestration in saline aquifers
International Nuclear Information System (INIS)
Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten
2007-01-01
An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.
Efficient Parallel Execution of Event-Driven Electromagnetic Hybrid Models
Energy Technology Data Exchange (ETDEWEB)
Perumalla, Kalyan S [ORNL; Karimabadi, Dr. Homa [SciberQuest Inc.; Fujimoto, Richard [ORNL
2007-01-01
New discrete-event formulations of physics simulation models are emerging that can outperform traditional time-stepped models, especially in simulations containing multiple timescales. Detailed simulation of the Earth's magnetosphere, for example, requires execution of sub-models that operate at timescales that differ by orders of magnitude. In contrast to time-stepped simulation which requires tightly coupled updates to almost the entire system state at regular time intervals, the new discrete event simulation (DES) approaches help evolve the states of sub-models on relatively independent timescales. However, in contrast to relative ease of parallelization of time-stepped codes, the parallelization of DES-based models raises challenges with respect to their scalability and performance. One of the key challenges is to improve the computation granularity to offset synchronization and communication overheads within and across processors. Our previous work on parallelization was limited in scalability and runtime performance due to such challenges. Here we report on optimizations we performed on DES-based plasma simulation models to improve parallel execution performance. The mapping of the model to simulation processes is optimized via aggregation techniques, and the parallel runtime engine is optimized for communication and memory efficiency. The net result is the capability to simulate hybrid particle-in-cell (PIC) models with over 2 billion ion particles using 512 processors on supercomputing platforms.
Efficient multitasking: parallel versus serial processing of multiple tasks.
Fischer, Rico; Plessow, Franziska
2015-01-01
In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling.
High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures
Directory of Open Access Journals (Sweden)
H. Y. Su
2012-04-01
This article presents two highly efficient parallel realizations of context-based adaptive variable length coding (CAVLC) based on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weakened: the context-based data dependence, the memory access dependence, and the control dependence. The CAVLC pipeline is divided into three stages (two scans, coding, and lag packing) and is implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on the multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on a massively parallel GPU architecture. Both exploit rich data-level parallelism. Experimental results show that, compared with the CPU version, a speedup of more than 70 times is obtained on STORM and over 50 times on the GPU. The STORM implementation achieves real-time processing of 1080p at 30 fps, and the GPU-based version satisfies the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms.
2D-RBUC for efficient parallel compression of residuals
Đurđević, Đorđe M.; Tartalja, Igor I.
2018-02-01
In this paper, we present a method for lossless compression of residuals with an efficient SIMD parallel decompression. The residuals originate from lossy or near-lossless compression of height fields, which are commonly used to represent models of terrains. The algorithm is founded on the existing RBUC method for compression of non-uniform data sources. We have adapted the method to capture the 2D spatial locality of height fields, and developed the data decompression algorithm for modern GPU architectures already present even in home computers. In combination with the point-level SIMD-parallel lossless/lossy height field compression method HFPaC, characterized by fast progressive decompression and a seamlessly reconstructed surface, the newly proposed method trades a small efficiency degradation for a non-negligible compression ratio benefit (measured up to 91%).
Parallel Iterative Solution Methods for Linear Systems arising from Discretized PDE's
Vorst, H.A. van der
1995-01-01
In these notes we will present an overview of a number of related iterative methods for the solution of linear systems of equations. These methods are so-called Krylov projection type methods, and they include popular methods such as Conjugate Gradients, Bi-Conjugate Gradients, CGS, Bi-CGSTAB, QMR, LSQR and GMRES. We will show how these methods can be derived from simple basic iteration formulas. We will not give convergence proofs, but we will refer for these, as far as available, to the literature. I...
Computationally efficient implementation of combustion chemistry in parallel PDF calculations
International Nuclear Information System (INIS)
Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.
2009-01-01
In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
Malasics, Attila; Boda, Dezso
2010-06-28
Two iterative procedures have been proposed recently to calculate the chemical potentials corresponding to prescribed concentrations from grand canonical Monte Carlo (GCMC) simulations. Both are based on repeated GCMC simulations with updated excess chemical potentials until the desired concentrations are established. In this paper, we propose combining our robust and fast converging iteration algorithm [Malasics, Gillespie, and Boda, J. Chem. Phys. 128, 124102 (2008)] with the suggestion of Lamperski [Mol. Simul. 33, 1193 (2007)] to average the chemical potentials in the iterations (instead of just using the chemical potentials obtained in the last iteration). We apply the unified method for various electrolyte solutions and show that our algorithm is more efficient if we use the averaging procedure. We discuss the convergence problems arising from violation of charge neutrality when inserting/deleting individual ions instead of neutral groups of ions (salts). We suggest a correction term to the iteration procedure that makes the algorithm efficient to determine the chemical potentials of individual ions too.
Energy Technology Data Exchange (ETDEWEB)
Joubert, W.D. [Los Alamos National Lab., NM (United States); Carey, G.F.; Kohli, H.; Lorber, A.; McLay, R.T.; Shen, Y. [Texas Univ., Austin, TX (United States); Berner, N.A. [Texas Univ., Austin, TX (United States)]|[Los Alamos National Lab., NM (United States); Kalhan, A. [Los Alamos National Lab., NM (United States)]|[Tennessee Univ., Knoxville, TN (United States)
1995-01-01
PCG (Preconditioned Conjugate Gradient package) is a system for solving linear equations of the form Au = b, for A a given matrix and b and u vectors. PCG, employing various gradient-type iterative methods coupled with preconditioners, is designed for general linear systems, with emphasis on sparse systems such as those arising from the discretization of partial differential equations in physical applications. It can be used to solve linear equations efficiently on parallel computer architectures. Much of the code is reusable across architectures and the package is portable across different systems; the machines that are currently supported are listed. This manual is intended to be the general-purpose reference describing all features of the package accessible to the user; suggestions are also given regarding which methods to use for a given problem.
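As an illustration of the kind of method the package combines (a gradient-type iteration coupled with a preconditioner), the following is a minimal textbook sketch of conjugate gradients with a Jacobi (diagonal) preconditioner for Au = b. It is not the PCG package's interface: the function name `pcg_jacobi`, its parameters, and the dense list-of-lists matrix are assumptions made for this example, and A is assumed symmetric positive definite.

```python
def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A u = b with Jacobi-preconditioned conjugate gradients.

    Hypothetical helper for illustration only; A is a dense, symmetric
    positive definite matrix given as a list of rows.
    """
    def dot(x, y):
        return sum(xi * yi for xi, yi in zip(x, y))

    def matvec(v):
        return [dot(row, v) for row in A]

    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner M^-1
    u = [0.0] * n                                  # initial guess u = 0
    r = list(b)                                    # residual b - A*u for u = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)                                    # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                 # stop when ||r||_2 is small
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                    # step length along p
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return u


# Small SPD example; the exact solution is u = (1/11, 7/11).
u = pcg_jacobi([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

The diagonal preconditioner is chosen here only because it is the simplest one to write down; which preconditioner suits a given sparse system is exactly the kind of question the manual says it addresses.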
Efficient Parallel Statistical Model Checking of Biochemical Networks
Directory of Open Access Journals (Sweden)
Paolo Ballarini
2009-12-01
We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic terms. Exact probabilistic verification approaches such as, for example, CSL/PCTL model checking, are undermined by a huge computational demand which rules them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions out of the stochastic model. We propose a methodology for efficiently estimating the likelihood that an LTL property P holds of a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology we propose uses a stochastic simulation algorithm for generating execution samples; however, three key aspects improve the efficiency. First, the sample generation is driven by on-the-fly verification of P, which results in optimal overall simulation time. Second, the confidence interval estimation for the probability of P to hold is based on an efficient variant of the Wilson method, which ensures faster convergence. Third, the whole methodology is designed in a parallel fashion, and a prototype software tool has been implemented that performs the sampling/verification process in parallel over an HPC architecture.
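To make the confidence-interval step concrete, here is a minimal sketch of the standard (textbook) Wilson score interval for a binomial proportion. It is not the "efficient variant" the abstract mentions, whose details are not given here; the function name `wilson_interval` and the default z = 1.96 (roughly 95% confidence) are assumptions for this example.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Standard Wilson score interval for a binomial proportion.

    successes: number of sampled executions that satisfied the property
    n:         total number of sampled executions (must be positive)
    z:         normal quantile; 1.96 corresponds to about 95% confidence
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


# Example: the property held in 50 of 100 sampled runs.
low, high = wilson_interval(50, 100)
```

In a statistical model checker, sampling would typically continue until `high - low` falls below a chosen width; that stopping rule is an assumption of this sketch, not a detail taken from the abstract.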
Yarrow, Maurice; VanderWijngaart, Rob; Kutler, Paul (Technical Monitor)
1997-01-01
The first release of the MPI version of the LU NAS Parallel Benchmark (NPB2.0) performed poorly compared to its companion NPB2.0 codes. The later LU release (NPB2.1 & 2.2) runs up to two and a half times faster, thanks to a revised point access scheme and related communications scheme. The new scheme sends substantially fewer messages, is cache "friendly", and has a better load balance. We detail the observations and modifications that resulted in this efficiency improvement, and show that the poor behavior of the original code resulted from deriving a message passing scheme from an algorithm originally devised for a vector architecture.
Biomedical applications on the GRID efficient management of parallel jobs
Moscicki, Jakub T; Lee Hurng Chun; Lin, S C; Pia, Maria Grazia
2004-01-01
Distributed computing based on the Master-Worker and PULL interaction model is applicable to a number of applications in high energy physics, medical physics and bio-informatics. We demonstrate a realistic medical physics use-case of a dosimetric system for brachytherapy using distributed Grid resources. We present efficient techniques for running parallel jobs in the case of BLAST, a gene sequencing application, as well as for Monte Carlo simulation based on Geant4. We present a strategy for improving the runtime performance and robustness of the jobs as well as for minimizing the development time needed to migrate the applications to a distributed environment.
Parallel efficient rate control methods for JPEG 2000
Martínez-del-Amor, Miguel Á.; Bruns, Volker; Sparenberg, Heiko
2017-09-01
Since the introduction of JPEG 2000, several rate control methods have been proposed. Among them, post-compression rate-distortion optimization (PCRD-Opt) is the most widely used, and the one recommended by the standard. The approach followed by this method is to first compress the entire image split in code blocks, and subsequently, optimally truncate the set of generated bit streams according to the maximum target bit rate constraint. The literature proposes various strategies on how to estimate ahead of time where a block will get truncated in order to stop the execution prematurely and save time. However, none of them have been defined bearing in mind a parallel implementation. Today, multi-core and many-core architectures are becoming popular for JPEG 2000 codecs implementations. Therefore, in this paper, we analyze how some techniques for efficient rate control can be deployed in GPUs. In order to do that, the design of our GPU-based codec is extended, allowing stopping the process at a given point. This extension also harnesses a higher level of parallelism on the GPU, leading to up to 40% of speedup with 4K test material on a Titan X. In a second step, three selected rate control methods are adapted and implemented in our parallel encoder. A comparison is then carried out, and used to select the best candidate to be deployed in a GPU encoder, which gave an extra 40% of speedup in those situations where it was really employed.
Zheng, Jingjing; Frisch, Michael J
2017-12-12
An efficient geometry optimization algorithm based on interpolated potential energy surfaces with iteratively updated Hessians is presented in this work. At each step of geometry optimization (including both minimization and transition structure search), an interpolated potential energy surface is properly constructed by using the previously calculated information (energies, gradients, and Hessians/updated Hessians), and Hessians of the two latest geometries are updated in an iterative manner. The optimized minimum or transition structure on the interpolated surface is used for the starting geometry of the next geometry optimization step. The cost of searching the minimum or transition structure on the interpolated surface and iteratively updating Hessians is usually negligible compared with most electronic structure single gradient calculations. These interpolated potential energy surfaces are often better representations of the true potential energy surface in a broader range than a local quadratic approximation that is usually used in most geometry optimization algorithms. Tests on a series of large and floppy molecules and transition structures both in gas phase and in solutions show that the new algorithm can significantly improve the optimization efficiency by using the iteratively updated Hessians and optimizations on interpolated surfaces.
Efficiency of thermal outgassing for tritium retention measurement and removal in ITER
Directory of Open Access Journals (Sweden)
G. De Temmerman
2017-08-01
As a licensed nuclear facility, ITER must limit the in-vessel tritium (T) retention to reduce the risks of potential release during accidents, the inventory limit being set at 1 kg. Simulations and extrapolations from existing experiments indicate that T-retention in ITER will mainly be driven by co-deposition with beryllium (Be) eroded from the first wall, with co-deposits forming mainly in the divertor region but also possibly on the first wall itself. A pulsed Laser-Induced Desorption (LID) system, called Tritium Monitor, is being designed to locally measure the T-retention in co-deposits forming on the inner divertor baffle of ITER. Regarding tritium removal, the baseline strategy is to perform baking of the plasma-facing components, at 513 K for the FW and 623 K for the divertor. Both baking and laser desorption rely on the thermal desorption of tritium from the surface, the efficiency of which remains unclear for thick (and possibly impure) co-deposits. This contribution reports on the results of TMAP7 studies of this efficiency for ITER-relevant deposits.
Directory of Open Access Journals (Sweden)
Zhang Wei
2005-01-01
The optimum and many suboptimum iterative soft-input soft-output (SISO) multiuser detectors require a priori information about the multiuser system, such as the users' transmitted signature waveforms, relative delays, as well as the channel impulse response. In this paper, we employ adaptive algorithms in the SISO multiuser detector in order to avoid the need for this a priori information. First, we derive the optimum SISO parallel decision-feedback detector for asynchronous coded DS-CDMA systems. Then, we propose two adaptive versions of this SISO detector, which are based on the normalized least mean square (NLMS) and recursive least squares (RLS) algorithms. Our SISO adaptive detectors effectively exploit the a priori information of coded symbols, whose soft inputs are obtained from a bank of single-user decoders. Furthermore, we consider how to select practical finite feedforward and feedback filter lengths to obtain a good tradeoff between the performance and computational complexity of the receiver.
Efficient sequential and parallel algorithms for record linkage.
Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar
2014-01-01
Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Our sequential and parallel algorithms have been tested on a real dataset of 1,083,878 records and synthetic datasets ranging in size from 50,000 to 9,000,000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm.
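The two ideas named in the abstract (removing identical records first, then linking similar records in a graph and taking connected components) can be sketched as follows. This is an illustrative simplification, not the authors' algorithm: a `set` stands in for the radix-sorting step, the pairwise comparison is quadratic, and the names `link_records` and `hamming_at_most_one` as well as the similarity rule are assumptions for this example.

```python
def hamming_at_most_one(a, b):
    """Hypothetical similarity rule: same length, at most one differing character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def link_records(records, similar):
    """Group records into clusters of transitively similar records."""
    unique = sorted(set(records))          # drop exact duplicates up front
    parent = list(range(len(unique)))      # union-find forest over unique records

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add an edge between every similar pair and merge their components.
    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            if similar(unique[i], unique[j]):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    clusters = {}
    for i, rec in enumerate(unique):
        clusters.setdefault(find(i), []).append(rec)
    return sorted(clusters.values())


records = ["john smith", "jon smith", "john smith",
           "john smyth", "mary jones", "mary jonas"]
clusters = link_records(records, hamming_at_most_one)
```

Because clusters are connected components, two records can end up together without being directly similar, as long as a chain of similar records joins them.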
Efficient parallel algorithms for string editing and related problems
Apostolico, Alberto; Atallah, Mikhail J.; Larmore, Lawrence; Mcfaddin, H. S.
1988-01-01
The string editing problem for input strings x and y consists of transforming x into y by performing a series of weighted edit operations on x of overall minimum cost. An edit operation on x can be the deletion of a symbol from x, the insertion of a symbol in x, or the substitution of a symbol of x with another symbol. This problem has a well-known $O(|x||y|)$ time sequential solution (25). Efficient parallel random access machine (PRAM) algorithms for the string editing problem are given. If $m = \min(|x|, |y|)$ and $n = \max(|x|, |y|)$, then the CREW bound is $O(\log m \log n)$ time with $O(mn/\log m)$ processors. In all algorithms, space is $O(mn)$.
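For reference, the well-known sequential solution mentioned above is the classic dynamic program over prefixes of x and y. The sketch below is that sequential baseline only, not the parallel PRAM algorithms of the paper; the function name `edit_distance` and the uniform per-operation weights are assumptions for this example. It keeps two rows of the table, so it uses less memory than a full table while doing the same quadratic amount of work.

```python
def edit_distance(x, y, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of edit operations transforming x into y.

    Sequential dynamic program: O(|x||y|) time, two rows of memory.
    The uniform weights are an illustrative assumption.
    """
    m, n = len(x), len(y)
    # prev[j] = cost of turning the empty prefix of x into y[:j] (j insertions).
    prev = [j * ins_cost for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [i * del_cost] + [0] * n     # x[:i] -> empty string: i deletions
        for j in range(1, n + 1):
            substitute = prev[j - 1] + (0 if x[i - 1] == y[j - 1] else sub_cost)
            delete = prev[j] + del_cost    # drop x[i-1]
            insert = cur[j - 1] + ins_cost # add y[j-1]
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return prev[n]


cost = edit_distance("kitten", "sitting")
```

Each table entry depends only on its left, upper, and upper-left neighbours, which is the dependency structure that parallel formulations of the problem have to work around.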
Efficient sequential and parallel algorithms for planted motif search.
Nicolae, Marius; Rajasekaran, Sanguthevar
2014-01-31
Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.
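The problem statement can be made concrete with a brute-force search that tries every candidate string of length l. This is emphatically not PMS8: it enumerates the whole alphabet^l candidate space and is usable only for very small l. The function names and the DNA alphabet default are assumptions for this example.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq matches motif with at most d mismatches."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motifs(seqs, l, d, alphabet="ACGT"):
    """All length-l strings that occur in every sequence with <= d mismatches.

    Exhaustive enumeration over len(alphabet)**l candidates; illustrative only.
    """
    found = []
    for letters in product(alphabet, repeat=l):
        motif = "".join(letters)
        if all(occurs_within(motif, s, d) for s in seqs):
            found.append(motif)
    return found


seqs = ["ACGTAC", "TTACGA", "GACGTT"]
exact = planted_motifs(seqs, l=3, d=0)
```

For the challenging instances quoted in the abstract, such as (25,10), the candidate space has 4^25 strings, which is why practical algorithms must prune it rather than enumerate it.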
Electrical defibrillation optimization: An automated, iterative parallel finite-element approach
Energy Technology Data Exchange (ETDEWEB)
Hutchinson, S.A.; Shadid, J.N. [Sandia National Lab., Albuquerque, NM (United States); Ng, K.T. [New Mexico State Univ., Las Cruces, NM (United States); Nadeem, A. [Univ. of Pittsburgh, PA (United States)
1997-04-01
To date, optimization of electrode systems for electrical defibrillation has been limited to hand-selected electrode configurations. In this paper we present an automated approach which combines detailed, three-dimensional (3-D) finite element torso models with optimization techniques to provide a flexible analysis and design tool for electrical defibrillation optimization. Specifically, a parallel direct search (PDS) optimization technique is used with a representative objective function to find an electrode configuration which corresponds to the satisfaction of a postulated defibrillation criterion with a minimum amount of power and a low possibility of myocardium damage. For adequate representation of the thoracic inhomogeneities, 3-D finite-element torso models are used in the objective function computations. The CPU-intensive finite-element calculations required for the objective function evaluation have been implemented on a message-passing parallel computer in order to complete the optimization calculations in a timely manner. To illustrate the optimization procedure, it has been applied to a representative electrode configuration for transmyocardial defibrillation, namely the subcutaneous patch-right ventricular catheter (SP-RVC) system. Sensitivity of the optimal solutions to various tissue conductivities has been studied. 39 refs., 9 figs., 2 tabs.
Two Efficient Derivative-Free Iterative Methods for Solving Nonlinear Systems
Directory of Open Access Journals (Sweden)
Xiaofeng Wang
2016-02-01
In this work, two multi-step derivative-free iterative methods are presented for solving systems of nonlinear equations. The new methods have high computational efficiency and low computational cost. The order of convergence of the new methods is proved by a development of an inverse first-order divided difference operator. The computational efficiency is compared with the existing methods. Numerical experiments support the theoretical results. Experimental results show that the new methods remarkably reduce the computing time in the process of high-precision computing.
Comparative efficiencies of three parallel algorithms for nonlinear ...
Indian Academy of Sciences (India)
Transient dynamic analysis; parallel processing; Newmark algorithm; group implicit algorithm; domain decomposition. ... The prime aim of the research work reported here is to test the portability of the parallel algorithms and also to study and understand the comparative efficiencies of three parallel algorithms developed for ...
International Nuclear Information System (INIS)
Choi, Joonsung; Kim, Dongchan; Oh, Changhyun; Han, Yeji; Park, HyunWook
2013-01-01
In MRI (magnetic resonance imaging), signal sampling along a radial k-space trajectory is preferred in certain applications due to its distinct advantages such as robustness to motion, and the radial sampling can be beneficial for reconstruction algorithms such as parallel MRI (pMRI) due to the incoherency. For radial MRI, the image is usually reconstructed from projection data using analytic methods such as filtered back-projection or Fourier reconstruction after gridding. However, the quality of the reconstructed image from these analytic methods can be degraded when the number of acquired projection views is insufficient. In this paper, we propose a novel reconstruction method based on the expectation maximization (EM) method, where the EM algorithm is remodeled for MRI so that complex images can be reconstructed. Then, to optimize the proposed method for radial pMRI, a reconstruction method that uses coil sensitivity information of multichannel RF coils is formulated. Experiment results from synthetic and in vivo data show that the proposed method introduces better reconstructed images than the analytic methods, even from highly subsampled data, and provides monotonic convergence properties compared to the conjugate gradient based reconstruction method. (paper)
Muckley, Matthew J; Noll, Douglas C; Fessler, Jeffrey A
2015-02-01
Sparsity-promoting regularization is useful for combining compressed sensing assumptions with parallel MRI for reducing scan time while preserving image quality. Variable splitting algorithms are the current state-of-the-art algorithms for SENSE-type MR image reconstruction with sparsity-promoting regularization. These methods are very general and have been observed to work with almost any regularizer; however, the tuning of associated convergence parameters is a commonly-cited hindrance in their adoption. Conversely, majorize-minimize algorithms based on a single Lipschitz constant have been observed to be slow in shift-variant applications such as SENSE-type MR image reconstruction since the associated Lipschitz constants are loose bounds for the shift-variant behavior. This paper bridges the gap between the Lipschitz constant and the shift-variant aspects of SENSE-type MR imaging by introducing majorizing matrices in the range of the regularizer matrix. The proposed majorize-minimize methods (called BARISTA) converge faster than state-of-the-art variable splitting algorithms when combined with momentum acceleration and adaptive momentum restarting. Furthermore, the tuning parameters associated with the proposed methods are unitless convergence tolerances that are easier to choose than the constraint penalty parameters required by variable splitting algorithms.
Gaudiani, Adriana; Carusela, Florencia; Soba, Alejandro
2013-01-01
A great challenge for scientists is to execute their computational applications efficiently. Nowadays, parallel programming has become fundamental to achieving this goal. High-performance computing provides a way to exploit parallel architectures in order to get optimal performance. The parallel programming model and the system architecture yield the greatest benefit when both suit the inherent parallelism of the problem. We compared three parallelized versions ...
Efficient iterative method for solving the Dirac-Kohn-Sham density functional theory
Energy Technology Data Exchange (ETDEWEB)
Lin, Lin; Shao, Sihong; E, Weinan
2012-11-06
We present for the first time an efficient iterative method to directly solve the four-component Dirac-Kohn-Sham (DKS) density functional theory. Due to the existence of the negative energy continuum in the DKS operator, the existing iterative techniques for solving the Kohn-Sham systems cannot be efficiently applied to solve the DKS systems. The key component of our method is a novel filtering step (F) which acts as a preconditioner in the framework of the locally optimal block preconditioned conjugate gradient (LOBPCG) method. The resulting method, dubbed the LOBPCG-F method, is able to compute the desired eigenvalues and eigenvectors in the positive energy band without computing any state in the negative energy band. The LOBPCG-F method introduces mild extra cost compared to the standard LOBPCG method and can be easily implemented. We demonstrate our method in the pseudopotential framework with a planewave basis set which naturally satisfies the kinetic balance prescription. Numerical results for Pt$_{2}$, Au$_{2}$, TlF, and Bi$_{2}$Se$_{3}$ indicate that the LOBPCG-F method is a robust and efficient method for investigating the relativistic effect in systems containing heavy elements.
Comparative efficiencies of three parallel algorithms for nonlinear ...
Indian Academy of Sciences (India)
overall organisation and data structures of the program. Many researchers have devised algorithms for nonlinear dynamic analysis exploiting parallelism in both explicit and implicit time integration techniques. The explicit time integration algorithms like central difference method can easily be moved on to parallel pro-.
Boltz, F. W.
1984-01-01
An algorithm is presented for efficient p-iterative solution of the Lambert/Gauss orbit-determination problem using second-order Newton iteration. The algorithm is based on a universal transformation of Kepler's time-of-flight equation and approximate inverse solutions of this equation for short-way and long-way flight paths. The approximate solutions provide both good starting values for iteration and simplified computation of the second-order term in the iteration formula. Numerical results are presented which indicate that in many cases of practical significance (except those having collinear position vectors) the algorithm produces at least eight significant digits of accuracy with just two or three steps of iteration.
An efficient parallel algorithm for matrix-vector multiplication
Energy Technology Data Exchange (ETDEWEB)
Hendrickson, B.; Leland, R.; Plimpton, S.
1993-03-01
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is $O(n/\sqrt{p} + \log p)$, independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
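A common way to distribute a matrix-vector product over p processors is a two-dimensional block layout on a q x q grid with q = sqrt(p). The sketch below only simulates that data layout serially; it does not reproduce the paper's hypercube communication scheme or its cost bound. The function name `block_matvec`, the dense list-of-lists matrix, and the requirement that n be divisible by q are assumptions for this example.

```python
def block_matvec(A, x, q):
    """Compute y = A x using a simulated q x q block decomposition.

    "Processor" (r, c) owns the block of A with row range r and column
    range c, multiplies it by its slice of x, and the partial results
    are summed along each processor row. Everything runs serially here.
    """
    n = len(x)
    if q <= 0 or n % q != 0:
        raise ValueError("n must be divisible by the grid dimension q")
    bs = n // q                                   # block size per grid cell
    y = [0.0] * n
    for r in range(q):                            # processor row
        for c in range(q):                        # processor column
            for i in range(r * bs, (r + 1) * bs):
                partial = sum(A[i][j] * x[j]
                              for j in range(c * bs, (c + 1) * bs))
                y[i] += partial                   # stands in for the row-wise reduction
    return y


A = [[1, 2, 3, 4],
     [0, 1, 0, 1],
     [2, 0, 2, 0],
     [1, 1, 1, 1]]
x = [1, 2, 3, 4]
y = block_matvec(A, x, q=2)
```

In this layout each simulated processor touches only n/q entries of x and contributes n/q partial sums, which gives some intuition for why per-processor data volume can scale with n/sqrt(p) rather than n.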
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems
Frohne, Jörg
2015-08-06
© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
Energy Technology Data Exchange (ETDEWEB)
Bhaskaran-Nair, Kiran; Brabec, Jiri; Apra, Edoardo; van Dam, Hubertus JJ; Pittner, Jiri; Kowalski, Karol
2012-09-07
In this paper we discuss the performance of the non-iterative State-Specific Multireference Coupled Cluster (SS-MRCC) methods accounting for the effect of triply excited cluster amplitudes. The corrections to the Brillouin-Wigner and Mukherjee MRCC models based on the manifold of singly and doubly excited cluster amplitudes (BW-MRCCSD and Mk-MRCCSD, respectively) are tested and compared with the exact full configuration interaction results (FCI) for small systems (H2O, N2, and Be3). For larger systems (naphthyne isomers and β-carotene), the non-iterative BW-MRCCSD(T) and Mk-MRCCSD(T) methods are compared against the results obtained with the single reference coupled cluster methods. We also report on the parallel performance of the non-iterative implementations based on the use of processor groups.
Efficient Simulation of Population Overflow in Parallel Queues
Nicola, V.F.; Zaburnenko, T.S.
2006-01-01
In this paper we propose a state-dependent importance sampling heuristic to estimate the probability of population overflow in networks of parallel queues. This heuristic approximates the “optimal” state-dependent change of measure without the need for difficult mathematical analysis or costly
Efficient Heuristics for Simulating Population Overflow in Parallel Networks
Zaburnenko, T.S.; Nicola, V.F.
2006-01-01
In this paper we propose a state-dependent importance sampling heuristic to estimate the probability of population overflow in networks of parallel queues. This heuristic approximates the “optimal” state-dependent change of measure without the need for costly optimization involved in other
An efficient massively parallel Euler solver for unstructured grids
International Nuclear Information System (INIS)
Hammond, S.W.; Barth, T.J.
1991-01-01
A data parallel mesh-vertex upwind finite-volume scheme for solving the Euler equations on triangular unstructured meshes is described. A novel vertex-based partitioning of the problem is introduced which minimizes the computation and communication costs associated with distributing the computation to the processors of a massively parallel computer. Finally, the performance of this unstructured computation on 8K processors of the Connection Machine CM-2 is compared with one processor of a Cray-YMP. The experiments show that 8K processors of the CM-2 achieve approximately 70 percent of the performance of one processor of the Cray-YMP on the unstructured mesh computations described here. 8 refs
Multi-Level Parallelism for Time- and Cost-efficient Parallel Discrete Event Simulation on GPUs
Kunz, Georg; Schemmel, Daniel; Gross, James; Wehrle, Klaus
2012-01-01
Developing complex technical systems requires a systematic exploration of the given design space in order to identify optimal system configurations. However, studying the effects and interactions of even a small number of system parameters often requires an extensive number of simulation runs. This in turn results in excessive runtime demands which severely hamper thorough design space explorations. In this paper, we present a parallel discrete event simulation scheme that enables cost- and ti...
Multi states electromechanical switch for energy efficient parallel data processing
Kloub, Hussam
2011-04-01
We present a design, simulation results and fabrication of electromechanical switches enabling parallel data processing and multi functionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2μm by 0.5μm, and has a pull-in voltage of 5.15V which is verified by FEM simulation. © 2011 IEEE.
Gartling, D. K.; Roache, P. J.
1978-01-01
The efficiency characteristics of finite element and finite difference approximations for the steady-state solution of the Navier-Stokes equations are examined. The finite element method discussed is a standard Galerkin formulation of the incompressible, steady-state Navier-Stokes equations. The finite difference formulation uses simple centered differences that are O(Δx²). Operation counts indicate that a rapidly converging Newton-Raphson-Kantorovitch iteration scheme is generally preferable over a Picard method. A split NOS Picard iterative algorithm for the finite difference method was most efficient.
A parallel reconfigurable platform for efficient sequence alignment
African Journals Online (AJOL)
SAM
2014-08-13
Aug 13, 2014 ... focuses on an efficient and optimized computation, analysis and sequencing of DNA pattern. The ... hours to few seconds. Key words: DNA, sequencing, bioinformatics, efficient computations, repetitive finding, optimized sequencing. .... logic is associated with one bit position, and shared among all the bits ...
Work-Efficient Parallel Skyline Computation for the GPU
DEFF Research Database (Denmark)
Bøgh, Kenneth Sejdenfaden; Chester, Sean; Assent, Ira
2015-01-01
The skyline operator returns records in a dataset that provide optimal trade-offs of multiple dimensions. State-of-the-art skyline computation involves complex tree traversals, data-ordering, and conditional branching to minimize the number of point-to-point comparisons. Meanwhile, GPGPU computing...... offers the potential for parallelizing skyline computation across thousands of cores. However, attempts to port skyline algorithms to the GPU have prioritized throughput and failed to outperform sequential algorithms. In this paper, we introduce a new skyline algorithm, designed for the GPU, that uses...
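Illustrative note: the skyline operator described above can be stated in a few lines. The sketch below is a naive quadratic-time reference version in Python, assuming that smaller values are better in every dimension. It is not the GPU algorithm of the paper; the names `dominates` and `skyline` are chosen here for illustration only.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere.

    Assumption for this sketch: smaller is better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Return the points not dominated by any other point (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Each point-to-point comparison in `skyline` is independent of the others, which is why the operator looks attractive for thousands of cores; the difficulty mentioned in the abstract lies in avoiding most of those comparisons.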
Energy and fuel efficient parallel mild hybrids for urban roads
International Nuclear Information System (INIS)
Babu, Ajay; Ashok, S.
2016-01-01
Highlights: • Energy and fuel savings depend on battery charge variations and the vehicle speed parameters. • Indian urban conditions provide lot of scope for energy and fuel savings in mild hybrids. • Energy saving strategy has lower payback periods than the fuel saving one in mild hybrids. • Sensitivity to parameter variations is the least for energy saving strategy in a mild hybrid. - Abstract: Fuel economy improvements and battery energy savings can promote the adoption of parallel mild hybrids for urban driving conditions. The aim of this study is to establish these benefits through two operating modes: an energy saving mode and a fuel saving mode. The performances of a typical parallel mild hybrid using these modes were analysed over urban driving cycles, in the US, Europe, and India, with a particular focus on the Indian urban conditions. The energy pack available from the proposed energy-saving operating mode, in addition to the energy already available from the conventional mode, was observed to be the highest for the representative urban driving cycle of the US. The extra energy pack available was found to be approximately 21.9 times that available from the conventional mode. By employing the proposed fuel saving operating mode, the fuel economy improvement achievable in New York City was observed to be approximately 22.69% of the fuel economy with the conventional strategy. The energy saving strategy was found to possess the lowest payback periods and highest immunity to variations in various cost parameters.
Efficient biased random bit generation for parallel processing
Energy Technology Data Exchange (ETDEWEB)
Slone, Dale M. [Univ. of California, Davis, CA (United States)
1994-09-28
A lattice gas automaton was implemented on a massively parallel machine (the BBN TC2000) and a vector supercomputer (the CRAY C90). The automaton models Burgers equation ρ_{t} + ρρ_{x} = νρ_{xx} in 1 dimension. The lattice gas evolves by advecting and colliding pseudo-particles on a 1-dimensional, periodic grid. The specific rules for colliding particles are stochastic in nature and require the generation of many billions of random numbers to create the random bits necessary for the lattice gas. The goal of the thesis was to speed up the process of generating the random bits and thereby lessen the computational bottleneck of the automaton.
A parallel reconfigurable platform for efficient sequence alignment ...
African Journals Online (AJOL)
Bioinformatics is one of the emerging trends in today's world. The major part of bioinformatics is dealing with DNA. Analysis of DNA requires more memory and high efficient computations to produce accurate outputs. Researchers use various bioinformatics algorithms for sequencing and pattern detection techniques, but still ...
Valasek, Lukas; Glasa, Jan
2017-12-01
Current fire simulation systems are capable of utilizing the advantages of available high-performance computing (HPC) platforms and of modelling fires efficiently in parallel. In this paper, the efficiency of a corridor fire simulation on an HPC cluster is discussed. The parallel MPI version of Fire Dynamics Simulator is used for testing the efficiency of selected strategies for allocating the computational resources of the cluster when a greater number of computational cores is used. Simulation results indicate that if the number of cores used is not equal to a multiple of the total number of cluster node cores, there are allocation strategies which provide more efficient calculations.
Efficient Out of Core Sorting Algorithms for the Parallel Disks Model.
Kundeti, Vamsi; Rajasekaran, Sanguthevar
2011-11-01
In this paper we present efficient algorithms for sorting on the Parallel Disks Model (PDM). Numerous asymptotically optimal algorithms have been proposed in the literature. However many of these merge based algorithms have large underlying constants in the time bounds, because they suffer from the lack of read parallelism on PDM. The irregular consumption of the runs during the merge affects the read parallelism and contributes to the increased sorting time. In this paper we first introduce a novel idea called the dirty sequence accumulation that improves the read parallelism. Secondly, we show analytically that this idea can reduce the number of parallel I/O's required to sort the input close to the lower bound of [Formula: see text]. We experimentally verify our dirty sequence idea with the standard R-Way merge and show that our idea can reduce the number of parallel I/Os to sort on PDM significantly.
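Illustrative note: the standard R-way merge that the abstract uses as its baseline can be sketched with a min-heap holding one head element per run. The Python version below works on in-memory lists and therefore ignores disk blocks, parallel reads, and the dirty sequence accumulation idea; `r_way_merge` is a name chosen for this sketch.

```python
import heapq


def r_way_merge(runs):
    """Merge R sorted runs into one sorted list using a heap of run heads."""
    # Heap entries are (value, run index, position inside that run).
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, r, i = heapq.heappop(heap)
        merged.append(value)
        if i + 1 < len(runs[r]):
            # Refill from the run that was just consumed. On disks, this is
            # where the irregular consumption of runs shows up.
            heapq.heappush(heap, (runs[r][i + 1], r, i + 1))
    return merged
```

The order in which runs are drained depends entirely on the data, which gives some intuition for why read parallelism across disks is hard to maintain during the merge.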
An efficient implementation of parallel molecular dynamics method on SMP cluster architecture
International Nuclear Information System (INIS)
Suzuki, Masaaki; Okuda, Hiroshi; Yagawa, Genki
2003-01-01
The authors have applied the MPI/OpenMP hybrid parallel programming model to parallelize a molecular dynamics (MD) method on a symmetric multiprocessor (SMP) cluster architecture. In that architecture, it can be expected that the hybrid parallel programming model, which uses a message passing library such as MPI for inter-SMP node communication and loop directives such as OpenMP for intra-SMP node parallelization, is the most effective one. In this study, the parallel performance of the hybrid style has been compared with that of the conventional flat parallel programming style, which uses only MPI, both when the fast multipole method (FMM) is employed for computing long-distance interactions and when it is not. The computing environment used here is the Hitachi SR8000/MPP at the University of Tokyo. The results of calculation are as follows. Without FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is: 90% with the hybrid style, 75% with the flat-MPI style for MD simulation with 33,402 atoms. With FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is: 60% with the hybrid style, 48% with the flat-MPI style for MD simulation with 117,649 atoms. (author)
Madison, Anna; Lleras, Alejandro; Buetti, Simona
2018-02-01
Recent results from our laboratory showed that, in fixed-target parallel search tasks, reaction times increase in a logarithmic fashion with set size, and the slope of this logarithmic function is modulated by lure-target similarity. These results were interpreted as being consistent with a processing architecture where early vision (stage one) processes elements in the display in exhaustive fashion with unlimited capacity and with a limitation in resolution. Here, we evaluate the contribution of crowding to our recent logarithmic search slope findings, considering the possibility that peripheral pooling of features (as observed in crowding) may be responsible for logarithmic efficiency. Factors known to affect the strength of crowding were varied, specifically: item spacing and similarity. The results from three experiments converge on the same pattern of results: reaction times increased logarithmically with set size and were modulated by lure-target similarity even when crowding was minimized within displays through an inter-item spacing manipulation. Furthermore, we found logarithmic search efficiencies were overall improved in displays where crowding was minimized compared to displays where crowding was possible. The findings from these three experiments suggest logarithmic efficiency in efficient search is not the result of peripheral pooling of features. That said, the presence of crowding does tend to reduce search efficiency, even in "pop-out" search situations.
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
Remote Memory Access: A Case for Portable, Efficient and Library Independent Parallel Programming
Directory of Open Access Journals (Sweden)
Alexandros V. Gerbessiotis
2004-01-01
In this work we make a strong case for remote memory access (RMA) as the effective way to program a parallel computer by proposing a framework that supports RMA in a library independent, simple and intuitive way. If one uses our approach the parallel code one writes will run transparently under MPI-2 enabled libraries but also bulk-synchronous parallel libraries. The advantage of using RMA is code simplicity, reduced programming complexity, and increased efficiency. We support the latter claims by implementing under this framework a collection of benchmark programs consisting of a communication and synchronization performance assessment program, a dense matrix multiplication algorithm, and two variants of a parallel radix-sort algorithm and examine their performance on a LINUX-based PC cluster under three different RMA enabled libraries: LAM MPI, BSPlib, and PUB. We conclude that implementations of such parallel algorithms using RMA communication primitives lead to code that is as efficient as the message-passing equivalent code and in the case of radix-sort substantially more efficient. In addition our work can be used as a comparative study of the relevant capabilities of the three libraries.
Solving the dynamic equations of a 3-PRS Parallel Manipulator for efficient model-based designs
Directory of Open Access Journals (Sweden)
M. Díaz-Rodríguez
2016-01-01
The introduction of parallel manipulator systems in different application areas has led many researchers to develop techniques for obtaining accurate and computationally efficient inverse dynamic models. Several subject areas make use of these models, such as optimal design, parameter identification, model-based control and even actuation redundancy approaches. In this context, by revisiting some of the current computationally efficient solutions for obtaining the inverse dynamic model of parallel manipulators, this paper compares three different methods for inverse dynamic modelling of a general, lower mobility, 3-PRS parallel manipulator. The first method obtains the inverse dynamic model by describing the manipulator as three open kinematic chains. Then, vector-loop closure constraints are introduced for obtaining the relationship between the dynamics of the open kinematic chains (such as a serial robot) and the closed chains (such as a parallel robot). The second method exploits certain characteristics of parallel manipulators such that the platform and the links are considered as independent subsystems. The proposed third method is similar to the second method but it uses a different Jacobian matrix formulation in order to reduce computational complexity. Analysis of these numerical formulations will provide fundamental software support for efficient model-based designs. In addition, the computational cost reduction presented in this paper can also be an effective guideline for optimal design of this type of manipulator and for real-time embedded control.
Gunnels, John
2010-06-01
We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.
Directory of Open Access Journals (Sweden)
JONG WOON KIM
2014-04-01
In this paper, we introduce a modified scattering kernel approach to avoid the unnecessarily repeated calculations involved with the scattering source calculation, and used it with parallel computing to effectively reduce the computation time. Its computational efficiency was tested for three-dimensional full-coupled photon-electron transport problems using our computer program which solves the multi-group discrete ordinates transport equation by using the discontinuous finite element method with unstructured tetrahedral meshes for complicated geometrical problems. The numerical tests show that we can improve speed up to 17∼42 times for the elapsed time per iteration using the modified scattering kernel, not only in the single CPU calculation but also in the parallel computing with several CPUs.
Directory of Open Access Journals (Sweden)
2009-03-01
We introduce a new approximation scheme combining the viscosity method with parallel method for finding a common element of the set of solutions of a generalized equilibrium problem and the set of fixed points of a family of finitely strict pseudocontractions. We obtain a strong convergence theorem for the sequences generated by these processes in Hilbert spaces. Based on this result, we also get some new and interesting results. The results in this paper extend and improve some well-known results in the literature.
An efficient parallel algorithm for the solution of a tridiagonal linear system of equations
Stone, H. S.
1971-01-01
Tridiagonal linear systems of equations are solved on conventional serial machines in a time proportional to N, where N is the number of equations. The conventional algorithms do not lend themselves directly to parallel computations on computers of the ILLIAC IV class, in the sense that they appear to be inherently serial. An efficient parallel algorithm is presented in which computation time grows as log₂ N. The algorithm is based on recursive doubling solutions of linear recurrence relations, and can be used to solve recurrence relations of all orders.
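Illustrative note: recursive doubling is easiest to see on a first-order linear recurrence x[i] = a[i]*x[i-1] + b[i]. Each step is an affine map, and composing neighbouring maps at distances 1, 2, 4, ... yields every x[i] after about log₂ N sweeps. The Python sketch below simulates those sweeps serially, assuming x[-1] = 0; it shows only the doubling idea and is not the full tridiagonal solver of the paper. The function names are chosen here for illustration.

```python
def recursive_doubling(a, b):
    """Solve x[i] = a[i] * x[i-1] + b[i] with x[-1] = 0.

    Returns (x, sweeps). Within one sweep every index i is updated from the
    values of the previous sweep only, so the inner loop could run in parallel.
    """
    n = len(a)
    a, b = list(a), list(b)
    d, sweeps = 1, 0
    while d < n:
        new_a, new_b = a[:], b[:]
        for i in range(d, n):
            # Compose map i with map i-d: (a2, b2) o (a1, b1) = (a2*a1, a2*b1 + b2)
            new_a[i] = a[i] * a[i - d]
            new_b[i] = a[i] * b[i - d] + b[i]
        a, b = new_a, new_b
        d *= 2
        sweeps += 1
    return b, sweeps


def serial_recurrence(a, b):
    """Reference O(N) evaluation of the same recurrence."""
    x, prev = [], 0
    for ai, bi in zip(a, b):
        prev = ai * prev + bi
        x.append(prev)
    return x
```

For N = 8 the doubling version needs 3 sweeps instead of 8 dependent steps, at the price of more total arithmetic.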
Ma, Sangback
In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDEs (Partial Differential Equations) on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioners, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with a direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using the block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by the least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large mesh sizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i.e., the real parts of their eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices is mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The
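Illustrative note: the simplest multi-color ordering is the two-color (red-black) ordering of the 5-point Laplacian, where every point of one color depends only on points of the other color. The Python sketch below applies point SOR in that ordering as a stand-alone solver for -Δu = f on a square grid with fixed boundary values. It is a hedged illustration of the ordering only, not the Multi-Color Block SOR preconditioner studied in the paper; `red_black_sor` and its parameters are names assumed here.

```python
def red_black_sor(u, f, h, omega, sweeps):
    """Run SOR sweeps in red-black ordering on the 5-point Laplacian.

    u: square grid (list of lists) whose outer ring holds boundary values.
    f: right-hand side on the same grid, h: mesh width, omega: relaxation factor.
    All updates of one color are mutually independent, so each color phase
    could be executed in parallel.
    """
    n = len(u)
    for _ in range(sweeps):
        for color in (0, 1):                      # 0 = "red", 1 = "black"
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != color:
                        continue
                    gauss_seidel = 0.25 * (
                        u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                        + h * h * f[i][j]
                    )
                    u[i][j] += omega * (gauss_seidel - u[i][j])
    return u
```

With `omega = 1.0` this reduces to red-black Gauss-Seidel; only two synchronization points per sweep are needed regardless of the grid size.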
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
Directory of Open Access Journals (Sweden)
Jinwei Wang
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs
Directory of Open Access Journals (Sweden)
Vaughn Matthew
2010-11-01
Background: Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). Results: In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(n log(n/B) / (B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster - both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. Conclusions: The bi
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.
Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal
2010-11-15
Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(n log(n/B) / (B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster--both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for
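Illustrative note: one way to see why construction can be reduced to sorting is to represent each bi-directed edge by the canonical form of a (k+1)-mer (the lexicographically smaller of the string and its reverse complement), then sort and deduplicate. The Python sketch below does this in memory. It is a simplified illustration under that assumption, not the parallel or out-of-core algorithm of the paper, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return s.translate(COMPLEMENT)[::-1]


def canonical(s):
    """Strand-independent representative of s."""
    return min(s, reverse_complement(s))


def bidirected_de_bruijn(reads, k):
    """Return (nodes, edges) as sorted lists of canonical k-mers and (k+1)-mers."""
    edge_list = []
    for read in reads:
        for i in range(len(read) - k):
            edge_list.append(canonical(read[i:i + k + 1]))
    edge_list.sort()                       # the only global step is a sort
    edges = [e for i, e in enumerate(edge_list) if i == 0 or e != edge_list[i - 1]]
    # Every edge joins its length-k prefix and suffix, again taken canonically.
    nodes = sorted({canonical(e[:-1]) for e in edges} | {canonical(e[1:]) for e in edges})
    return nodes, edges
```

Because canonical forms are strand independent, a read and its reverse complement contribute exactly the same nodes and edges.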
Efficient fractal-based mutation in evolutionary algorithms from iterated function systems
Salcedo-Sanz, S.; Aybar-Ruíz, A.; Camacho-Gómez, C.; Pereira, E.
2018-03-01
In this paper we present a new mutation procedure for Evolutionary Programming (EP) approaches, based on Iterated Function Systems (IFSs). The new mutation procedure proposed consists of considering a set of IFSs which are able to generate fractal structures in a two-dimensional phase space, and using them to modify a current individual of the EP algorithm, instead of using random numbers from different probability density functions. We test this new proposal in a set of benchmark functions for continuous optimization problems. In this case, we compare the proposed mutation against classical Evolutionary Programming approaches, with mutations based on Gaussian, Cauchy and chaotic maps. We also include a discussion on the IFS-based mutation in a real application of Tuned Mass Damper (TMD) location and optimization for vibration cancellation in buildings. In both practical cases, the proposed EP with the IFS-based mutation obtained extremely competitive results compared to alternative classical mutation operators.
An efficient parallel stochastic simulation method for analysis of nonviral gene delivery systems
Kuwahara, Hiroyuki
2011-01-01
Gene therapy has a great potential to become an effective treatment for a wide variety of diseases. One of the main challenges to make gene therapy practical in clinical settings is the development of efficient and safe mechanisms to deliver foreign DNA molecules into the nucleus of target cells. Several computational and experimental studies have shown that the design process of synthetic gene transfer vectors can be greatly enhanced by computational modeling and simulation. This paper proposes a novel, effective parallelization of the stochastic simulation algorithm (SSA) for pharmacokinetic models that characterize the rate-limiting, multi-step processes of intracellular gene delivery. While efficient parallelizations of the SSA are still an open problem in a general setting, the proposed parallel simulation method is able to substantially accelerate the next reaction selection scheme and the reaction update scheme in the SSA by exploiting and decomposing the structures of stochastic gene delivery models. This makes computationally intensive analyses such as parameter optimization and gene dosage control for specific cell types, gene vectors, and transgene expression stability substantially more practical than would otherwise be possible with the standard SSA. Here, we translated the nonviral gene delivery model based on mass-action kinetics by Varga et al. [Molecular Therapy, 4(5), 2001] into a more realistic model that captures intracellular fluctuations based on stochastic chemical kinetics, and as a case study we applied our parallel simulation to this stochastic model. Our results show that our simulation method is able to increase the efficiency of statistical analysis by at least 50% in various settings. © 2011 ACM.
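Illustrative note: the two steps the abstract singles out, next reaction selection and reaction update, are visible in the standard serial SSA (Gillespie's direct method). The Python sketch below implements that baseline on a toy three-compartment chain. The compartments, the rate constants `K_UPTAKE` and `K_IMPORT`, and the function name `ssa_direct` are hypothetical placeholders, not the Varga et al. model or the parallel method of the paper.

```python
import bisect
import itertools
import math
import random

K_UPTAKE = 1.0   # hypothetical rate: extracellular -> cytoplasm
K_IMPORT = 0.5   # hypothetical rate: cytoplasm -> nucleus

# Each reaction is (propensity function of the state, state change vector).
# State layout assumed here: [extracellular, cytoplasmic, nuclear] copy numbers.
REACTIONS = [
    (lambda s: K_UPTAKE * s[0], (-1, +1, 0)),
    (lambda s: K_IMPORT * s[1], (0, -1, +1)),
]


def ssa_direct(state, reactions, t_end, rng):
    """Simulate until t_end (or until no reaction can fire); return the final state."""
    state, t = list(state), 0.0
    while True:
        cumulative = list(itertools.accumulate(f(state) for f, _ in reactions))
        total = cumulative[-1]
        if total <= 0.0:
            break
        # Exponentially distributed waiting time to the next reaction.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Next reaction selection: pick j with probability propensity_j / total.
        j = bisect.bisect_right(cumulative, rng.random() * total)
        # Reaction update: apply the chosen state change.
        state = [s + c for s, c in zip(state, reactions[j][1])]
    return state
```

Statistical analyses repeat such runs many times, so independent replicas are a natural, though different, source of parallelism from the within-run acceleration described in the abstract.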
Liu, Xiaolei; Huang, Meng; Fan, Bin; Buckler, Edward S.; Zhang, Zhiwu
2016-01-01
False positives in a Genome-Wide Association Study (GWAS) can be effectively controlled by a fixed effect and random effect Mixed Linear Model (MLM) that incorporates population structure and kinship among individuals to adjust association tests on markers; however, the adjustment also compromises true positives. The modified MLM method, Multiple Loci Linear Mixed Model (MLMM), incorporates multiple markers simultaneously as covariates in a stepwise MLM to partially remove the confounding between testing markers and kinship. To completely eliminate the confounding, we divided MLMM into two parts, a Fixed Effect Model (FEM) and a Random Effect Model (REM), and used them iteratively. FEM contains testing markers, one at a time, and multiple associated markers as covariates to control false positives. To avoid model over-fitting in FEM, the associated markers are estimated in REM by using them to define kinship. The P values of testing markers and the associated markers are unified at each iteration. We named the new method Fixed and random model Circulating Probability Unification (FarmCPU). Both real and simulated data analyses demonstrated that FarmCPU improves statistical power compared to current methods. Additional benefits include an efficient computing time that is linear in both the number of individuals and the number of markers. Now, a dataset with half a million individuals and half a million markers can be analyzed within three days. PMID:26828793
Bode, Paul; Ostriker, Jeremiah P.
2003-03-01
An improved implementation of an N-body code for simulating collisionless cosmological dynamics is presented. TPM (tree particle-mesh) combines the PM method on large scales with a tree code to handle particle-particle interactions at small separations. After the global PM forces are calculated, spatially distinct regions above a given density contrast are located; the tree code calculates the gravitational interactions inside these denser objects at higher spatial and temporal resolution. The new implementation includes individual particle time steps within trees, an improved treatment of tidal forces on trees, new criteria for higher force resolution and choice of time step, and parallel treatment of large trees. TPM is compared to P3M and a tree code (GADGET) and is found to give equivalent results in significantly less time. The implementation is highly portable (requiring a FORTRAN compiler and MPI) and efficient on parallel machines. The source code can be found on the World Wide Web.
Tolba, Khaled Ibrahim; Morgenthal, Guido
2018-01-01
This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied for the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using OpenCL framework. The focus of this paper is the analysis of the parallel efficiency and scalability of the method being applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The target is to assess the optimal configuration and the required computer architecture, such that it becomes feasible to efficiently utilise the method within the computational resources available for a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming type computer.
International Nuclear Information System (INIS)
Troyon, F.
1997-01-01
Recurrent attacks against ITER, the new generation of tokamak, are a mix of political and scientific arguments. This short article gives a historical review of the European fusion program. This program has made it possible to build and operate several installations with the aim of obtaining the experimental results needed to move the program forward. ITER will bring together a fusion reactor core with technologies such as materials, superconductive coils, heating devices and instrumentation in order to validate and delimit the operating range. ITER will be a logical and decisive step towards the use of controlled fusion. (A.C.)
Barreiros, Willian; Teodoro, George; Kurc, Tahsin; Kong, Jun; Melo, Alba C M A; Saltz, Joel
2017-09-01
We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be very compute-demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies.
An efficient numerical scheme for the simulation of parallel-plate active magnetic regenerators
DEFF Research Database (Denmark)
Torregrosa-Jaime, Bárbara; Corberán, José M.; Payá, Jorge
2015-01-01
A one-dimensional model of a parallel-plate active magnetic regenerator (AMR) is presented in this work. The model is based on an efficient numerical scheme which has been developed after analysing the heat transfer mechanisms in the regenerator bed. The new finite difference scheme optimally...... combines explicit and implicit techniques in order to solve the one-dimensional conjugate heat transfer problem in an accurate and fast manner while ensuring energy conservation. The present model has been thoroughly validated against passive regenerator cases with an analytical solution. Compared...
Cortés, Guillermo; García, Luz; Álvarez, Isaac; Benítez, Carmen; de la Torre, Ángel; Ibáñez, Jesús
2014-02-01
Automatic recognition of volcano-seismic events is becoming one of the most demanded features in the early warning area at continuous monitoring facilities. While human-driven cataloguing is time-consuming and often an unreliable task, an appropriate machine framework allows expert technicians to focus only on result analysis and decision-making. This work presents an alternative to serial architectures used in classic recognition systems introducing a parallel implementation of the whole process: configuration, feature extraction, feature selection and classification stages are independently carried out for each type of events in order to exploit the intrinsic properties of each signal class. The system uses Gaussian Mixture Models (GMMs) to classify the database recorded at Deception Volcano Island (Antarctica) obtaining a baseline recognition rate of 84% with a cepstral-based waveform parameterization in the serial architecture. The parallel approach increases the results to close to 92% using mixture-based parameterization vectors or up to 91% when the vector size is reduced by 19% via the Discriminative Feature Selection (DFS) algorithm. Besides the result improvement, the parallel architecture represents a major step in terms of flexibility and reliability thanks to the class-focused analysis, providing an efficient tool for monitoring observatories which require real-time solutions.
Development and application of efficient strategies for parallel magnetic resonance imaging
Energy Technology Data Exchange (ETDEWEB)
Breuer, F.
2006-07-01
artifacts. Unfortunately, parallel imaging is associated with a loss in signal-to-noise ratio (SNR) and therefore is limited to applications which do not already operate at the SNR limit. An additional limitation is the fact that the coil array must provide sufficient sensitivity variations throughout the object under investigation in order to offer enough spatial encoding capacity. This doctoral thesis exhibits an overview of my research on the topic of efficient parallel imaging strategies. Based on existing parallel acquisition and reconstruction strategies, such as SENSE and GRAPPA, new concepts have been developed and transferred to potential clinical applications. (orig.)
Development and application of efficient strategies for parallel magnetic resonance imaging
International Nuclear Information System (INIS)
Breuer, F.
2006-01-01
. Unfortunately, parallel imaging is associated with a loss in signal-to-noise ratio (SNR) and therefore is limited to applications which do not already operate at the SNR limit. An additional limitation is the fact that the coil array must provide sufficient sensitivity variations throughout the object under investigation in order to offer enough spatial encoding capacity. This doctoral thesis exhibits an overview of my research on the topic of efficient parallel imaging strategies. Based on existing parallel acquisition and reconstruction strategies, such as SENSE and GRAPPA, new concepts have been developed and transferred to potential clinical applications. (orig.)
ITER council proceedings: 1997
International Nuclear Information System (INIS)
1997-01-01
This volume of the ITER EDA Documentation Series presents records of the 12th ITER Council Meeting, IC-12, which took place on 23-24 July, 1997 in Tampere, Finland. The Council received from the Parties (EU, Japan, Russia, US) positive responses on the Detailed Design Report. The Parties stated their willingness to fulfil their obligations in contributing to the ITER EDA. The summary discussions among the Parties led to the consensus that in July 1998 the ITER activities should proceed for an additional three years with a general intent to enable an efficient start of possible, future ITER construction.
Efficient sequential and parallel algorithms for finding edit distance based motifs.
Pal, Soumitra; Xiao, Peng; Rajasekaran, Sanguthevar
2016-08-18
Motif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l,d and n biological strings, find all strings of length l that appear in each input string with at most d errors of types substitution, insertion and deletion. One popular technique to solve the problem is to explore for each input string the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those which are common to all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in the neighborhood which are at a distance of exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie-based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared memory setting uses arrays for storing and a novel modification of radix-sort for sorting the candidate motifs. The algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and in an estimated 3 days solves up to instance (16,3). Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster. Our parallel algorithm has more than 600% scaling performance while using 16 threads. Our algorithms have pushed up the state-of-the-art of EMS solvers and we believe that the techniques introduced in
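The (l,d) EMS problem statement in this abstract can be made concrete with a brute-force reference solver. The sketch below is not the neighborhood-exploration algorithm of the paper: it enumerates every l-mer over an assumed DNA alphabet and tests it against each input string with a standard approximate-matching dynamic program. The function names and the default alphabet are assumptions, and the approach is practical only for very small l.

```python
from itertools import product


def min_edit_distance_to_substring(motif, text):
    """Smallest edit distance between motif and any substring of text.

    Standard approximate-matching DP: row 0 is all zeros, so a match
    may start at any position of text, and the answer is the minimum
    of the last row, so it may end at any position.
    """
    prev = [0] * (len(text) + 1)
    for i, mc in enumerate(motif, start=1):
        cur = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cur[j] = min(
                prev[j] + 1,               # motif character left unmatched
                cur[j - 1] + 1,            # extra character in text
                prev[j - 1] + (mc != tc),  # match or substitution
            )
        prev = cur
    return min(prev)


def brute_force_ems(strings, l, d, alphabet="ACGT"):
    """All l-mers that occur in every input string with at most d edits."""
    motifs = []
    for chars in product(alphabet, repeat=l):
        candidate = "".join(chars)
        if all(min_edit_distance_to_substring(candidate, s) <= d
               for s in strings):
            motifs.append(candidate)
    return sorted(motifs)
```

Enumerating all 4^l candidates is what makes instances such as (16,3) hard for naive methods; the abstract's contribution is to explore only the d-neighborhoods of substrings that actually occur in the inputs.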
High efficiency radiofrequency power amplifier module for parallel transmit arrays at 3 Tesla.
Twieg, Michael; Griswold, Mark A
2017-10-01
The purpose of this study is to develop an in-bore radiofrequency (RF) power amplifier (RFPA) module with high power efficiency and density for use in parallel transmit (pTX) arrays at 3 Tesla. The modules use a combination of current mode class D, class S, and class E amplifiers based on enhancement-mode gallium nitride-on-silicon field-effect transistors. Together the amplifiers implement envelope elimination and restoration to achieve amplitude modulation with high efficiency over a wide operating range. The static nonlinearity and power efficiency of the module were measured using pulsed RF measurements over a 37 dB dynamic range. Thermal performance was also measured with and without forced convection cooling. The module produces peak RF power up to 130 W with an overall efficiency of 85%. When producing 100 W RF pulses at a duty cycle of 10%, maximum junction temperatures did not exceed 80 °C, even without the use of heatsinks or forced convection. The small size and low cost of the modules promise lower cost implementation of pTX systems compared with linear RFPAs located remotely. Further work must be done on control of the RF output in the presence of nonlinearities and coupling. Magn Reson Med 78:1589-1598, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
Directory of Open Access Journals (Sweden)
Ricardo Soto
2016-01-01
The Machine-Part Cell Formation Problem (MPCFP) is an NP-hard optimization problem that consists of grouping machines and parts into a set of cells, so that each cell can operate independently and the intercell movements are minimized. This problem has largely been tackled in the literature by using different techniques ranging from classic methods such as linear programming to more modern nature-inspired metaheuristics. In this paper, we present an efficient parallel version of the Migrating Birds Optimization metaheuristic for solving the MPCFP. Migrating Birds Optimization is a population metaheuristic based on the V-flight formation of migrating birds, which is proven to be an effective formation for saving energy. This approach is enhanced by the smart incorporation of parallel procedures that notably improve the performance of the several sorting processes performed by the metaheuristic. We perform computational experiments on 1080 benchmarks resulting from the combination of 90 well-known MPCFP instances with 12 sorting configurations with and without threads. We illustrate promising results where the proposal is able to reach the global optimum in all instances, while the solving time with respect to a nonparallel approach is notably reduced.
Parallel and series FED microstrip array with high efficiency and low cross polarization
Huang, John (Inventor)
1995-01-01
A microstrip array antenna for a vertically polarized fan beam (approximately 2 deg x 50 deg) for C-band SAR applications with a physical area of 1.7 m by 0.17 m comprises two rows of patch elements and employs a parallel feed to left- and right-half sections of the rows. Each section is divided into two segments that are fed in parallel with the elements in each segment fed in series through matched transmission lines for high efficiency. The inboard section has half the number of patch elements of the outboard section, and the outboard sections, which have tapered distribution with identical transmission line sections, are terminated with half wavelength long open-circuit stubs so that the remaining energy is reflected and radiated in phase. The elements of the two inboard segments of the two left- and right-half sections are provided with tapered transmission lines from element to element for uniform power distribution over the central third of the entire array antenna. The two rows of array elements are excited at opposite patch feed locations with opposite (180 deg difference) phases for reduced cross-polarization.
An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division
Directory of Open Access Journals (Sweden)
Dawen Xia
2015-01-01
Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM) algorithm for solving the traffic subarea division problem on a widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means and then employ a MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identifying method to connect the borders of clustering results for each cluster. Finally, we divide the traffic subareas of Beijing based on real-world trajectory data sets generated by 12,000 taxis in a period of one month using the proposed approach. Experimental evaluation results indicate that when compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, more accuracy, and better scalability and can effectively divide traffic subareas with big taxi trajectory data.
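The general shape of a MapReduce formulation of K-Means can be sketched in plain Python as below. This is not Par3PKM: it uses ordinary squared Euclidean distance and caller-supplied initial centroids, whereas the abstract states that the paper modifies both the distance metric and the initialization. It also runs in a single process rather than on Hadoop, and all function names are hypothetical.

```python
from collections import defaultdict


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def map_phase(points, centroids):
    """Mapper: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reducer: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = []
    for k, old in enumerate(centroids):
        members = groups.get(k)
        if not members:
            updated.append(tuple(old))  # keep the centroid of an empty cluster
            continue
        dim = len(members[0])
        updated.append(tuple(sum(m[i] for m in members) / len(members)
                             for i in range(dim)))
    return updated


def kmeans(points, centroids, iterations=10):
    """Repeat map and reduce until the centroids stop changing."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

Because each point is assigned independently in the map step, the input can be split across workers, and only per-cluster sums and counts need to reach the reducers; that independence is what makes the method a natural fit for large trajectory data sets.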
ERA: Efficient serial and parallel suffix tree construction for very long strings
Mansour, Essam
2011-09-01
The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into the main memory, most existing construction algorithms become very inefficient. This paper presents a disk-based suffix tree construction method, called Elastic Range (ERa), which works efficiently with very long strings that are much larger than the available memory. ERa partitions the tree construction process horizontally and vertically and minimizes I/Os by dynamically adjusting the horizontal partitions independently for each vertical partition, based on the evolving shape of the tree and the available memory. Where appropriate, ERa also groups vertical partitions together to amortize the I/O cost. We developed a serial version; a parallel version for shared-memory and shared-disk multi-core systems; and a parallel version for shared-nothing architectures. ERa indexes the entire human genome in 19 minutes on an ordinary desktop computer. For comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer.
International Nuclear Information System (INIS)
Cao, Dingzhou; Murat, Alper; Chinnam, Ratna Babu
2013-01-01
This paper proposes a decomposition-based approach to exactly solve the multi-objective Redundancy Allocation Problem for series-parallel systems. Redundancy allocation problem is a form of reliability optimization and has been the subject of many prior studies. The majority of these earlier studies treat redundancy allocation problem as a single objective problem maximizing the system reliability or minimizing the cost given certain constraints. The few studies that treated redundancy allocation problem as a multi-objective optimization problem relied on meta-heuristic solution approaches. However, meta-heuristic approaches have significant limitations: they do not guarantee that Pareto points are optimal and, more importantly, they may not identify all the Pareto-optimal points. In this paper, we treat redundancy allocation problem as a multi-objective problem, as is typical in practice. We decompose the original problem into several multi-objective sub-problems, efficiently and exactly solve sub-problems, and then systematically combine the solutions. The decomposition-based approach can efficiently generate all the Pareto-optimal solutions for redundancy allocation problems. Experimental results demonstrate the effectiveness and efficiency of the proposed method over meta-heuristic methods on a numerical example taken from the literature.
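To illustrate what "all the Pareto-optimal solutions" means for a series-parallel redundancy allocation problem, the sketch below enumerates every allocation of a tiny system and filters the non-dominated ones. It is exhaustive enumeration, not the decomposition-based method of the paper. It assumes two objectives (maximize reliability, minimize cost), identical and independent components within each subsystem, and hypothetical function names and input values.

```python
from itertools import product


def system_reliability(r, n):
    """Series system of parallel subsystems.

    Subsystem i has n[i] identical components of reliability r[i] and
    works if at least one of them works.
    """
    rel = 1.0
    for ri, ni in zip(r, n):
        rel *= 1.0 - (1.0 - ri) ** ni
    return rel


def pareto_front(r, c, max_n):
    """Non-dominated (reliability, cost, allocation) triples, sorted by cost."""
    candidates = []
    for n in product(range(1, max_n + 1), repeat=len(r)):
        cost = sum(ci * ni for ci, ni in zip(c, n))
        candidates.append((system_reliability(r, n), cost, n))

    def dominates(b, a):
        # b is at least as good in both objectives and strictly better in one.
        return (b[0] >= a[0] and b[1] <= a[1]
                and (b[0] > a[0] or b[1] < a[1]))

    front = [a for a in candidates
             if not any(dominates(b, a) for b in candidates)]
    return sorted(front, key=lambda t: t[1])
```

With r = [0.9, 0.9], c = [1, 2] and max_n = 2, the allocation (1, 2) is dominated by (2, 1), which reaches the same reliability at lower cost. The number of allocations grows as max_n to the power of the number of subsystems, which is why exact methods need a decomposition such as the one the abstract describes.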
An efficient parallel algorithm for the calculation of unrestricted canonical MP2 energies.
Baker, Jon; Wolinski, Krzysztof
2011-11-30
We present details of our efficient implementation of full accuracy unrestricted open-shell second-order canonical Møller-Plesset (MP2) energies, both serial and parallel. The algorithm is based on our previous restricted closed-shell MP2 code using the Saebo-Almlöf direct integral transformation. Depending on system details, UMP2 energies take from less than 1.5 to about 3.0 times as long as a closed-shell RMP2 energy on a similar system using the same algorithm. Several examples are given including timings for some large stable radicals with 90+ atoms and over 3600 basis functions. Copyright © 2011 Wiley Periodicals, Inc.
Efficient job handling in the GRID short deadline, interactivity, fault tolerance and parallelism
Moscicki, Jakub
2006-01-01
The major GRID infrastructures are designed mainly for batch-oriented computing with coarse-grained jobs and relatively high job turnaround time. However, many practical applications in natural and physical sciences may be easily parallelized and run as a set of smaller tasks which require little or no synchronization and which may be scheduled in a more efficient way. The Distributed Analysis Environment Framework (DIANE) is a Master-Worker execution skeleton for applications, which complements the GRID middleware stack. Automatic failure recovery and task dispatching policies enable an easy customization of the behaviour of the framework in a dynamic and non-reliable computing environment. We demonstrate the experience of using the framework with several diverse real-life applications, including Monte Carlo Simulation, Physics Data Analysis and Biotechnology. The interfacing of existing sequential applications from the point of view of a non-expert user is made easy, also for legacy applications. We analyze th...
Efficient Serial and Parallel Algorithms for Selection of Unique Oligos in EST Databases.
Mata-Montero, Manrique; Shalaby, Nabil; Sheppard, Bradley
2013-01-01
Obtaining unique oligos from an EST database is a problem of great importance in bioinformatics, particularly in the discovery of new genes and the mapping of the human genome. Many algorithms have been developed to find unique oligos, many of which are much less time consuming than the traditional brute force approach. An algorithm was presented by Zheng et al. (2004) which finds the solution of the unique oligos search problem efficiently. We implement this algorithm as well as several new algorithms based on some theorems included in this paper. We demonstrate how, with these new algorithms, we can obtain unique oligos much faster than with previous ones. We parallelize these new algorithms to further improve the time of finding unique oligos. All algorithms are run on ESTs obtained from a Barley EST database.
Behrens, Jörg; Hanke, Moritz; Jahns, Thomas
2014-05-01
In this talk we present a way to facilitate efficient use of MPI communication for developers of climate models. Exploitation of the performance potential of today's highly parallel supercomputers with real world simulations is a complex task. This is partly caused by the low level nature of the MPI communication library which is the dominant communication tool at least for inter-node communication. In order to manage the complexity of the task, climate simulations with non-trivial communication patterns often use an internal abstraction layer above MPI without exploiting the benefits of communication aggregation or MPI-datatypes. The solution for the complexity and performance problem we propose is the communication library YAXT. This library is built on top of MPI and takes high level descriptions of arbitrary domain decompositions and automatically derives an efficient collective data exchange. Several exchanges can be aggregated in order to reduce latency costs. Examples are given which demonstrate the simplicity and the performance gains for selected climate applications.
Active Vibration Suppression of a 3-DOF Flexible Parallel Manipulator Using Efficient Modal Control
Directory of Open Access Journals (Sweden)
Quan Zhang
2014-01-01
This paper addresses the dynamic modeling and efficient modal control of a planar parallel manipulator (PPM) with three flexible linkages actuated by linear ultrasonic motors (LUSM). To achieve active vibration control, multiple lead zirconate titanate (PZT) transducers are mounted on the flexible links as vibration sensors and actuators. Based on Lagrange's equations, the dynamic model of the flexible links is derived with the dynamics of PZT actuators incorporated. Using the assumed mode method (AMM), the elastic motion of the flexible links is discretized under the assumption of pinned-free boundary conditions, and the assumed mode shapes are validated through experimental modal tests. Efficient modal control (EMC), in which the feedback forces in different modes are determined according to the vibration amplitude or energy of their own, is employed to control the PZT actuators to realize active vibration suppression. Modal filters are developed to extract the modal displacements and velocities from the vibration sensors. Numerical simulation and vibration control experiments are conducted to verify the proposed dynamic model and controller. The results show that the EMC method has the capability of suppressing multimode vibration simultaneously, and both the structural and residual vibrations of the flexible links are effectively suppressed using the EMC approach.
Cusack, Rhodri; Vicente-Grabovetsky, Alejandro; Mitchell, Daniel J; Wild, Conor J; Auer, Tibor; Linke, Annika C; Peelle, Jonathan E
2014-01-01
Recent years have seen neuroimaging data sets becoming richer, with larger cohorts of participants, a greater variety of acquisition techniques, and increasingly complex analyses. These advances have made data analysis pipelines complicated to set up and run (increasing the risk of human error) and time consuming to execute (restricting what analyses are attempted). Here we present an open-source framework, automatic analysis (aa), to address these concerns. Human efficiency is increased by making code modular and reusable, and managing its execution with a processing engine that tracks what has been completed and what needs to be (re)done. Analysis is accelerated by optional parallel processing of independent tasks on cluster or cloud computing resources. A pipeline comprises a series of modules that each perform a specific task. The processing engine keeps track of the data, calculating a map of upstream and downstream dependencies for each module. Existing modules are available for many analysis tasks, such as SPM-based fMRI preprocessing, individual and group level statistics, voxel-based morphometry, tractography, and multi-voxel pattern analyses (MVPA). However, aa also allows for full customization, and encourages efficient management of code: new modules may be written with only a small code overhead. aa has been used by more than 50 researchers in hundreds of neuroimaging studies comprising thousands of subjects. It has been found to be robust, fast, and efficient, for simple single-subject studies up to multimodal pipelines on hundreds of subjects. It is attractive to both novice and experienced users. aa can reduce the amount of time neuroimaging laboratories spend performing analyses and reduce errors, expanding the range of scientific questions it is practical to address.
Directory of Open Access Journals (Sweden)
Rhodri Cusack
2015-01-01
Recent years have seen neuroimaging data becoming richer, with larger cohorts of participants, a greater variety of acquisition techniques, and increasingly complex analyses. These advances have made data analysis pipelines complex to set up and run (increasing the risk of human error) and time consuming to execute (restricting what analyses are attempted). Here we present an open-source framework, automatic analysis (aa), to address these concerns. Human efficiency is increased by making code modular and reusable, and managing its execution with a processing engine that tracks what has been completed and what needs to be (re)done. Analysis is accelerated by optional parallel processing of independent tasks on cluster or cloud computing resources. A pipeline comprises a series of modules that each perform a specific task. The processing engine keeps track of the data, calculating a map of upstream and downstream dependencies for each module. Existing modules are available for many analysis tasks, such as SPM-based fMRI preprocessing, individual and group level statistics, voxel-based morphometry, tractography, and multi-voxel pattern analyses (MVPA). However, aa also allows for full customization, and encourages efficient management of code: new modules may be written with only a small code overhead. aa has been used by more than 50 researchers in hundreds of neuroimaging studies comprising thousands of subjects. It has been found to be robust, fast and efficient, for simple single-subject studies up to multimodal pipelines on hundreds of subjects. It is attractive to both novice and experienced users. aa can reduce the amount of time neuroimaging laboratories spend performing analyses and reduce errors, expanding the range of scientific questions it is practical to address.
Directory of Open Access Journals (Sweden)
Lei Zhao
2014-01-01
An efficient algorithm is proposed to analyze the electromagnetic scattering problem from a high resolution head model with pixel data format. The algorithm is based on a parallel technique and the conjugate gradient (CG) method combined with the fast Fourier transform (FFT). Using the parallel CG-FFT method, the proposed algorithm is very efficient and can solve electrically very large problems which cannot be solved using the conventional CG-FFT method on a personal computer. The accuracy of the proposed algorithm is verified by comparing numerical results with analytical Mie-series solutions for dielectric spheres. Numerical experiments have demonstrated that the proposed method has good parallel efficiency.
Energy Technology Data Exchange (ETDEWEB)
Hermenegildo, M.V.
1986-01-01
The term Logic Programming refers to a variety of computer languages and execution models based on the traditional concept of Symbolic Logic. The expressive power of these languages offers promise to be of great assistance in facing the programming challenges of present and future symbolic processing applications in artificial intelligence, knowledge-based systems, and many other areas of computing. This dissertation presents an efficient parallel execution model for logic programs. The model is described from the source language level down to an Abstract Machine level, suitable for direct implementation on existing parallel systems or for the design of special purpose parallel architectures. Few assumptions are made at the source language level and, therefore, the techniques developed and the general Abstract Machine design are applicable to a variety of logic (and also functional) languages. These techniques offer efficient solutions to several areas of parallel Logic Programming implementation previously considered problematic or a source of considerable overhead, such as the detection and handling of variable binding conflicts in AND-parallelism, the specification of control and management of the execution tree, the treatment of distributed backtracking, and goal scheduling and memory management issues, etc. A parallel Abstract Machine design is offered, specifying data areas, operation, and a suitable instruction set.
Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.
1994-01-01
Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.
Ask, Kristine Skoglund; Bardakci, Turgay; Parmer, Marthe Petrine; Halvorsen, Trine Grønhaug; Øiestad, Elisabeth Leere; Pedersen-Bjergaard, Stig; Gjelstad, Astrid
2016-09-10
Generic Parallel Artificial Liquid Membrane Extraction (PALME) methods for non-polar basic and non-polar acidic drugs from human plasma were investigated with respect to phospholipid removal. In both cases, extractions in 96-well format were performed from plasma (125μL), through 4μL organic solvent used as supported liquid membranes (SLMs), and into 50μL aqueous acceptor solutions. The acceptor solutions were subsequently analysed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) using in-source fragmentation and monitoring the m/z 184→184 transition for investigation of phosphatidylcholines (PC), sphingomyelins (SM), and lysophosphatidylcholines (Lyso-PC). In both generic methods, no phospholipids were detected in the acceptor solutions. Thus, PALME appeared to be highly efficient for phospholipid removal. To further support this, qualitative (post-column infusion) and quantitative matrix effects were investigated with fluoxetine, fluvoxamine, and quetiapine as model analytes. No signs of matrix effects were observed. Finally, PALME was evaluated for the aforementioned drug substances, and data were in accordance with European Medicines Agency (EMA) guidelines. Copyright © 2016 Elsevier B.V. All rights reserved.
Liu, Kuojuey Ray
1990-01-01
Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and another based on a rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.
Directory of Open Access Journals (Sweden)
M. Huang
2015-09-01
The planetary boundary layer (PBL) is the lowest part of the atmosphere, where its character is directly affected by its contact with the underlying planetary surface. The PBL is responsible for vertical sub-grid-scale fluxes due to eddy transport in the whole atmospheric column. It determines the flux profiles within the well-mixed boundary layer and the more stable layer above. It thus provides an evolutionary model of atmospheric temperature, moisture (including clouds), and horizontal momentum in the entire atmospheric column. For such purposes, several PBL models have been proposed and employed in the weather research and forecasting (WRF) model, of which the Yonsei University (YSU) scheme is one. To expedite weather research and prediction, we have put tremendous effort into developing an accelerated implementation of the entire WRF model using graphics processing unit (GPU) massively parallel computing architecture whilst maintaining its accuracy as compared to its central processing unit (CPU)-based implementation. This paper presents our efficient GPU-based design of the WRF YSU PBL scheme. Using one NVIDIA Tesla K40 GPU, the GPU-based YSU PBL scheme achieves a speedup of 193× with respect to its CPU counterpart running on one CPU core, whereas the speedup for one CPU socket (4 cores) with respect to 1 CPU core is only 3.5×. We can even boost the speedup to 360× with respect to 1 CPU core when two K40 GPUs are applied.
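The speedup figures quoted above can be put on a common footing with a little arithmetic. The sketch below uses only the three numbers stated in the abstract (193×, 3.5× on 4 cores, 360× on two GPUs). The helper `parallel_efficiency` and the derived quantities are illustrative and are not reported by the authors.

```python
def parallel_efficiency(speedup, units):
    """Speedup divided by the number of processing units used."""
    return speedup / units

# Figures taken from the abstract, all relative to one CPU core.
one_gpu_speedup = 193.0
two_gpu_speedup = 360.0
socket_speedup = 3.5      # one CPU socket, 4 cores

# Derived quantities (not stated in the abstract).
cpu_socket_eff = parallel_efficiency(socket_speedup, 4)
two_gpu_scaling = two_gpu_speedup / one_gpu_speedup
two_gpu_eff = parallel_efficiency(two_gpu_scaling, 2)
gpu_vs_socket = one_gpu_speedup / socket_speedup

print(f"4-core efficiency:      {cpu_socket_eff:.3f}")
print(f"1 -> 2 GPU scaling:     {two_gpu_scaling:.2f}x")
print(f"2-GPU efficiency:       {two_gpu_eff:.3f}")
print(f"1 GPU vs 4-core socket: {gpu_vs_socket:.1f}x")
```

Read this way, the four-core CPU run is 87.5% efficient, and one GPU is roughly 55 times faster than the full CPU socket rather than 193 times.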
Efficient graph-based dynamic load-balancing for parallel large-scale agent-based traffic simulation
Xu, Y.; Cai, W.; Aydt, H.; Lees, M.; Tolk, A.; Diallo, S.Y.; Ryzhov, I.O.; Yilmaz, L.; Buckley, S.; Miller, J.A.
2014-01-01
One of the issues of parallelizing large-scale agent-based traffic simulations is partitioning and load-balancing. Traffic simulations are dynamic applications where the distribution of workload in the spatial domain constantly changes. Dynamic load-balancing at run-time has shown better efficiency
A simple and efficient parallel FFT algorithm using the BSP model
Bisseling, R.H.; Inda, M.A.
2000-01-01
In this paper we present a new parallel radix FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of three in the case
International Nuclear Information System (INIS)
Taraglio, S.; Massaioli, F.
1995-08-01
A parallel implementation of a library to build and train Multi Layer Perceptrons via the Back Propagation algorithm is presented. The target machine is the SIMD massively parallel supercomputer Quadrics. Performance measures are provided on three different machines with different number of processors, for two network examples. A sample source code is given
International Nuclear Information System (INIS)
Azmy, Y.Y.; Kirk, B.L.
1990-01-01
Modern parallel computer architectures offer an enormous potential for reducing CPU and wall-clock execution times of large-scale computations commonly performed in various applications in science and engineering. Recently, several authors have reported their efforts in developing and implementing parallel algorithms for solving the neutron diffusion equation on a variety of shared- and distributed-memory parallel computers. Testing of these algorithms for a variety of two- and three-dimensional meshes showed significant speedup of the computation. Even for very large problems (i.e., three-dimensional fine meshes) executed concurrently on a few nodes in serial (nonvector) mode, however, the measured computational efficiency is very low (40 to 86%). In this paper, the authors present a highly efficient (∼85 to 99.9%) algorithm for solving the two-dimensional nodal diffusion equations on the Sequent Balance 8000 parallel computer. Also presented is a model for the performance, represented by the efficiency, as a function of problem size and the number of participating processors. The model is validated through several tests and then extrapolated to larger problems and more processors to predict the performance of the algorithm in more computationally demanding situations
Parallel Implicit Algorithms for CFD
Keyes, David E.
1998-01-01
The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSC library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSC during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSC framework.
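The remark that the Krylov iteration touches the Jacobian only through matrix-vector products is often realized with a finite-difference directional derivative. The following is a minimal sketch under that assumption. The two-equation `residual` is a hypothetical toy system, not one of the Euler codes, and `jacobian_vector_product` is a name invented here.

```python
import numpy as np

def residual(u):
    """Hypothetical toy nonlinear system F(u) = 0 with two unknowns."""
    return np.array([u[0] ** 2 + u[1] - 3.0,
                     u[0] + u[1] ** 2 - 5.0])

def jacobian_vector_product(F, u, v, eps=1e-7):
    """Approximate J(u) @ v by a forward difference of the residual.

    Only residual evaluations are needed, so the Jacobian matrix is
    never assembled or stored.
    """
    return (F(u + eps * v) - F(u)) / eps

u = np.array([1.0, 2.0])
v = np.array([1.0, -1.0])
jv = jacobian_vector_product(residual, u, v)

# Analytic Jacobian of the toy system, used here only as a check.
J = np.array([[2.0 * u[0], 1.0],
              [1.0, 2.0 * u[1]]])
```

A Krylov solver such as GMRES would call `jacobian_vector_product` once per inner iteration; the Schwarz preconditioner and the parallel data exchange are outside the scope of this sketch.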
Directory of Open Access Journals (Sweden)
Haritonova Larisa
2017-01-01
The article gives an analytical generalization of the data on the energy efficiency of heat exchangers with a flat heat exchange surface onto which systems of impinging plane-parallel jets are directed. Functional relations for the specific power consumption (per unit of area) for moving a heat carrier, obtained for the first time using the techniques of the similarity law, are shown with regard to design and operating factors. The regression equations representing a mathematical model of the process make it possible to analyze the impact of various factors on the parameter to be determined. The results can be used to optimize or to create calculation techniques for new highly efficient heat exchange devices with plane-parallel jet impingement systems, and also to reduce the power consumption for moving a heat carrier.
International Nuclear Information System (INIS)
Wang Jin-Ze; Huang Qing-Li; Xu Xin; Quan Bao-Gang; Luo Jian-Heng; Li Dong-Mei; Meng Qing-Bo; Yang Guo-Zhen; Zhang Yan; Ye Jia-Sheng
2015-01-01
Based on the facts that multijunction solar cells can increase the efficiency and concentration can reduce the cost dramatically, a special design of parallel multijunction solar cells was presented. The design employed a diffractive optical element (DOE) to split and concentrate the sunlight. A rainbow region and a zero-order diffraction region were generated on the output plane where solar cells with corresponding band gaps were placed. An analytical expression of the light intensity distribution on the output plane of the special DOE was deduced, and the limiting photovoltaic efficiency of such parallel multijunction solar cells was obtained based on Shockley–Queisser’s theory. An efficiency exceeding the Shockley–Queisser limit (33%) can be expected using multijunction solar cells consisting of separately fabricated subcells. The results provide an important alternative approach to realize high photovoltaic efficiency without the need for expensive epitaxial technology widely used in tandem solar cells, thus stimulating the research and application of high efficiency and low cost solar cells. (paper)
Efficient Heuristics for the Simulation of Buffer Overflow in Series and Parallel Queueing Networks
Nicola, V.F.; Zaburnenko, T.S.
2006-01-01
In this paper we propose state-dependent importance sampling heuristics to estimate the probability of population overflow in Markovian networks of series and parallel queues. These heuristics capture state-dependence along the boundaries (when one or more queues are empty) which is critical for
Angular parallelization of a curvilinear Sn transport theory method
International Nuclear Information System (INIS)
Haghighat, A.
1991-01-01
In this paper a parallel algorithm for angular domain decomposition (or parallelization) of an r-dependent spherical Sn transport theory method is derived. The parallel formulation is incorporated into TWOTRAN-II using the IBM Parallel Fortran compiler and implemented on an IBM 3090/400 (with four processors). The behavior of the parallel algorithm for different physical problems is studied, and it is concluded that the parallel algorithm behaves differently in the presence of a fission source as opposed to the absence of a fission source; this is attributed to the relative contributions of the source and the angular redistribution terms in the Sn algorithm. Further, the parallel performance of the algorithm is measured for various problem sizes and different combinations of angular subdomains or processors. Poor parallel efficiencies between ∼35 and 50% are achieved in situations where the relative difference of parallel to serial iterations is ∼50%. High parallel efficiencies between ∼60% and 90% are obtained in situations where the relative difference of parallel to serial iterations is <35%
International Nuclear Information System (INIS)
Raeder, J.; Piet, S.; Buende, R.
1991-01-01
As part of the series of publications by the IAEA that summarize the results of the Conceptual Design Activities for the ITER project, this document describes the ITER safety analyses. It contains an assessment of normal operation effluents, accident scenarios, plasma chamber safety, tritium system safety, magnet system safety, external loss of coolant and coolant flow problems, and a waste management assessment, while it describes the implementation of the safety approach for ITER. The document ends with a list of major conclusions, a set of topical remarks on technical safety issues, and recommendations for the Engineering Design Activities, safety considerations for siting ITER, and recommendations with regard to the safety issues for the R and D for ITER. Refs, figs and tabs
Muller, A.; Hughes, T. J. R.
1984-01-01
A weak formulation in structural analysis that provides well-conditioned matrices suitable for iterative solutions is presented. A mixed formulation ensures the proper representation of the problem, and the constitutive relations are added in a penalized form. The problem is solved by a double conjugate gradient algorithm combined with an element-by-element approximate factorization procedure. The double conjugate gradient strategy resembles Uzawa's variable-length type algorithms; the main difference is the presence of quadratic terms in the mixed variables. In the case of shear deformable beams these terms ensure that the proper finite thickness solution is obtained.
Czech Academy of Sciences Publication Activity Database
Šůcha, P.; Hanzálek, Z.; Heřmánek, Antonín; Schier, Jan
2007-01-01
Vol. 46, No. 1 (2007), pp. 35-53. ISSN 0922-5773. R&D Projects: GA AV ČR(CZ) 1ET300750402; GA MŠk(CZ) 1M0567; GA MPO(CZ) FD-K3/082. Institutional research plan: CEZ:AV0Z10750506. Keywords: high-level synthesis; cyclic scheduling; iterative algorithms; imperfectly nested loops; integer linear programming; FPGA; VLSI design; blind equalization; implementation. Subject RIV: BA - General Mathematics. Impact factor: 0.449 (2007). http://www.springerlink.com/content/t217kg0822538014/fulltext.pdf
Colorado Conference on iterative methods. Volume 1
Energy Technology Data Exchange (ETDEWEB)
NONE
1994-12-31
The conference provided a forum on many aspects of iterative methods. Volume I session topics were: domain decomposition, nonlinear problems, integral equations and inverse problems, eigenvalue problems, and iterative software kernels. Volume II presents nonsymmetric solvers, parallel computation, theory of iterative methods, software and programming environment, ODE solvers, multigrid and multilevel methods, applications, robust iterative methods, preconditioners, Toeplitz and circulant solvers, and saddle point problems. Individual papers are indexed separately on the EDB.
Parallel Sequential Monte Carlo for Efficient Density Combination: The Deco Matlab Toolbox
DEFF Research Database (Denmark)
Casarin, Roberto; Grassi, Stefano; Ravazzolo, Francesco
This paper presents the Matlab package DeCo (Density Combination) which is based on the paper by Billio et al. (2013) where a constructive Bayesian approach is presented for combining predictive densities originating from different models or other sources of information. The combination weights...... for standard CPU computing and for Graphical Process Unit (GPU) parallel computing. For the GPU implementation we use the Matlab parallel computing toolbox and show how to use General Purpose GPU computing almost effortlessly. This GPU implementation comes with a speed up of the execution time up to seventy times compared to a standard CPU Matlab implementation on a multicore CPU. We show the use of the package and the computational gain of the GPU version, through some simulation experiments and empirical applications....
DEFF Research Database (Denmark)
Keibler, Evan; Arumugam, Manimozhiyan; Brent, Michael R
2007-01-01
MOTIVATION: Hidden Markov models (HMMs) and generalized HMMs have been successfully applied to many problems, but the standard Viterbi algorithm for computing the most probable interpretation of an input sequence (known as decoding) requires memory proportional to the length of the sequence, which can be prohibitive. Existing approaches to reducing memory usage either sacrifice optimality or trade increased running time for reduced memory. RESULTS: We developed two novel decoding algorithms, Treeterbi and Parallel Treeterbi, and implemented them in the TWINSCAN/N-SCAN gene-prediction system. The worst case asymptotic space and time are the same as for standard Viterbi, but in practice, Treeterbi optimally decodes arbitrarily long sequences with generalized HMMs in bounded memory without increasing running time. Parallel Treeterbi uses the same ideas to split optimal decoding across processors, dividing latency...
Comments on the parallelization efficiency of the Sunway TaihuLight supercomputer
Végh, János
2016-01-01
In the world of supercomputers, the large number of processors makes it necessary to minimize the inefficiencies of parallelization, which appear as a sequential part of the program from the point of view of Amdahl's law. The recently suggested new figure of merit is applied to the recently presented supercomputer, and the timeline of "Top 500" supercomputers is scrutinized using the metric. It is demonstrated that, in addition to the computing performance and power consumption, the new supercomputer i...
International Nuclear Information System (INIS)
Woodruff, S.B.
1992-01-01
The Transient Reactor Analysis Code (TRAC), which features a two-fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local, the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, poor load balancing will degrade efficiency on either vector or data parallel architectures when the data are organized according to spatial location. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. This document discusses why developers should consider algorithms, such as a neural net representation, that do not exhibit load-balancing problems
DEFF Research Database (Denmark)
Ask, Kristine Skoglund; Bardakci, Turgay; Parmer, Marthe Petrine
2016-01-01
Generic Parallel Artificial Liquid Membrane Extraction (PALME) methods for non-polar basic and non-polar acidic drugs from human plasma were investigated with respect to phospholipid removal. In both cases, extractions in 96-well format were performed from plasma (125μL), through 4μL organic solvent used as supported liquid membranes (SLMs), and into 50μL aqueous acceptor solutions. The acceptor solutions were subsequently analysed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) using in-source fragmentation and monitoring the m/z 184→184 transition for investigation...
International Nuclear Information System (INIS)
Rebuffel, V.; Gonon, G.
1992-01-01
A software package is presented that can be employed for any 3D imaging modality: X-ray tomography, emission tomography, magnetic resonance imaging. This system uses a hierarchical data structure, named Octree, that naturally allows a multi-resolution approach. The well-known problems of such an indeterministic representation, especially neighbor finding, have been solved. Several volume processing algorithms have been developed, using these techniques and an optimal data storage for the Octree. A parallel implementation was chosen that is compatible with the constraints of the Octree base and the various algorithms. (authors) 4 refs., 3 figs., 1 tab
Energy Technology Data Exchange (ETDEWEB)
Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarria-Miranda, Daniel
2009-02-15
We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
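For readers unfamiliar with the kernel being parallelized, a serial reference version of betweenness centrality may help. The sketch below is Brandes' well-known algorithm for unweighted, undirected graphs. It is not the lock-free parallel algorithm of the abstract, and the adjacency-dictionary input format is an assumption made for brevity.

```python
from collections import deque

def betweenness_centrality(adj):
    """Serial Brandes algorithm; adj maps each node to its neighbor list."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: breadth-first search from s, counting shortest paths.
        sigma = {v: 0 for v in adj}   # number of shortest s-v paths
        dist = {v: -1 for v in adj}   # -1 marks "not yet visited"
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}

path_graph = {0: [1], 1: [0, 2], 2: [1]}
scores = betweenness_centrality(path_graph)
```

The outer loop over source vertices is independent work, which is the usual starting point for parallel versions; the shared updates to `bc` are where a parallel implementation needs locks or a lock-free alternative.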
Efficient Method for Parallel Process and Matching of Large Data set in Grid Computing Environment
Directory of Open Access Journals (Sweden)
E. Sankar
2014-09-01
Data management is one of the challenging issues in grid computing environments, because grid computing systems and their applications deal with huge data sets held on heterogeneous grid resources that belong to different organizations and locations with many access policies. To realize the potential of these large distributed resources, effective scheduling algorithms are important. Task scheduling is the mapping of tasks to a selected group of resources, which may be distributed across different administrative domains. Here, the parallel processing of the distributed system is driven by grid scheduling algorithms: a genetic algorithm, one type of scheduling algorithm, is used to schedule tasks onto the various resources working in parallel in the distributed system. Basically, a grid scheduler receives applications from grid users, selects suitable resources for these applications according to information acquired from the Grid Information Service module, and finally generates application-to-resource mappings based on certain objective functions and predicted resource performance. Information about the status of available resources is very important for a grid scheduler to produce a proper schedule, particularly when the heterogeneous and autonomous nature of the grid is taken into account. The function of the Grid Information Service is to provide such information to grid schedulers.
Keibler, Evan; Arumugam, Manimozhiyan; Brent, Michael R
2007-03-01
Hidden Markov models (HMMs) and generalized HMMs have been successfully applied to many problems, but the standard Viterbi algorithm for computing the most probable interpretation of an input sequence (known as decoding) requires memory proportional to the length of the sequence, which can be prohibitive. Existing approaches to reducing memory usage either sacrifice optimality or trade increased running time for reduced memory. We developed two novel decoding algorithms, Treeterbi and Parallel Treeterbi, and implemented them in the TWINSCAN/N-SCAN gene-prediction system. The worst case asymptotic space and time are the same as for standard Viterbi, but in practice, Treeterbi optimally decodes arbitrarily long sequences with generalized HMMs in bounded memory without increasing running time. Parallel Treeterbi uses the same ideas to split optimal decoding across processors, dividing latency to completion by approximately the number of available processors with constant average overhead per processor. Using these algorithms, we were able to optimally decode all human chromosomes with N-SCAN, which increased its accuracy relative to heuristic solutions. We also implemented Treeterbi for Pairagon, our pair HMM based cDNA-to-genome aligner. The TWINSCAN/N-SCAN/PAIRAGON open source software package is available from http://genes.cse.wustl.edu.
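The memory cost that motivates Treeterbi is easy to see in a textbook implementation of the standard Viterbi algorithm. The sketch below is that baseline only, not Treeterbi itself. It assumes a plain HMM with integer-coded states and observations and strictly positive probabilities, and the two-state model in the usage lines is a made-up example.

```python
import math

def viterbi(obs, start_p, trans_p, emit_p):
    """Most probable state path for an HMM (all probabilities > 0)."""
    n_states = len(start_p)
    # prev[s]: best log-probability of any path ending in state s so far.
    prev = [math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
            for s in range(n_states)]
    # back holds one list of predecessor pointers per time step, so its
    # size grows linearly with len(obs); this is the memory bottleneck.
    back = []
    for t in range(1, len(obs)):
        cur, ptr = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans_p[r][s]))
            cur.append(prev[best] + math.log(trans_p[best][s])
                       + math.log(emit_p[s][obs[t]]))
            ptr.append(best)
        back.append(ptr)
        prev = cur
    # Trace the pointers backwards from the best final state.
    state = max(range(n_states), key=lambda s: prev[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical two-state, three-symbol model.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
decoded = viterbi([0, 1, 2], start, trans, emit)
```

Only `prev` is needed to advance the recursion, but the full `back` table must be kept until the end for the traceback, which is why memory scales with sequence length.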
Honkonen, I.
2015-03-01
I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. An implementation of the generic simulation cell method presented here, generic simulation cell class (gensimcell), also includes support for parallel programming by allowing model developers to select which simulation variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class requires a C++ compiler that supports a version of the language standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.
An efficient parallel simulation of unsteady blood flows in patient-specific pulmonary artery.
Kong, Fande; Kheyfets, Vitaly; Finol, Ender; Cai, Xiao-Chuan
2018-04-01
Simulation of blood flows in the pulmonary artery provides some insight into certain diseases by examining the relationship between some continuum metrics, eg, the wall shear stress acting on the vascular endothelium, which responds to flow-induced mechanical forces by releasing vasodilators/constrictors. V. Kheyfets, in his previous work, studies numerically a patient-specific pulmonary circulation to show that decreasing wall shear stress is correlated with increasing pulmonary vascular impedance. In this paper, we develop a scalable parallel algorithm based on domain decomposition methods to investigate an unsteady model with patient-specific pulsatile waveforms as the inlet boundary condition. The unsteady model offers tremendously more information about the dynamic behavior of the flow field, but computationally speaking, the simulation is a lot more expensive since a problem which is similar to the steady-state problem has to be solved many times, and therefore, the traditional sequential approach is not suitable anymore. We show computationally that simulations using the proposed parallel approach with up to 10 000 processor cores can be obtained with much reduced compute time. This makes the technology potentially usable for the routine study of the dynamic behavior of blood flows in the pulmonary artery, in particular, the changes of the blood flows and the wall shear stress in the spatial and temporal dimensions. Copyright © 2017 John Wiley & Sons, Ltd.
A Scheduling-Based Framework for Efficient Massively Parallel Execution, Phase I
National Aeronautics and Space Administration — The barrier to entry creating efficient, scalable applications for heterogeneous supercomputing environments is too high. EM Photonics has found that the majority of...
van de Water, S.; Kraan, A. C.; Breedveld, S.; Schillemans, W.; Teguh, D. N.; Kooy, H. M.; Madden, T. M.; Heijmen, B. J. M.; Hoogeman, M. S.
2013-10-01
This study investigates whether ‘pencil beam resampling’, i.e. iterative selection and weight optimization of randomly placed pencil beams (PBs), reduces optimization time and improves plan quality for multi-criteria optimization in intensity-modulated proton therapy, compared with traditional modes in which PBs are distributed over a regular grid. Resampling consisted of repeatedly performing: (1) random selection of candidate PBs from a very fine grid, (2) inverse multi-criteria optimization, and (3) exclusion of low-weight PBs. The newly selected candidate PBs were added to the PBs in the existing solution, causing the solution to improve with each iteration. Resampling and traditional regular grid planning were implemented into our in-house developed multi-criteria treatment planning system ‘Erasmus iCycle’. The system optimizes objectives successively according to their priorities as defined in the so-called ‘wish-list’. For five head-and-neck cancer patients and two PB widths (3 and 6 mm sigma at 230 MeV), treatment plans were generated using: (1) resampling, (2) anisotropic regular grids and (3) isotropic regular grids, while using varying sample sizes (resampling) or grid spacings (regular grid). We assessed differences in optimization time (for comparable plan quality) and in plan quality parameters (for comparable optimization time). Resampling reduced optimization time by a factor of 2.8 and 5.6 on average (7.8 and 17.0 at maximum) compared with the use of anisotropic and isotropic grids, respectively. Doses to organs-at-risk were generally reduced when using resampling, with median dose reductions ranging from 0.0 to 3.0 Gy (maximum: 14.3 Gy, relative: 0%-42%) compared with anisotropic grids and from -0.3 to 2.6 Gy (maximum: 11.4 Gy, relative: -4%-19%) compared with isotropic grids. Resampling was especially effective when using thin PBs (3 mm sigma). Resampling plans contained on average fewer PBs, energy layers and protons than anisotropic
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-01-01
We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-06-01
We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
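The iterative soft-thresholding step with cross-channel joint sparsity described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `joint_soft_threshold`, the `(channels, n)` array layout, and the threshold `lam` are all hypothetical.

```python
import numpy as np

def joint_soft_threshold(coeffs, lam):
    """Shrink each coefficient location by its l2 norm across channels.

    coeffs: array of shape (channels, n), e.g. wavelet coefficients per coil
            (layout is an assumption made for this sketch).
    lam:    non-negative threshold (hypothetical parameter).
    """
    # l2 norm across channels at each coefficient location
    norms = np.sqrt(np.sum(np.abs(coeffs) ** 2, axis=0, keepdims=True))
    # Shrinkage factor max(1 - lam / norm, 0); the inner maximum avoids 0/0
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-30), 0.0)
    return coeffs * scale
```

Because the shrinkage factor is shared by all channels at a given location, a coefficient is kept or zeroed jointly across coils, which is what "joint sparsity" means here.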
International Nuclear Information System (INIS)
Bhowmick, S; Shafiullah, M; Rai, H; Bastola, D
2010-01-01
Biological sequence comparison programs have revolutionized the practice of biochemistry, and molecular and evolutionary biology. Pairwise comparison of genomic sequences is a popular method of choice for analyzing genetic sequence data. However, the quality of results from most sequence comparison methods is significantly affected by small perturbations in the data and, furthermore, there is a dearth of computational tools to compare sequences beyond a certain length. In this paper, we describe a parallel algorithm for comparing genetic sequences using an alignment-free method based on computing the Longest Common Subsequence (LCS) between genetic sequences. We validate the quality of our results by comparing the phylogenetic trees obtained from ClustalW and LCS. We also show through complexity analysis of the isoefficiency and by empirical measurement of the running time that our algorithm is very scalable.
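For reference, the LCS length that the abstract builds on can be computed with the textbook dynamic program. The sketch below is serial and is not the parallel algorithm of the paper; the function name `lcs_length` is chosen for illustration only.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Standard O(len(a) * len(b)) dynamic program that keeps only one
    previous row of the table in memory.
    """
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                # Extend the common subsequence ending before both symbols
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one symbol from either sequence
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

Each table cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent; that is one common route to parallelizing this recurrence, though the abstract does not say which strategy the authors use.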
Scalability of Parallel Scientific Applications on the Cloud
Directory of Open Access Journals (Sweden)
Satish Narayana Srirama
2011-01-01
Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications like matrix–vector operations and NAS parallel benchmarks, and DOUG (Domain decomposition On Unstructured Grids) on the cloud. DOUG is an open source software package for parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit a lot and scale reasonably on the cloud. We could also observe the limitations of the cloud and its comparison with cluster in terms of performance. However, for efficiently running the scientific applications on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit the cloud resources, like the MapReduce framework. Several iterative and embarrassingly parallel algorithms are reduced to the MapReduce model and their performance is measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it suits well for embarrassingly parallel algorithms. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper raises the necessity for better frameworks or optimizations for MapReduce.
Wan, Shixiang; Zou, Quan
2017-01-01
Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing results in a shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments on large-scale DNA and protein data sets, which are files larger than 1 GB, showed that HAlign-II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
A fast iterative scheme for the linearized Boltzmann equation
Wu, Lei; Zhang, Jun; Liu, Haihu; Zhang, Yonghao; Reese, Jason M.
2017-06-01
Iterative schemes to find steady-state solutions to the Boltzmann equation are efficient for highly rarefied gas flows, but can be very slow to converge in the near-continuum flow regime. In this paper, a synthetic iterative scheme is developed to speed up the solution of the linearized Boltzmann equation by penalizing the collision operator L into the form L = (L + Nδh) - Nδh, where δ is the gas rarefaction parameter, h is the velocity distribution function, and N is a tuning parameter controlling the convergence rate. The velocity distribution function is first solved by the conventional iterative scheme, then it is corrected such that the macroscopic flow velocity is governed by a diffusion-type equation that is asymptotic-preserving into the Navier-Stokes limit. The efficiency of this new scheme is assessed by calculating the eigenvalue of the iteration, as well as solving for Poiseuille and thermal transpiration flows. We find that the fastest convergence of our synthetic scheme for the linearized Boltzmann equation is achieved when Nδ is close to the average collision frequency. The synthetic iterative scheme is significantly faster than the conventional iterative scheme in both the transition and the near-continuum gas flow regimes. Moreover, due to its asymptotic-preserving properties, the synthetic iterative scheme does not need high spatial resolution in the near-continuum flow regime, which makes it even faster than the conventional iterative scheme. Using this synthetic scheme, with the fast spectral approximation of the linearized Boltzmann collision operator, Poiseuille and thermal transpiration flows between two parallel plates, through channels of circular/rectangular cross sections and various porous media are calculated over the whole range of gas rarefaction. Finally, the flow of a Ne-Ar gas mixture is solved based on the linearized Boltzmann equation with the Lennard-Jones intermolecular potential for the first time, and the difference
Power Efficient Design of Parallel/Serial FIR Filters in RNS
DEFF Research Database (Denmark)
Petricca, Massimo; Albicocco, Pietro; Cardarilli, Gian Carlo
2012-01-01
filter potentially disadvantageous with respect to filters implemented in the conventional number systems. In this work, we show a number of solutions which demonstrate that the power efficiency of RNS FIR filters implemented serially is maintained in ASIC technology, while in modern FPGA technology RNS...
A high efficient integrated planar transformer for primary-parallel isolated boost converters
DEFF Research Database (Denmark)
Sen, Gökhan; Ouyang, Ziwei; Thomsen, Ole Cornelius
2010-01-01
for a fuel cell fed battery charger application with 50–110 V input and 65–105 V output. Input inductors are coupled for current sharing, eliminating the use of current sharing transformers. An efficiency of 94% is achieved during nominal operating condition where the input is 70-V and the output is 84-V....
3D-radiative transfer in terrestrial atmosphere: An efficient parallel numerical procedure
Bass, L. P.; Germogenova, T. A.; Nikolaeva, O. V.; Kokhanovsky, A. A.; Kuznetsov, V. S.
2003-04-01
Light propagation and scattering in the terrestrial atmosphere is usually studied in the framework of the 1D radiative transfer theory [1]. However, in reality particles (e.g., ice crystals, solid and liquid aerosols, cloud droplets) are randomly distributed in 3D space. In particular, their concentrations vary in both the vertical and horizontal directions. Therefore, 3D effects influence modern cloud and aerosol retrieval procedures, which are currently based on the 1D radiative transfer theory. It should be pointed out that the standard radiative transfer equation allows one to study these more complex situations as well [2]. In recent years, a parallel version of the 2D and 3D RADUGA code has been developed. This version is successfully used in gamma and neutron transport problems [3]. Applications of this code to atmospheric radiative transfer problems are contained in [4]. The capabilities of the RADUGA code are presented in [5]. The RADUGA code system is a universal solver of radiative transfer problems for complicated models, including 2D and 3D aerosol and cloud fields with arbitrary scattering anisotropy, light absorption, inhomogeneous underlying surface and topography. Both delta-type and distributed light sources can be accounted for in the framework of the algorithm developed. The accurate numerical procedure is based on the new discrete ordinate SWDD scheme [6]. The algorithm is specifically designed for parallel supercomputers. The version RADUGA 5.1(P) can run on MBC1000M [7] (768 processors with 10 Gb of hard disc memory for each processor). The peak performance is equal to 1 Tfl. The corresponding scalar version, RADUGA 5.1, runs on a PC. As a first example of application of the algorithm developed, we have studied the shadowing effects of clouds on neighboring cloudless atmosphere, depending on the cloud optical thickness, surface albedo, and illumination conditions. This is of importance for the development of modern satellite aerosol retrieval algorithms. [1] Sobolev
Wright, Robin; Parrish, Mark L; Cadera, Emily; Larson, Lynnelle; Matson, Clinton K; Garrett-Engele, Philip; Armour, Chris; Lum, Pek Yee; Shoemaker, Daniel D
2003-07-30
Increased levels of HMG-CoA reductase induce cell type- and isozyme-specific proliferation of the endoplasmic reticulum. In yeast, the ER proliferations induced by Hmg1p consist of nuclear-associated stacks of smooth ER membranes known as karmellae. To identify genes required for karmellae assembly, we compared the composition of populations of homozygous diploid S. cerevisiae deletion mutants following 20 generations of growth with and without karmellae. Using an initial population of 1,557 deletion mutants, 120 potential mutants were identified as a result of three independent experiments. Each experiment produced a largely non-overlapping set of potential mutants, suggesting that differences in specific growth conditions could be used to maximize the comprehensiveness of similar parallel analysis screens. Only two genes, UBC7 and YAL011W, were identified in all three experiments. Subsequent analysis of individual mutant strains confirmed that each experiment was identifying valid mutations, based on the mutant's sensitivity to elevated HMG-CoA reductase and inability to assemble normal karmellae. The largest class of HMG-CoA reductase-sensitive mutations was a subset of genes that are involved in chromatin structure and transcriptional regulation, suggesting that karmellae assembly requires changes in transcription or that the presence of karmellae may interfere with normal transcriptional regulation. Copyright 2003 John Wiley & Sons, Ltd.
Efficient parallel implementations of approximation algorithms for guarding 1.5D terrains
Directory of Open Access Journals (Sweden)
Goran Martinović
2015-03-01
In the 1.5D terrain guarding problem, an x-monotone polygonal line is defined by k vertices, together with a set G of terrain points, i.e. guards, and a set N of terrain points which the guards are to observe (guard). This involves a weighted version of the guarding problem where guards G have weights. The goal is to determine a minimum weight subset of G to cover all the points in N, including a version where points from N have demands. Furthermore, another goal is to determine the smallest subset of G, such that every point in N is observed by the required number of guards. Both problems are NP-hard and have a factor 5 approximation [3, 4]. This paper will show that if the (1+ϵ)-approximate solver for the corresponding linear program is a computer, for any ϵ > 0, an extra 1+ϵ factor will appear in the final approximation factor for both problems. The parallel implementation based on GPU and CPU threads will be compared with the Gurobi solver, leading to the conclusion that the respective algorithm outperforms the Gurobi solver on large and dense inputs typically by one order of magnitude.
Power-efficient high-speed parallel-sampling adcs for broadband multi-carrier systems
Lin, Yu; Doris, Kostas; van Roermund, Arthur H M
2015-01-01
This book addresses the challenges of designing high performance analog-to-digital converters (ADCs) based on the “smart data converters” concept, which implies context awareness, on-chip intelligence and adaptation. Readers will learn to exploit various information either a-priori or a-posteriori (obtained from devices, signals, applications or the ambient situations, etc.) for circuit and architecture optimization during the design phase or adaptation during operation, to enhance data converters performance, flexibility, robustness and power-efficiency. The authors focus on exploiting the a-priori knowledge of the system/application to develop enhancement techniques for ADCs, with particular emphasis on improving the power efficiency of high-speed and high-resolution ADCs for broadband multi-carrier systems.
High-efficiency one-dimensional atom localization via two parallel standing-wave fields
International Nuclear Information System (INIS)
Wang, Zhiping; Wu, Xuqiang; Lu, Liang; Yu, Benli
2014-01-01
We present a new scheme of high-efficiency one-dimensional (1D) atom localization via measurement of upper state population or the probe absorption in a four-level N-type atomic system. By applying two classical standing-wave fields, the localization peak position and number, as well as the conditional position probability, can be easily controlled by the system parameters, and the sub-half-wavelength atom localization is also observed. More importantly, there is 100% detecting probability of the atom in the subwavelength domain when the corresponding conditions are satisfied. The proposed scheme may open up a promising way to achieve high-precision and high-efficiency 1D atom localization. (paper)
International Nuclear Information System (INIS)
Takano, M.; Masukawa, F.; Naito, Y.
1994-01-01
The MCACE code, a radiation shielding analysis code by the Monte Carlo method is examined and modified to execute on a parallel computer. The parallelized MCACE code has achieved a speed-up of 52.5 times when random walk processes are executed by 128 batches of 400 particles on the parallel computer AP-1000 equipped with 64 cell processors. In order to achieve high performance, the number of particles for each batch must be large enough to reduce a fluctuation among the execution times in the cell processors, which are mainly caused by differences in random walk processes. (authors). 3 refs., 2 figs., 1 tab
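The reported figures imply a parallel efficiency that is easy to check. The sketch below uses the usual definition, efficiency = speed-up / number of processors; the helper name is illustrative.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / processors

# Figures from the abstract: speed-up of 52.5 on 64 cell processors
print(f"{parallel_efficiency(52.5, 64):.1%}")  # prints 82.0%
```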
Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2012-01-10
The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
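The structure of the method, a 1D FFT along one dimension, a redistribution, then a 1D FFT along the other dimension, can be illustrated on a single machine. In this sketch a plain transpose stands in for the "all-to-all" network redistribution (an assumption made purely for illustration); nothing here models the random ordering or the multi-node layout of the patent, and the function name is hypothetical.

```python
import numpy as np

def fft2_by_dimension(x):
    """2D FFT built from two passes of 1D FFTs with a transpose in between."""
    # First pass: 1D FFTs along the dimension assumed to be node-local
    first = np.fft.fft(x, axis=1)
    # Transpose: single-machine stand-in for the all-to-all redistribution,
    # making the other dimension contiguous (local) for the second pass
    redistributed = first.T.copy()
    # Second pass: 1D FFTs along what was originally the first dimension
    second = np.fft.fft(redistributed, axis=1)
    # Transpose back to the original layout
    return second.T
```

The result agrees with `np.fft.fft2` because the multidimensional DFT is separable; in a distributed setting the cost is dominated by the redistribution step, which is why the patent concentrates on it.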
Parallel preconditioning techniques for sparse CG solvers
Energy Technology Data Exchange (ETDEWEB)
Basermann, A.; Reichel, B.; Schelthoff, C. [Central Institute for Applied Mathematics, Juelich (Germany)
1996-12-31
Conjugate gradient (CG) methods to solve sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and the condition of many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques of the CG method. In particular for very ill-conditioned matrices, sophisticated preconditioners are necessary to obtain both acceptable convergence and accuracy of CG. Here, we investigate variants of polynomial and incomplete Cholesky preconditioners that markedly reduce the iterations of the simply diagonally scaled CG and are shown to be well suited for massively parallel machines.
Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji
2016-03-01
Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer simulations and data analyses, including molecular dynamics (MD) simulations. In this study, we develop hybrid (MPI+OpenMP) parallelization schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD simulations. In one scheme, (1d_Alltoall), five all-to-all communications in one dimension are carried out, and in the other, (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for the systems with different grid sizes using a large number of processors on the K computer in RIKEN AICS. The two schemes show comparable performances, and are better than existing 3D FFTs. The performances of 1d_Alltoall and 2d_Alltoall depend on the supercomputer network system and number of processors in each dimension. There is enough leeway for users to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the same decompositions of real and reciprocal spaces. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to simulate one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.
On a New Efficient Steffensen-Like Iterative Class by Applying a Suitable Self-Accelerator Parameter
Directory of Open Access Journals (Sweden)
Taher Lotfi
2014-01-01
interpolation in such a way that it improves its convergence order from 8 to 12 without any extra function evaluation. Therefore, its efficiency index is increased from 8^(1/4) to 12^(1/4), which is the main feature of this class. To show applicability of the proposed methods, some numerical illustrations are presented.
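The two efficiency indices can be evaluated numerically. The sketch assumes the standard definition, index = p^(1/n) for a method of order p using n function evaluations per iteration; n = 4 is inferred from the exponents quoted in the abstract rather than stated there.

```python
def efficiency_index(order, evaluations):
    """Efficiency index p**(1/n) for order p and n evaluations per iteration."""
    return order ** (1.0 / evaluations)

without_memory = efficiency_index(8, 4)   # about 1.682
with_memory = efficiency_index(12, 4)     # about 1.861
```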
International Nuclear Information System (INIS)
Woodruff, S.B.
1994-01-01
The Transient Reactor Analysis Code (TRAC), which features a two-fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local, the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, a fixed, uniform assignment of nodes to parallel processors will result in degraded computational efficiency due to the poor load balancing. A standard method for treating data-dependent models on vector architectures has been to use gather operations (or indirect addressing) to sort the nodes into subsets that (temporarily) share a common computational model. However, this method is not effective on distributed memory data parallel architectures, where indirect addressing involves expensive communication overhead. Another serious problem with this method involves software engineering challenges in the areas of maintainability and extensibility. For example, an implementation that was hand-tuned to achieve good computational efficiency would have to be rewritten whenever the decision tree governing the sorting was modified. Using an example based on the calculation of the wall-to-liquid and wall-to-vapor heat-transfer coefficients for three nonboiling flow regimes, we describe how the use of the Fortran 90 WHERE construct and automatic inlining of functions can be used to ameliorate this problem while improving both efficiency and software engineering. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. We discuss why developers should either wait for such solutions or consider alternative numerical algorithms, such as a neural network
Iterative methods for simultaneous inclusion of polynomial zeros
Petković, Miodrag
1989-01-01
The simultaneous inclusion of polynomial complex zeros is a crucial problem in numerical analysis. Rapidly converging algorithms are presented in these notes, including convergence analysis in terms of circular regions, and in complex arithmetic. Parallel circular iterations, where the approximations to the zeros have the form of circular regions containing these zeros, are efficient because they also provide error estimates. There are at present no book publications on this topic and one of the aims of this book is to collect most of the algorithms produced in the last 15 years. To decrease the high computational cost of interval methods, several effective iterative processes for the simultaneous inclusion of polynomial zeros which combine the efficiency of ordinary floating-point arithmetic with the accuracy control that may be obtained by the interval methods, are set down, and their computational efficiency is described. The rate of these methods is of interest in designing a package for the simultaneous ...
Directory of Open Access Journals (Sweden)
Mehiddin Al-Baali
2015-12-01
We deal with the design of parallel algorithms by using variable partitioning techniques to solve nonlinear optimization problems. We propose an iterative solution method that is very efficient for separable functions, our scope being to discuss its performance for general functions. Experimental results on an illustrative example have suggested some useful modifications that, even though they improve the efficiency of our parallel method, leave some questions open for further investigation.
Parallel conjugate gradient algorithms for manipulator dynamic simulation
Fijany, Amir; Scheld, Robert E.
1989-01-01
Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n^2) on a serial processor. Conjugate gradient algorithms are presented that provide greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log_2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log_2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
Parallel preconditioned conjugate gradient algorithm applied to neutron diffusion problem
International Nuclear Information System (INIS)
Majumdar, A.; Martin, W.R.
1992-01-01
Numerical solution of the neutron diffusion problem requires solving a linear system of equations such as Ax = b, where A is an n x n symmetric positive definite (SPD) matrix; x and b are vectors with n components. The preconditioned conjugate gradient (PCG) algorithm is an efficient iterative method for solving such a linear system of equations. In this paper, the authors describe the implementation of a parallel PCG algorithm on a shared memory machine (BBN TC2000) and on a distributed workstation (IBM RS6000) environment created by the parallel virtual machine parallelization software
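A minimal serial sketch of the PCG iteration for Ax = b with SPD A is given below. It uses a diagonal (Jacobi) preconditioner purely as an assumption for illustration; the abstract does not state which preconditioner the authors use, and the function name and default tolerance are hypothetical.

```python
import numpy as np

def pcg_jacobi(A, b, tol=1e-10, max_iter=None):
    """Preconditioned conjugate gradient for SPD A, with M = diag(A)."""
    n = b.shape[0]
    max_iter = max_iter or 10 * n
    m_inv = 1.0 / np.diag(A)      # inverse of the diagonal preconditioner
    x = np.zeros(n)
    r = b - A @ x                 # initial residual
    z = m_inv * r                 # preconditioned residual
    p = z.copy()                  # first search direction
    rz = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = rz / (p @ Ap)     # step length along p
        x += alpha * p
        r -= alpha * Ap
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p # next A-conjugate search direction
        rz = rz_new
    return x
```

The matrix-vector product `A @ p` and the two inner products per iteration are the operations that a parallel implementation has to distribute; the inner products require a global reduction on a distributed-memory system.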
Reliability-Based Optimization of Series Systems of Parallel Systems
DEFF Research Database (Denmark)
Enevoldsen, I.; Sørensen, John Dalsgaard
1993-01-01
Reliability-based design of structural systems is considered. In particular, systems where the reliability model is a series system of parallel systems are treated. A sensitivity analysis for this class of problems is presented. Optimization problems with series systems of parallel systems......) a sequential formulation based on optimality criteria; and (4) a sequential formulation including a new so-called bounds iteration method (BIM). Numerical tests indicate that the sequential technique including the BIM is particularly fast and stable. The BIM is not only effective in reliability-based...... optimization of series systems of parallel systems, but it is also efficient in reliability-based optimization of series systems in general....
Colorado Conference on iterative methods. Volume 2
Energy Technology Data Exchange (ETDEWEB)
NONE
1994-12-31
The conference provided a forum for many topics in iterative methods. Volume II presents sessions on these topics: nonsymmetric solvers, parallel computation, ODE solvers, multigrid and multilevel methods, applications, robust iterative methods, preconditioners, Toeplitz and circulant matrix solvers, and saddle point problems. Individual papers are indexed separately on the EDB.
Directory of Open Access Journals (Sweden)
M.W. Zehn
2003-01-01
Various well-known modal synthesis methods exist in the literature, which are all based upon certain assumptions for the relation of generalised modal co-ordinates with internal modal co-ordinates. If employed in a dynamical FE substructure/superelement technique, the generalised modal co-ordinates are represented by the master degrees of freedom (DOF) of the master nodes of the substructure. To conduct FE modal analysis, the modal synthesis method can be integrated to reduce the number of necessary master nodes or to ease the process of defining additional master points within the structure. The paper presents such a combined method, which can be integrated very efficiently and seamlessly into a special subspace eigenvalue problem solver with no need to alter the FE system matrices within the FE code. Accordingly, the merits of using the new algorithm are the easy implementation into an FE code, the reduced effort needed to carry out modal synthesis, and the versatility in dealing with superelements. The paper presents examples to illustrate the proper functioning of the proposed algorithm.
A homotopy method for solving Riccati equations on a shared memory parallel computer
International Nuclear Information System (INIS)
Zigic, D.; Watson, L.T.; Collins, E.G. Jr.; Davis, L.D.
1993-01-01
Although there are numerous algorithms for solving Riccati equations, there still remains a need for algorithms which can operate efficiently on large problems and on parallel machines. This paper gives a new homotopy-based algorithm for solving Riccati equations on a shared memory parallel computer. The central part of the algorithm is the computation of the kernel of the Jacobian matrix, which is essential for the corrector iterations along the homotopy zero curve. Using a Schur decomposition the tensor product structure of various matrices can be efficiently exploited. The algorithm allows for efficient parallelization on shared memory machines
Kim, M.-H.; Cho, J. H.; Park, S.-J.; Eden, J. G.
2017-08-01
Plasmachemical systems based on the production of a specific molecule (O3) in literally thousands of microchannel plasmas simultaneously have been demonstrated, developed and engineered over the past seven years, and commercialized. At the heart of this new plasma technology is the plasma chip, a flat aluminum strip fabricated by photolithographic and wet chemical processes and comprising 24-48 channels, micromachined into nanoporous aluminum oxide, with embedded electrodes. By integrating 4-6 chips into a module, the mass output of an ozone microplasma system is scaled linearly with the number of modules operating in parallel. A 115 g/hr (2.7 kg/day) ozone system, for example, is realized by the combined output of 18 modules comprising 72 chips and 1,800 microchannels. The implications of this plasma processing architecture for scaling ozone production capability, and reducing capital and service costs when introducing redundancy into the system, are profound. In contrast to conventional ozone generator technology, microplasma systems operate reliably (albeit with reduced output) in ambient air and humidity levels up to 90%, a characteristic attributable to the water adsorption/desorption properties and electrical breakdown strength of nanoporous alumina. Extensive testing has documented chip and system lifetimes (MTBF) beyond 5,000 hours, and efficiencies >130 g/kWh when oxygen is the feedstock gas. Furthermore, the weight and volume of microplasma systems are a factor of 3-10 lower than those for conventional ozone systems of comparable output. Massively-parallel plasmachemical processing offers functionality, performance, and commercial value beyond that afforded by conventional technology, and is currently in operation in more than 30 countries worldwide.
Allphin, Devin
Computational fluid dynamics (CFD) solution approximations for complex fluid flow problems have become a common and powerful engineering analysis technique. These tools, though qualitatively useful, remain limited in practice by their underlying inverse relationship between simulation accuracy and overall computational expense. While a great volume of research has focused on remedying these issues inherent to CFD, one traditionally overlooked area of resource reduction for engineering analysis concerns the basic definition and determination of functional relationships for the studied fluid flow variables. This artificial relationship-building technique, called meta-modeling or surrogate/offline approximation, uses design of experiments (DOE) theory to efficiently approximate non-physical coupling between the variables of interest in a fluid flow analysis problem. By mathematically approximating these variables, DOE methods can effectively reduce the required quantity of CFD simulations, freeing computational resources for other analytical focuses. An idealized interpretation of a fluid flow problem can also be employed to create suitably accurate approximations of fluid flow variables for the purposes of engineering analysis. When used in parallel with a meta-modeling approximation, a closed-form approximation can provide useful feedback concerning proper construction, suitability, or even necessity of an offline approximation tool. It also provides a short-circuit pathway for further reducing the overall computational demands of a fluid flow analysis, again freeing resources for otherwise unsuitable resource expenditures. To validate these inferences, a design optimization problem was presented requiring the inexpensive estimation of aerodynamic forces applied to a valve operating on a simulated piston-cylinder heat engine. The determination of these forces was to be found using parallel surrogate and exact approximation methods, thus evidencing the comparative
Improved parallel solution techniques for the integral transport matrix method
International Nuclear Information System (INIS)
Zerr, R. Joseph; Azmy, Yousry Y.
2011-01-01
Alternative solution strategies to the parallel block Jacobi (PBJ) method for the solution of the global problem with the integral transport matrix method operators have been designed and tested. The most straightforward improvement to the Jacobi iterative method is the Gauss-Seidel alternative. The parallel red-black Gauss-Seidel (PGS) algorithm can improve on the number of iterations and reduce work per iteration by applying an alternating red-black color-set to the subdomains and assigning multiple sub-domains per processor. A parallel GMRES(m) method was implemented as an alternative to stationary iterations. Computational results show that the PGS method can improve on the PBJ method execution time by up to 10× when eight sub-domains per processor are used. However, compared to traditional source iterations with diffusion synthetic acceleration, it is still approximately an order of magnitude slower. The best-performing cases are optically thick because sub-domains decouple, yielding faster convergence. Further tests revealed that 64 sub-domains per processor was the best performing level of sub-domain division. An acceleration technique that improves the convergence rate would greatly improve the ITMM. The GMRES(m) method with a diagonal block preconditioner consumes approximately the same time as the PBJ solver but could be improved by an as yet undeveloped, more efficient preconditioner. (author)
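The red-black ordering mentioned above can be illustrated on a much simpler problem than the ITMM operators. The sketch below is an assumption-laden toy, not the authors' PGS algorithm: it colors individual grid points (rather than sub-domains) of a 2-D Poisson problem with fixed boundary values, and the function name and arguments are hypothetical. Within one color every update depends only on points of the other color, which is what makes each half-sweep safe to run in parallel.

```python
def red_black_gauss_seidel(u, f, h, sweeps):
    """Red-black Gauss-Seidel sweeps for -laplace(u) = f on a square grid.

    u      : list of lists; boundary entries hold fixed Dirichlet values
    f      : list of lists with the source term
    h      : grid spacing
    sweeps : number of full (red + black) sweeps
    """
    n = len(u)
    for _ in range(sweeps):
        for color in (0, 1):  # 0 = "red" points, 1 = "black" points
            # All points of one color read only neighbors of the other
            # color, so this double loop could be executed concurrently.
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == color:
                        u[i][j] = 0.25 * (
                            u[i - 1][j] + u[i + 1][j]
                            + u[i][j - 1] + u[i][j + 1]
                            + h * h * f[i][j]
                        )
    return u
```

With a zero source and all boundary values set to 1, the interior converges to 1, which gives a quick sanity check of the sweep.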
Non-Cartesian parallel imaging reconstruction.
Wright, Katherine L; Hamilton, Jesse I; Griswold, Mark A; Gulani, Vikas; Seiberlich, Nicole
2014-11-01
Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be used to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the nonhomogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian generalized autocalibrating partially parallel acquisition (GRAPPA), and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. © 2014 Wiley Periodicals, Inc.
Conformable variational iteration method
Directory of Open Access Journals (Sweden)
Omer Acan
2017-02-01
In this study, we introduce the conformable variational iteration method, based on a newly defined fractional derivative called the conformable fractional derivative. This new method is applied to two fractional-order ordinary differential equations. To see how the method performs, linear homogeneous and non-linear non-homogeneous fractional ordinary differential equations are selected. The results obtained are compared with the exact solutions, and their graphs are plotted to demonstrate the efficiency and accuracy of the method.
Progress and Achievements on the R and D Activities for ITER Vacuum Vessel
International Nuclear Information System (INIS)
Nakahira, M.; Koizumi, K.; Takahashi, H.; Onozuka, M.; Ioki, K.; Kuzumin, E.; Krylov, V.; Maslakowski, J.; Nelson, Brad E.; Jones, L.; Danner, W.; Maisonnier, D.
2001-01-01
The ITER vacuum vessel (VV) is designed to be a large double-walled structure with a D-shaped cross-section. The achievable fabrication tolerance of this structure was unknown due to its size and the complexity of its shape. The Full-scale Sector Model of the ITER Vacuum Vessel, which was 15 m in height, was fabricated and tested to obtain the fabrication and assembly tolerances. The model was fabricated within the target tolerance of 5 mm, and the welding deformation during the assembly operation was obtained. The port structure was also connected using remotized welding tools to demonstrate the basic maintenance activity. In parallel, tests of advanced welding, cutting and inspection systems were performed to improve the efficiency of fabrication and maintenance of the Vacuum Vessel. These activities show that the ITER Vacuum Vessel is realistically feasible. This paper describes the major progress, achievements and latest status of the R and D activities on the ITER vacuum vessel.
International Nuclear Information System (INIS)
Bottura, L.; Hasegawa, M.; Heim, J.
1991-01-01
As part of the summary of the Conceptual Design Activities (CDA) for the International Thermonuclear Experimental Reactor (ITER), this document describes the magnet systems for ITER, including the Toroidal Field (TF) and Poloidal Field (PF) Magnets, the Structural Support System and Cryostat, the Cryogenic System, the TF and PF Power and Protection Systems, and Coil Services and Diagnostics. After an Introduction and Summary, the document discusses (i) the Design Basis, including General Requirements, Design Criteria, Design Philosophy, and the Database (a.o., engineering data on key materials and components), and (ii) the Subsystem Design and Analysis, including Conductor Design, TF Coil and Structure Design, TF Structural Analysis, PF Coil and Structure Design, PF Structural Performance, Fatigue Assessment of Structures, AC Loss Performance, Thermohydraulic Performance, Stability, Cryogenic System, Power Supply Systems, and Coil Services. All magnets are superconducting (based on Nb3Sn), except the Active Control Coils inside the Vacuum Vessel. The fault analysis has been taken to a level consistent with the design definition, showing that the present design meets the requirement for passive safety or can be made to meet it with only minor modifications. A more detailed assessment in this regard is needed but must await further development of the design. In conclusion, the magnet design concepts presently proposed can be developed into an engineering design. Refs, figs and tabs
Xu, Bing; Du, Wen-Qiang; Li, Jia-Wen; Hu, Yan-Lei; Yang, Liang; Zhang, Chen-Chu; Li, Guo-Qiang; Lao, Zhao-Xin; Ni, Jin-Cheng; Chu, Jia-Ru; Wu, Dong; Liu, Su-Ling; Sugioka, Koji
2016-01-01
High-efficiency fabrication and integration of three-dimensional (3D) functional devices in Lab-on-a-chip systems are crucial for microfluidic applications. Here, a spatial light modulator (SLM)-based multifoci parallel femtosecond laser scanning technology was proposed to integrate microstructures inside a given ‘Y’ shape microchannel. The key novelty of our approach lies in rapidly integrating 3D microdevices inside a microchip for the first time, which significantly reduces the fabrication time. The high-quality integration of various 2D-3D microstructures was ensured by quantitatively optimizing the experimental conditions, including prebaking time, laser power and developing time. To verify the designable and versatile capability of this method for integrating functional 3D microdevices in a microchannel, a series of microfilters with adjustable pore sizes from 12.2 μm to 6.7 μm were fabricated to demonstrate selective filtering of polystyrene (PS) particles and cancer cells of different sizes. The filter can be cleaned by reversing the flow and reused many times. This technology will advance the fabrication technique of 3D integrated microfluidic and optofluidic chips.
International Nuclear Information System (INIS)
Tsuji, Masashi; Chiba, Gou
2000-01-01
A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to accomplish a high parallel efficiency on distributed memory message passing parallel computers. Data exchanges between node processors that are repeated during iteration processes of HDD-BEM are implemented, without any intervention of the host processor that was used to supervise parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, the parallel processing can be executed with only cooperative operations of node processors. The communication overhead was even the dominant time consuming part in the conventional P-HDD-BEM, and the parallelization efficiency decreased steeply with the increase of the number of processors. With the parallel data communication, the efficiency is affected only by the number of boundary elements assigned to decomposed subregions, and the communication overhead can be drastically reduced. This feature can be particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the deterioration problem of parallel efficiency and opens a new path to parallel computations of NDEs on distributed memory message passing parallel computers. (author)
ITER council proceedings: 2001
International Nuclear Information System (INIS)
2001-01-01
Continuing the ITER EDA, two further ITER Council Meetings were held since the publication of ITER EDA documentation series no. 20, namely the ITER Council Meeting on 27-28 February 2001 in Toronto, and the ITER Council Meeting on 18-19 July in Vienna. The latter Meeting was the last one during the ITER EDA. This volume contains records of these Meetings, including: Records of decisions; List of attendees; ITER EDA status report; ITER EDA technical activities report; MAC report and advice; Final report of ITER EDA; and Press release.
Existence test for asynchronous interval iterations
DEFF Research Database (Denmark)
Madsen, Kaj; Caprani, O.; Stauning, Ole
1997-01-01
In the search for regions that contain fixed points of a real function of several variables, tests based on interval calculations can be used to establish existence or non-existence of fixed points in regions that are examined in the course of the search. The search can e.g. be performed...... as a synchronous (sequential) interval iteration: In each iteration step all components of the iterate are calculated based on the previous iterate. In this case it is straightforward to base simple interval existence and non-existence tests on the calculations done in each step of the iteration. The search can also...... on the componentwise calculations done in the course of the iteration. These componentwise tests are useful for parallel implementation of the search, since the tests can then be performed local to each processor and only when a test is successful does a processor communicate this result to other processors....
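To make the idea of interval existence and non-existence tests concrete, here is a minimal one-dimensional sketch. It is not the tests of the abstract: the map g(x) = (x*x + 1)/4, the function names, and the tuple representation of intervals are all hypothetical, and the arithmetic ignores the outward rounding that a rigorous interval library would apply. The two rules used are standard: for a continuous g, if g(X) lies inside X then X contains a fixed point, and if g(X) and X are disjoint then X contains none.

```python
def interval_square(x):
    """Square of the interval x = (lo, hi)."""
    lo, hi = x
    ends = (lo * lo, hi * hi)
    # If the interval straddles zero, the smallest square is 0.
    low = 0.0 if lo <= 0.0 <= hi else min(ends)
    return (low, max(ends))


def g_interval(x):
    """Interval extension of the hypothetical map g(x) = (x*x + 1) / 4."""
    lo, hi = interval_square(x)
    return ((lo + 1.0) / 4.0, (hi + 1.0) / 4.0)


def fixed_point_test(g, x):
    """Classify the region x using one interval evaluation of g."""
    g_lo, g_hi = g(x)
    lo, hi = x
    if lo <= g_lo and g_hi <= hi:
        return "exists"      # g(X) is contained in X
    if g_hi < lo or hi < g_lo:
        return "none"        # g(X) and X are disjoint
    return "undecided"       # the region must be examined further


print(fixed_point_test(g_interval, (0.0, 1.0)))
print(fixed_point_test(g_interval, (1.3, 1.5)))
print(fixed_point_test(g_interval, (1.0, 2.0)))
```

In a several-variable setting the same containment and disjointness checks would be applied per component, which is what allows each processor to test its own components locally.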
A Novel Parallel Algorithm for Edit Distance Computation
Directory of Open Access Journals (Sweden)
Muhammad Murtaza Yousaf
2018-01-01
The edit distance between two sequences is the minimum number of weighted transformation operations that are required to transform one string into the other. The weighted transformation operations are insert, remove, and substitute. A dynamic programming solution for finding the edit distance exists, but it becomes computationally intensive when the strings become very long. This work presents a novel parallel algorithm to solve the edit distance problem of string matching. The algorithm is based on resolving dependencies in the dynamic programming solution of the problem, and it is able to compute each row of the edit distance table in parallel. In this way, it becomes possible to compute the complete table in min(m,n) iterations for strings of size m and n, whereas the state-of-the-art parallel algorithm solves the problem in max(m,n) iterations. The proposed algorithm also increases the amount of parallelism in each of its iterations. The algorithm is also capable of exploiting spatial locality in its implementation. Additionally, the algorithm works in a load-balanced way that further improves its performance. The algorithm is implemented for multicore systems with shared memory. An implementation of the algorithm in OpenMP shows linear speedup and better execution time compared with the state-of-the-art parallel approach. The efficiency of the algorithm is also shown to be better than that of its competitor.
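One well-known way to remove the left-neighbor dependency inside a row of the edit distance table is to rewrite the row update as a prefix minimum. The sketch below illustrates that idea under stated assumptions: unit costs for insert, remove and substitute, a hypothetical function name, and a sequential Python prefix scan standing in for a parallel one. It is not claimed to be the algorithm of the paper. Iterating over the shorter string gives min(m,n) row iterations.

```python
from itertools import accumulate


def edit_distance_rowwise(a, b):
    """Unit-cost edit distance, one row per character of the shorter string."""
    if len(a) > len(b):
        a, b = b, a                      # rows follow the shorter string
    n = len(b)
    prev = list(range(n + 1))            # row 0: D[0][j] = j
    for i, ch in enumerate(a, start=1):
        # Step 1: terms that depend only on the previous row.
        # Every entry is independent, so this step is data-parallel.
        t = [i] + [
            min(prev[j] + 1, prev[j - 1] + (b[j - 1] != ch))
            for j in range(1, n + 1)
        ]
        # Step 2: D[i][j] = min over k <= j of (t[k] + j - k)
        #                 = j + prefix_min(t[k] - k),
        # and a prefix minimum is a standard parallel scan.
        scan = accumulate((t[k] - k for k in range(n + 1)), min)
        prev = [m + j for j, m in enumerate(scan)]
    return prev[n]
```

For example, `edit_distance_rowwise("kitten", "sitting")` evaluates to 3, matching the classical dynamic programming result.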
Parallel adaptive mesh refinement for electronic structure calculations
Energy Technology Data Exchange (ETDEWEB)
Kohn, S.; Weare, J.; Ong, E.; Baden, S.
1996-12-01
We have applied structured adaptive mesh refinement techniques to the solution of the LDA equations for electronic structure calculations. Local spatial refinement concentrates memory resources and numerical effort where it is most needed, near the atomic centers and in regions of rapidly varying charge density. The structured grid representation enables us to employ efficient iterative solver techniques such as conjugate gradients with multigrid preconditioning. We have parallelized our solver using an object-oriented adaptive mesh refinement framework.
CERN. Geneva
2016-01-01
The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...
International Nuclear Information System (INIS)
Aymar, R.
2001-01-01
The Project has focused on drafting the Plant Description Document (PDD), which will be published as the Technical Basis for the ITER Final Design Report (FDR), and its related documentation in time for the ITER review process. The preparations have involved continued intensive detailed design work, analyses and assessments by the Home Teams and the Joint Central Team, who have co-operated closely and efficiently. The main technical document has been completed in time for circulation, as planned, to TAC members for their review at TAC-17 (19-22 February 2001). Some of the supporting documents, such as the Plant Design Specification (PDS), Design Requirements and Guidelines (DRG1 and DRG2), and the Plant Safety Requirement (PSR) are also available for reference in draft form. A summary paper of the PDD for the Council's information is available as a separate document. A new documentation structure for the Project has been established. This hierarchical structure for documentation facilitates the entire organization in a way that allows better change control and avoids duplications. The initiative was intended to make this documentation system valid for the construction and operation phases of ITER. As requested, the Director and the JCT have been assisting the Explorations to plan for future joint technical activities during the Negotiations, and to consider technical issues important for ITER construction and operation for their introduction in the draft of a future joint implementation agreement. As charged by the Explorers, the Director has held discussions with the Home Team Leaders in order to prepare for the staffing of the International Team and Participants Teams during the Negotiations (Co-ordinated Technical Activities, CTA) and also in view of informing all ITER staff about their future directions in a timely fashion. One important element of the work was the completion by the Parties' industries of costing studies of about 83 ''procurement packages
Design of radial neutron spectrometer for ITER
International Nuclear Information System (INIS)
Nishitani, Takeo; Kasai, Satoshi; Iguchi, Tetsuo; Ebisawa, Katsuyuki; Kita, Yoshio.
1996-09-01
We designed a radial neutron spectrometer for ion temperature measurement on ITER, using a new type of DT neutron spectrometer based on a recoil proton counter-telescope technique. The neutron spectrometer will be installed on a well-collimated neutron beam line. A large-area recoil proton emitter is placed parallel to the incident neutron beam, and micro-channel collimating plates are inserted between the radiator and the recoil proton detectors, which sit away from the neutron beam, in order to limit the scattering angle of protons reaching the proton detectors. A very thin polyethylene film and a silicon surface barrier detector are employed as the radiator and proton detector, respectively. Through Monte Carlo calculations, the energy resolution and detection efficiency for DT neutrons are estimated to be 2.5% and 1×10^-5 counts/(n/cm^2), respectively. Five spectrometer units will be installed just outside the bio-shield and form a fan array using penetrations inside the bio-shield and a pre-collimator in the horizontal port. The lifetime of the proton detectors is estimated to be about one year in the Basic Performance Phase of ITER by neutron transport calculations using the MCNP Monte Carlo code. The necessary R and D items and the design work were identified. (author)
ITER Central Solenoid Module Fabrication
Energy Technology Data Exchange (ETDEWEB)
Smith, John [General Atomics, San Diego, CA (United States)
2016-09-23
The fabrication of the modules for the ITER Central Solenoid (CS) has started in a dedicated production facility located in Poway, California, USA. The necessary tools have been designed, built, installed, and tested in the facility to enable the start of production. The current schedule has fabrication of the first module completed in 2017, followed by testing and subsequent shipment to ITER. The Central Solenoid is a key component of the ITER tokamak, providing the inductive voltage to initiate and sustain the plasma current and to position and shape the plasma. The design of the CS has been a collaborative effort between the US ITER Project Office (US ITER), the international ITER Organization (IO) and General Atomics (GA). GA’s responsibility includes completing the fabrication design, developing and qualifying the fabrication processes and tools, and then completing the fabrication of the seven 110 tonne CS modules. The modules will be shipped separately to the ITER site, and then stacked and aligned in the Assembly Hall prior to insertion in the core of the ITER tokamak. A dedicated facility in Poway, California, USA has been established by GA to complete the fabrication of the seven modules. Infrastructure improvements included thick reinforced concrete floors and a diesel generator for backup power, along with cranes for moving the tooling within the facility. The fabrication process for a single module requires approximately 22 months followed by five months of testing, which includes preliminary electrical testing followed by high current (48.5 kA) tests at 4.7 K. The production of the seven modules is completed in a parallel fashion through ten process stations. The process stations have been designed and built, with most stations having completed testing and qualification for carrying out the required fabrication processes. The final qualification step for each process station is achieved by the successful production of a prototype coil. Fabrication of the first
Eigenvalues calculation algorithms for λ-modes determination. Parallelization approach
International Nuclear Information System (INIS)
Vidal, V.; Verdu, G.; Munoz-Cobo, J.L.; Ginestart, D.
1997-01-01
In this paper, we review two methods to obtain the λ-modes of a nuclear reactor, Subspace Iteration method and Arnoldi's method, which are popular methods to solve the partial eigenvalue problem for a given matrix. In the developed application for the neutron diffusion equation we include improved acceleration techniques for both methods. Also, we propose two parallelization approaches for these methods, a coarse grain parallelization and a fine grain one. We have tested the developed algorithms with two realistic problems, focusing on the efficiency of the methods according to the CPU times. (author)
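As a rough illustration of the first of the two methods, the sketch below runs subspace (orthogonal) iteration on a small dense symmetric matrix. It is a simplification under stated assumptions: the λ-modes problem itself is a generalized eigenproblem arising from the neutron diffusion equation, whereas this toy treats a plain matrix given as nested lists, uses no acceleration, and all function names are hypothetical. For a symmetric matrix with eigenvalues of distinct magnitude, the orthonormalized columns converge to individual eigenvectors, so Rayleigh quotients recover the dominant eigenvalues.

```python
import random


def matvec(A, x):
    """Dense matrix-vector product for A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def orthonormalize(cols):
    """Modified Gram-Schmidt on a list of column vectors."""
    basis = []
    for v in cols:
        w = list(v)
        for q in basis:
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis


def subspace_iteration(A, k, iters=300, seed=0):
    """Approximate the k dominant eigenpairs of a symmetric matrix A."""
    rng = random.Random(seed)
    n = len(A)
    Q = orthonormalize([[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)])
    for _ in range(iters):
        # The k products A*q are independent: a natural coarse-grain
        # unit of parallel work.
        Q = orthonormalize([matvec(A, q) for q in Q])
    values = [dot(q, matvec(A, q)) for q in Q]   # Rayleigh quotients
    return values, Q


A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 0.0],
     [0.0, 0.0, 1.0]]
values, _ = subspace_iteration(A, 2)
print(values)   # approximations to the two largest eigenvalues, 5 and 3
```

Arnoldi's method builds a Krylov basis from a single starting vector instead of iterating a fixed block, which usually needs fewer matrix-vector products for the same accuracy.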
Eigenvalues calculation algorithms for λ-modes determination. Parallelization approach
Energy Technology Data Exchange (ETDEWEB)
Vidal, V. [Universidad Politecnica de Valencia (Spain). Departamento de Sistemas Informaticos y Computacion; Verdu, G.; Munoz-Cobo, J.L. [Universidad Politecnica de Valencia (Spain). Departamento de Ingenieria Quimica y Nuclear; Ginestart, D. [Universidad Politecnica de Valencia (Spain). Departamento de Matematica Aplicada
1997-03-01
In this paper, we review two methods to obtain the λ-modes of a nuclear reactor, Subspace Iteration method and Arnoldi's method, which are popular methods to solve the partial eigenvalue problem for a given matrix. In the developed application for the neutron diffusion equation we include improved acceleration techniques for both methods. Also, we propose two parallelization approaches for these methods, a coarse grain parallelization and a fine grain one. We have tested the developed algorithms with two realistic problems, focusing on the efficiency of the methods according to the CPU times. (author).
Iterative Splitting Methods for Differential Equations
Geiser, Juergen
2011-01-01
Iterative Splitting Methods for Differential Equations explains how to solve evolution equations via novel iterative-based splitting methods that efficiently use computational and memory resources. It focuses on systems of parabolic and hyperbolic equations, including convection-diffusion-reaction equations, heat equations, and wave equations. In the theoretical part of the book, the author discusses the main theorems and results of the stability and consistency analysis for ordinary differential equations. He then presents extensions of the iterative splitting methods to partial differential
ITER EDA newsletter. V. 10, no. 1
International Nuclear Information System (INIS)
2001-01-01
This article provides a summary of results of the ITER Physics Committee Meeting, which was held on 14 October 2000 at the ITER Garching Joint Work Site, Germany. The ITER Physics Committee is the body responsible for overseeing, through the seven specialized Expert Groups, the R and D activities contributed voluntarily by the ITER Parties. The Parties' Physics Designated Persons, the Chairs and Co-Chairs of ITER Physics Expert Groups and the JCT members involved attended the Meeting. As usual, the meeting was chaired by the ITER Director, Dr. R. Aymar, who reported on the status of the ITER EDA. Dr. Aymar described the steps being taken in preparing the ITER-FEAT Final Design Report (FDR), and further stated that the Report would be available in time to be of benefit to the Negotiations on the ITER Joint Implementation, expected to start around May 2001. All Parties recognize that the ITER Physics Expert Group structure has been useful in focusing the tokamak physics activity on the ITER-relevant issues and provides an efficient worldwide collaboration on confirming innovative solutions. The concept of an international workshop to be organized as a pre-meeting of each Expert Group meeting, in order to involve U.S. scientists in the discussion of generic tokamak physics issues, was introduced in 2000, with some success, and its goal should be pursued
DEFF Research Database (Denmark)
Høyerby, Mikkel Christian Wendelboe; Andersen, Michael Andreas E.
2005-01-01
This paper presents a high-performance power conversion scheme for power supply applications that require very high output voltage slew rates (dV/dt). The concept is to parallel 2 switching bandpass current sources, each optimized for its passband frequency space and the expected load current....... The principle is demonstrated with a power supply, designed for supplying a 40 W linear RF power amplifier for efficient amplification of a 16-QAM modulated data stream...
Parallelized preconditioned BiCGStab solution of sparse linear system equations in F-COBRA-TF
International Nuclear Information System (INIS)
Geemert, Rene van; Glück, Markus; Riedmann, Michael; Gabriel, Harry
2011-01-01
Recently, the in-house development of a preconditioned and parallelized BiCGStab solver has been pursued successfully in AREVA’s advanced sub-channel code F-COBRA-TF. This solver can be run either in a sequential computation mode on a single CPU, or in a parallel computation mode on multiple parallel CPUs. The developed procedure enables the computation of several thousands of successive sparse linear system solutions in F-COBRA-TF with acceptable wall clock run times. The current paper provides general information about F-COBRA-TF in terms of modeling capabilities and application areas, and points out where the relevance arises for the efficient iterative solution of sparse linear systems. Furthermore, the preconditioning and parallelization strategies in the developed BiCGStab iterative solution approach are discussed. The paper is concluded with a number of verification examples. (author)
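For readers unfamiliar with the method, the following is a minimal sequential sketch of right-preconditioned BiCGStab. It is not the F-COBRA-TF solver: the Jacobi (diagonal) preconditioner, the dense list-of-rows matrix, and every function name are assumptions chosen to keep the example short, and a production code would use sparse storage and distribute the matrix-vector and dot products across processors.

```python
def matvec(A, x):
    """Dense matrix-vector product for A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def bicgstab_jacobi(A, b, tol=1e-10, maxiter=100):
    """Solve A x = b with BiCGStab and a Jacobi preconditioner (assumed)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    def precond(v):
        # Apply M^-1, where M is the diagonal of A.
        return [d * vi for d, vi in zip(inv_diag, v)]

    x = [0.0] * n
    r = list(b)                  # residual b - A x for the zero start vector
    r_hat = list(r)              # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    stop = tol * dot(b, b) ** 0.5
    if stop == 0.0:
        return x                 # b = 0 has the trivial solution
    for _ in range(maxiter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = axpy(beta, axpy(-omega, v, p), r)    # p = r + beta*(p - omega*v)
        y = precond(p)
        v = matvec(A, y)
        alpha = rho_new / dot(r_hat, v)
        s = axpy(-alpha, v, r)                   # intermediate residual
        if dot(s, s) ** 0.5 <= stop:
            return axpy(alpha, y, x)
        z = precond(s)
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)            # stabilizing step length
        x = axpy(omega, z, axpy(alpha, y, x))
        r = axpy(-omega, t, s)
        if dot(r, r) ** 0.5 <= stop:
            return x
        rho = rho_new
    return x
```

Each iteration costs two matrix-vector products, two preconditioner applications and a handful of dot products; the dot products are the global reductions that typically limit parallel scaling.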
Farr, Benjamin; Kalogera, Vicky; Luijten, Erik
2014-07-01
We introduce a new Markov-chain Monte Carlo (MCMC) approach designed for the efficient sampling of highly correlated and multimodal posteriors. Parallel tempering, though effective, is a costly technique for sampling such posteriors. Our approach minimizes the use of parallel tempering, only applying it for a short time to build a proposal distribution that is based upon estimation of the kernel density and tuned to the target posterior. This proposal makes subsequent use of parallel tempering unnecessary, allowing all chains to be cooled to sample the target distribution. Gains in efficiency are found to increase with increasing posterior complexity, ranging from tens of percent in the simplest cases to over a factor of 10 for the more complex cases. Our approach is particularly useful in the context of parameter estimation of gravitational-wave signals measured by ground-based detectors, which is currently done through Bayesian inference with MCMC, one of the leading sampling methods. Posteriors for these signals are typically multimodal with strong nonlinear correlations, making sampling difficult. As we enter the advanced-detector era, improved sensitivities and wider bandwidths will drastically increase the computational cost of analyses, demanding more efficient search algorithms to meet these challenges.
Totally parallel multilevel algorithms
Frederickson, Paul O.
1988-01-01
Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
Parallel 3-D method of characteristics in MPACT
International Nuclear Information System (INIS)
Kochunas, B.; Downar, T. J.; Liu, Z.
2013-01-01
A new parallel 3-D MOC kernel has been developed and implemented in MPACT which makes use of the modular ray tracing technique to reduce computational requirements and to facilitate parallel decomposition. The parallel model makes use of both distributed and shared memory parallelism, which are implemented with the MPI and OpenMP standards, respectively. The kernel is capable of parallel decomposition of problems in space, angle, and by characteristic rays up to O(10^4) processors. Initial verification of the parallel 3-D MOC kernel was performed using the Takeda 3-D transport benchmark problems. The eigenvalues computed by MPACT are within the statistical uncertainty of the benchmark reference and agree well with the averages of other participants. The MPACT k-eff differs from the benchmark results for rodded and un-rodded cases by 11 and -40 pcm, respectively. The calculations were performed for various numbers of processors and parallel decompositions up to 15625 processors, all producing the same result at convergence. The parallel efficiency of the worst case was 60%, while very good efficiency (>95%) was observed for cases using 500 processors. The overall run time for the 500 processor case was 231 seconds and 19 seconds for the case with 15625 processors. Ongoing work is focused on developing theoretical performance models and the implementation of acceleration techniques to minimize the number of iterations to converge. (authors)
With enhanced data availability, distributed watershed models for large areas with high spatial and temporal resolution are increasingly used to understand water budgets and examine effects of human activities and climate change/variability on water resources. Developing parallel computing software...
ITER council proceedings: 1998
International Nuclear Information System (INIS)
1999-01-01
This volume contains documents of the 13th and the 14th ITER council meeting as well as of the 1st extraordinary ITER council meeting. Documents of the ITER meetings held in Vienna and Yokohama during 1998 are also included. The contents include an outline of the ITER objectives, the ITER parameters and design overview as well as operating scenarios and plasma performance. Furthermore, design features, safety and environmental characteristics are given
Kepper, Nick; Ettig, Ramona; Dickmann, Frank; Stehr, Rene; Grosveld, Frank G; Wedemann, Gero; Knoch, Tobias A
2010-01-01
Especially in the life-science and health-care sectors, huge IT requirements are imminent due to the large and complex systems to be analysed and simulated. Grid infrastructures play a rapidly increasing role here for research, diagnostics, and treatment, since they provide the necessary large-scale resources efficiently. Whereas grids were first used for huge number crunching of trivially parallelizable problems, parallel high-performance computing is increasingly required. Here, we show for the prime example of molecular dynamics simulations how the presence of large grid clusters, including very fast network interconnects within grid infrastructures, now allows efficient parallel high-performance grid computing and thus combines the benefits of dedicated super-computing centres and grid infrastructures. The demands for this service class are the highest since the user group has very heterogeneous requirements: i) two to many thousands of CPUs, ii) different memory architectures, iii) huge storage capabilities, and iv) fast communication via network interconnects are all needed in different combinations and must be considered in a highly dedicated manner to reach the highest performance efficiency. Beyond this, advanced and dedicated i) interaction with users, ii) management of jobs, iii) accounting, and iv) billing not only combine classic with parallel high-performance grid usage but, more importantly, are also able to increase the efficiency of IT resource providers. Consequently, the mere "yes-we-can" becomes a huge opportunity for, e.g., the life-science and health-care sectors as well as for grid infrastructures, by reaching a higher level of resource efficiency.
National Research Council Canada - National Science Library
Howard, Kevin
2003-01-01
Tools and techniques were developed by Massively Parallel Technologies Inc. (MPT) which enable intrinsically non-parallel processing problems to be processed by inexpensive parallel processing architectures...
Denovo--A New Three-Dimensional Parallel Discrete Ordinates Code in SCALE
International Nuclear Information System (INIS)
Evans, Thomas M.; Stafford, Alissa; Clarno, Kevin T.
2010-01-01
Denovo is a new, three-dimensional, discrete ordinates (SN) transport code that uses state-of-the-art solution methods to obtain accurate solutions to the Boltzmann transport equation. Denovo uses the Koch-Baker-Alcouffe parallel sweep algorithm to obtain high parallel efficiency on O(100) processors on XYZ orthogonal meshes. As opposed to traditional SN codes that use source iteration, Denovo uses nonstationary Krylov methods to solve the within-group equations. Krylov methods are far more efficient than stationary schemes. Additionally, classic acceleration schemes (diffusion synthetic acceleration) do not suffer stability problems when used as a preconditioner to a Krylov solver. Denovo's generic programming framework allows multiple spatial discretization schemes and solution methodologies. Denovo currently provides diamond-difference, theta-weighted diamond-difference, linear-discontinuous finite element, trilinear-discontinuous finite element, and step characteristics spatial differencing schemes. Also, users have the option of running traditional source iteration instead of Krylov iteration. Multigroup upscatter problems can be solved using Gauss-Seidel iteration with transport, two-grid acceleration. A parallel first-collision source is also available. Denovo solutions to the Kobayashi benchmarks are in excellent agreement with published results. Parallel performance shows excellent weak scaling up to 20000 cores and good scaling up to 40000 cores.
On the adequacy of message-passing parallel supercomputers for solving neutron transport problems
International Nuclear Information System (INIS)
Azmy, Y.Y.
1990-01-01
A coarse-grained, static-scheduling parallelization of the standard iterative scheme used for solving the discrete-ordinates approximation of the neutron transport equation is described. The parallel algorithm is based on a decomposition of the angular domain along the discrete ordinates, thus naturally producing a set of completely uncoupled systems of equations in each iteration. Implementation of the parallel code on Intel's iPSC/2 hypercube, and solutions to test problems are presented as evidence of the high speedup and efficiency of the parallel code. The performance of the parallel code on the iPSC/2 is analyzed, and a model for the CPU time as a function of the problem size (order of angular quadrature) and the number of participating processors is developed and validated against measured CPU times. The performance model is used to speculate on the potential of massively parallel computers for significantly speeding up real-life transport calculations at acceptable efficiencies. We conclude that parallel computers with a few hundred processors are capable of producing large speedups at very high efficiencies in very large three-dimensional problems. 10 refs., 8 figs
ITER Council proceedings: 1993
International Nuclear Information System (INIS)
1994-01-01
Records of the third ITER Council Meeting (IC-3), held on 21-22 April 1993, in Tokyo, Japan, and the fourth ITER Council Meeting (IC-4) held on 29 September - 1 October 1993 in San Diego, USA, are presented, giving essential information on the evolution of the ITER Engineering Design Activities (EDA), such as the text of the draft of Protocol 2 further elaborated in "ITER EDA Agreement and Protocol 2" (ITER EDA Documentation Series No. 5); recommendations on future work programmes; a description of technology R&D tasks; the establishment of a trust fund for the ITER EDA activities; arrangements for Visiting Home Team Personnel; the general framework for the involvement of other countries in the ITER EDA; conditions for the involvement of Canada in the Euratom Contribution to the ITER EDA; and other attachments as parts of the Records of Decision of the aforementioned ITER Council Meetings
ITER council proceedings: 2000
International Nuclear Information System (INIS)
2001-01-01
No ITER Council Meetings were held during 2000. However, two ITER EDA Meetings were held, one in Tokyo, January 19-20, and one in Moscow, June 29-30. The parties participating in these meetings were those that partake in the extended ITER EDA, namely the EU, the Russian Federation, and Japan. This document contains, among others, the records of these meetings, the list of attendees, the agenda, the ITER EDA Status Reports issued during these meetings, the TAC (Technical Advisory Committee) reports and recommendations, the MAC Reports and Advice (also for the July 1999 Meeting), the ITER-FEAT Outline Design Report, the TAC Reports and Recommendations (both meetings), Site Requirements and Site Design Assumptions, the Tentative Sequence of Technical Activities 2000-2001, the Report of the ITER SWG-P2 on Joint Implementation of ITER, and the EU/ITER Canada Proposal for New ITER Identification
Iterative solution of general sparse linear systems on clusters of workstations
Energy Technology Data Exchange (ETDEWEB)
Lo, Gen-Ching; Saad, Y. [Univ. of Minnesota, Minneapolis, MN (United States)]
1996-12-31
Solving sparse, irregularly structured linear systems on parallel platforms poses several challenges. First, sparsity makes it difficult to exploit data locality, whether in a distributed or shared memory environment. A second, perhaps more serious, challenge is to find efficient ways to precondition the system. Preconditioning techniques which have a large degree of parallelism, such as multicolor SSOR, often have a slower rate of convergence than their sequential counterparts. Finally, a number of other computational kernels such as inner products could ruin any gains obtained from parallel speed-ups, and this is especially true on workstation clusters where start-up times may be high. In this paper we discuss these issues and report on our experience with PSPARSLIB, an on-going project for building a library of parallel iterative sparse matrix solvers.
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we propose a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we design an imaging point parallel strategy to achieve optimal parallel computing performance. Afterward, we adopt an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) are adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
Linear Bregman algorithm implemented in parallel GPU
Li, Pengyan; Ke, Jue; Sui, Dong; Wei, Ping
2015-08-01
At present, most compressed sensing (CS) algorithms converge slowly and are thus difficult to run on a PC. To deal with this issue, we use a parallel GPU to implement a broadly used compressed sensing algorithm, the Linear Bregman algorithm. The linear iterative Bregman algorithm is a reconstruction algorithm proposed by Osher and Cai. Compared with other CS reconstruction algorithms, the linear Bregman algorithm involves only matrix-vector multiplication and a thresholding operation, and is simpler and more efficient to program. We use C as the development language and adopt CUDA (Compute Unified Device Architecture) as the parallel computing architecture. In this paper, we compare the parallel Bregman algorithm with the traditional CPU implementation of the Bregman algorithm. In addition, we also compare the parallel Bregman algorithm with other CS reconstruction algorithms, such as the OMP and TwIST algorithms. The results show that the parallel Bregman algorithm needs less time than these two algorithms and is thus more convenient for real-time object reconstruction, which is important given people's fast-growing demand for information technology.
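The remark that the algorithm needs only matrix-vector products and a thresholding step can be illustrated with a minimal serial sketch. It follows a common textbook form of the linearized Bregman iteration, not the authors' CUDA implementation; the names `shrink` and `linearized_bregman` and the parameters `mu`, `delta`, and `iters` are assumptions made for this illustration, and `delta` is assumed small enough relative to the norm of `A` for the iteration to be stable.

```python
def shrink(x, mu):
    """Soft-thresholding: move x toward zero by mu, clamping at zero."""
    if x > mu:
        return x - mu
    if x < -mu:
        return x + mu
    return 0.0


def linearized_bregman(A, b, mu=1.0, delta=0.5, iters=200):
    """Sketch of a linearized Bregman iteration for A u = b (illustrative).

    A is a list of rows, b a list of measurements. Each pass uses only
    two matrix-vector products and one element-wise thresholding.
    """
    m, n = len(A), len(A[0])
    u = [0.0] * n
    v = [0.0] * n
    for _ in range(iters):
        # Residual r = b - A u (first matrix-vector product).
        r = [b[i] - sum(A[i][j] * u[j] for j in range(n)) for i in range(m)]
        # Accumulate v <- v + A^T r (second matrix-vector product).
        for j in range(n):
            v[j] += sum(A[i][j] * r[i] for i in range(m))
        # Thresholding step: u <- delta * shrink(v, mu).
        u = [delta * shrink(vj, mu) for vj in v]
    return u


if __name__ == "__main__":
    # Hypothetical toy problem: two measurements of a 3-component signal.
    A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    b = [3.0, -2.0]
    print(linearized_bregman(A, b))
```

Every inner sum in the loop is independent of the others, which is the property that makes this kind of iteration a natural fit for a GPU.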
iHadoop: Asynchronous iterations for MapReduce
Elnikety, Eslam Mohamed Ibrahim
2011-11-01
MapReduce is a distributed programming framework designed to ease the development of scalable data-intensive applications for large clusters of commodity machines. Most machine learning and data mining applications involve iterative computations over large datasets, such as the Web hyperlink structures and social network graphs. Yet, the MapReduce model does not efficiently support this important class of applications. The architecture of MapReduce, most critically its dataflow techniques and task scheduling, is completely unaware of the nature of iterative applications; tasks are scheduled according to a policy that optimizes the execution for a single iteration, which wastes bandwidth, I/O, and CPU cycles when compared with an optimal execution for a consecutive set of iterations. This work presents iHadoop, a modified MapReduce model, and an associated implementation, optimized for iterative computations. The iHadoop model schedules iterations asynchronously. It connects the output of one iteration to the next, allowing both to process their data concurrently. iHadoop's task scheduler exploits inter-iteration data locality by scheduling tasks that exhibit a producer/consumer relation on the same physical machine, allowing a fast local data transfer. For those iterative applications that require satisfying certain criteria before termination, iHadoop runs the check concurrently during the execution of the subsequent iteration to further reduce the application's latency. This paper also describes our implementation of the iHadoop model, and evaluates its performance against Hadoop, the widely used open source implementation of MapReduce. Experiments using different data analysis applications over real-world and synthetic datasets show that iHadoop performs better than Hadoop for iterative algorithms, reducing execution time of iterative applications by 25% on average. Furthermore, integrating iHadoop with HaLoop, a variant Hadoop implementation that caches
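The idea of running the termination check concurrently with the subsequent iteration can be sketched on a single machine with a thread pool. This is only an illustration of the scheduling idea, not iHadoop's implementation or API; the functions `step`, `converged`, and `run_async` and the toy fixed-point problem (Newton's iteration for the square root of 2) are assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def step(x):
    """One iteration of a toy fixed-point computation (Newton for sqrt(2))."""
    return 0.5 * (x + 2.0 / x)


def converged(prev, curr, tol=1e-12):
    """Termination criterion evaluated on the last two iterates."""
    return abs(curr - prev) < tol


def run_async(x0, max_iters=100):
    """Overlap the termination check of iteration k with iteration k+1."""
    prev, curr = x0, step(x0)
    iters = 1
    with ThreadPoolExecutor(max_workers=2) as pool:
        while iters < max_iters:
            check = pool.submit(converged, prev, curr)
            # Speculatively start the next iteration before the check is known.
            nxt = pool.submit(step, curr)
            if check.result():
                # Converged: the speculative result is simply discarded.
                nxt.cancel()
                break
            prev, curr = curr, nxt.result()
            iters += 1
    return curr, iters


if __name__ == "__main__":
    value, iterations = run_async(1.0)
    print(value, iterations)
```

The trade-off in this sketch is that the last speculative iteration is wasted work, in exchange for never waiting on the check between iterations.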
Energy Technology Data Exchange (ETDEWEB)
Lober, R.R.; Tautges, T.J.; Vaughan, C.T.
1997-03-01
Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.
Grigoriev, Yu A.; Proletarskaya, V. A.; Ermakov, E. Yu; Ermakov, O. Yu
2017-10-01
A new method using a cascading Bloom filter (CBF) was developed for executing SQL queries in the Apache Spark parallel computing environment. It includes the representation of the original query in the form of several subqueries, the development of a connection graph and the transformation of subqueries, the identification of connections where Bloom filters are needed, and the representation of the graph in terms of Spark. Full-scale experiments on query Q3 of the TPC-H test confirmed the effectiveness of the developed method.
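As background for the role a Bloom filter plays in such a join, the following is a minimal single-filter sketch in plain Python. It is not the cascading filter of the paper and does not use Spark; the class `BloomFilter`, the helper `prefilter_join`, and the sizes `num_bits` and `num_hashes` are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def prefilter_join(small_keys, large_rows, key_index=0):
    """Drop rows of the large table whose key cannot match the small table."""
    bloom = BloomFilter()
    for key in small_keys:
        bloom.add(key)
    return [row for row in large_rows if bloom.might_contain(row[key_index])]


if __name__ == "__main__":
    rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
    print(prefilter_join([2, 4], rows))
```

Rows that survive the filter still have to go through the real join, since a Bloom filter can report false positives; the benefit is that rows it rejects never need to be shuffled.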
Directory of Open Access Journals (Sweden)
Isaac Kinde
Massively parallel sequencing of cell-free, maternal plasma DNA was recently demonstrated to be a safe and effective screening method for fetal chromosomal aneuploidies. Here, we report an improved sequencing method achieving significantly increased throughput and decreased cost by replacing laborious sequencing library preparation steps with PCR employing a single primer pair designed to amplify a discrete subset of repeated regions. Using this approach, samples containing as little as 4% trisomy 21 DNA could be readily distinguished from euploid samples.
Directory of Open Access Journals (Sweden)
Fatiha Abdelmalek
2015-10-01
We report herein a two- or three-step synthesis of fluorinated π-conjugated oligomers through iterative C–H bond arylations. Palladium-catalyzed desulfitative arylation of heteroarenes allowed, in a first step, the synthesis of fluoroaryl-heteroarene units in high yields. The next steps involve direct arylation with aryl bromides catalyzed by PdCl(C3H5)(dppb) to afford triad or tetrad heteroaromatic compounds via regioselective activation of C(sp2)–H bonds.
Iterative Adaptive Sampling For Accurate Direct Illumination
National Research Council Canada - National Science Library
Donikian, Michael
2004-01-01
This thesis introduces a new multipass algorithm, Iterative Adaptive Sampling, for efficiently computing the direct illumination in scenes with many lights, including area lights that cause realistic soft shadows...
International Nuclear Information System (INIS)
Dubois, J.
2011-01-01
In science, simulation is a key process for research or validation. Modern computer technology allows faster numerical experiments, which are cheaper than real models. In the field of neutron simulation, the calculation of eigenvalues is one of the key challenges. The complexity of these problems is such that a lot of computing power may be necessary. The work of this thesis is first the evaluation of new computing hardware, such as graphics cards or massively multi-core chips, and their application to eigenvalue problems for neutron simulation. Then, in order to address the massive parallelism of national supercomputers, we also study the use of asynchronous hybrid methods for solving eigenvalue problems with this very high level of parallelism. We then test this work on several national supercomputers such as the Titane hybrid machine of the Computing Center, Research and Technology (CCRT), the Curie machine of the Very Large Computing Centre (TGCC), currently being installed, and the Hopper machine at the Lawrence Berkeley National Laboratory (LBNL). We also run our experiments on local workstations to illustrate the value of this research in everyday use with local computing resources. (author) [fr
Duan, Jizhong; Liu, Yu; Jing, Peiguang
2018-02-01
Self-consistent parallel imaging (SPIRiT) is an auto-calibrating model for the reconstruction of parallel magnetic resonance imaging, which can be formulated as a regularized SPIRiT problem. The Projection Over Convex Sets (POCS) method was used to solve the formulated regularized SPIRiT problem. However, the quality of the reconstructed image still needs to be improved. Though methods such as NonLinear Conjugate Gradients (NLCG) can achieve higher spatial resolution, these methods always demand very complex computation and converge slowly. In this paper, we propose a new algorithm to solve the formulated Cartesian SPIRiT problem with the JTV and JL1 regularization terms. The proposed algorithm uses the operator splitting (OS) technique to decompose the problem into a gradient problem and a denoising problem with two regularization terms, which is solved by our proposed split Bregman based denoising algorithm, and adopts the Barzilai and Borwein method to update the step size. Simulation experiments on two in vivo data sets demonstrate that the proposed algorithm is 1.3 times faster than ADMM for datasets with 8 channels. In particular, the proposed algorithm is 2 times faster than ADMM for the dataset with 32 channels. Copyright © 2017 Elsevier Inc. All rights reserved.
ITER council proceedings: 1995
International Nuclear Information System (INIS)
1996-01-01
Records of the 8th ITER Council Meeting (IC-8), held on 26-27 July 1995, in San Diego, USA, and the 9th ITER Council Meeting (IC-9) held on 12-13 December 1995, in Garching, Germany, are presented, giving essential information on the evolution of the ITER Engineering Design Activities (EDA) and the ITER Interim Design Report Package and Relevant Documents. Figs, tabs
International Nuclear Information System (INIS)
Aymar, R.
1998-01-01
Six years of technical work under the ITER EDA Agreement have resulted in a design which constitutes a complete description of the ITER device and of its auxiliary systems and facilities. The ITER Council commented that the Final Design Report provides the first comprehensive design of a fusion reactor based on well established physics and technology
International Nuclear Information System (INIS)
Bosia, G.
1998-01-01
Neutral Beam Injection and RF heating are two of the methods for heating and current drive in ITER. The three ITER RF systems, which have been developed during the EDA, offer several complementary services and are able to fulfil ITER operational requirements
ITER council proceedings: 1999
International Nuclear Information System (INIS)
1999-01-01
In 1999 the ITER meeting in Cadarache (10-11 March 1999) and the Programme Directors Meeting in Grenoble (28-29 July 1999) took place. Both meetings were exclusively devoted to ITER engineering design activities and their agendas covered all issues important for the development of ITER. This volume presents the documents of these two important meetings
ITER council proceedings: 1996
International Nuclear Information System (INIS)
1997-01-01
Records of the 10th ITER Council Meeting (IC-10), held on 26-27 July 1996, in St. Petersburg, Russia, and the 11th ITER Council Meeting (IC-11) held on 17-18 December 1996, in Tokyo, Japan, are presented, giving essential information on the evolution of the ITER Engineering Design Activities (EDA) and the cost review and safety analysis. Figs, tabs
Development and test of the ITER conductor joints
Energy Technology Data Exchange (ETDEWEB)
Martovetsky, N., LLNL
1998-05-14
Joints for the ITER superconducting Central Solenoid should perform in a rapidly varying magnetic field with low losses and low DC resistance. This paper describes the design of the ITER joint and presents its assembly process. Two joints were built and tested at the PTF facility at MIT. Test results are presented, and losses in transverse and parallel field and the DC performance are discussed. The developed joint demonstrates sufficient margin for baseline ITER operating scenarios.
International Nuclear Information System (INIS)
Gordon, C.W.; Bartels, H.-W.; Honda, T.; Raeder, J.; Topilski, L.; Iseli, M.; Moshonas, K.; Taylor, N.; Gulden, W.; Kolbasov, B.; Inabe, T.; Tada, E.
2001-01-01
Safety has been an integral part of the design process for ITER since the Conceptual Design Activities of the project. The safety approach adopted in the ITER-FEAT design and the complementary assessments underway, to be documented in the Generic Site Safety Report (GSSR), are expected to help demonstrate the attractiveness of fusion and thereby set a good precedent for future fusion power reactors. The assessments address ITER's radiological hazards taking into account fusion's favourable safety characteristics. The expectation that ITER will need regulatory approval has influenced the entire safety design and assessment approach. This paper summarises the ITER-FEAT safety approach and assessments underway. (author)
A parallel algorithm for solving the integral form of the discrete ordinates equations
International Nuclear Information System (INIS)
Zerr, R. J.; Azmy, Y. Y.
2009-01-01
The integral form of the discrete ordinates equations involves a system of equations that has a large, dense coefficient matrix. The serial construction methodology is presented and properties that affect the execution times to construct and solve the system are evaluated. Two approaches for massively parallel implementation of the solution algorithm are proposed and the current results of one of these are presented. The system of equations may be solved using two parallel solvers: block Jacobi and conjugate gradient. Results indicate that both methods can reduce overall wall-clock time for execution. The conjugate gradient solver exhibits better performance to compete with the traditional source iteration technique in terms of execution time and scalability. The parallel conjugate gradient method is synchronous, hence it does not increase the number of iterations for convergence compared to serial execution, and the efficiency of the algorithm demonstrates an apparent asymptotic decline. (authors)
Brühlmann, David; Sokolov, Michael; Butté, Alessandro; Sauer, Markus; Hemberger, Jürgen; Souquet, Jonathan; Broly, Hervé; Jordan, Martin
2017-07-01
Rational and high-throughput optimization of mammalian cell culture media has a great potential to modulate recombinant protein product quality. We present a process design method based on parallel design-of-experiment (DoE) of CHO fed-batch cultures in 96-deepwell plates to modulate monoclonal antibody (mAb) glycosylation using medium supplements. To reduce the risk of losing valuable information in an intricate joint screening, 17 compounds were separated into five different groups, considering their mode of biological action. The concentration ranges of the medium supplements were defined according to information encountered in the literature and in-house experience. The screening experiments produced wide glycosylation pattern ranges. Multivariate analysis including principal component analysis and decision trees was used to select the best performing glycosylation modulators. Subsequent D-optimal quadratic design with four factors (three promising compounds and temperature shift) in shake tubes confirmed the outcome of the selection process and provided a solid basis for sequential process development at a larger scale. The glycosylation profile with respect to the specifications for biosimilarity was greatly improved in shake tube experiments: 75% of the conditions were equally close or closer to the specifications for biosimilarity than the best 25% in 96-deepwell plates. Biotechnol. Bioeng. 2017;114: 1448-1458. © 2017 Wiley Periodicals, Inc.
DOVIS 2.0: an efficient and easy to use parallel virtual screening tool based on AutoDock 4.0
Directory of Open Access Journals (Sweden)
Wallqvist Anders
2008-09-01
Background: Small-molecule docking is an important tool in studying receptor-ligand interactions and in identifying potential drug candidates. Previously, we developed a software tool (DOVIS) to perform large-scale virtual screening of small molecules in parallel on Linux clusters, using AutoDock 3.05 as the docking engine. DOVIS enables the seamless screening of millions of compounds on high-performance computing platforms. In this paper, we report significant advances in the software implementation of DOVIS 2.0, including enhanced screening capability, improved file system efficiency, and extended usability. Implementation: To keep DOVIS up-to-date, we upgraded the software's docking engine to the more accurate AutoDock 4.0 code. We developed a new parallelization scheme to improve runtime efficiency and modified the AutoDock code to reduce excessive file operations during large-scale virtual screening jobs. We also implemented an algorithm to output docked ligands in an industry standard format, sd-file format, which can be easily interfaced with other modeling programs. Finally, we constructed a wrapper-script interface to enable automatic rescoring of docked ligands by arbitrarily selected third-party scoring programs. Conclusion: The significance of the new DOVIS 2.0 software compared with the previous version lies in its improved performance and usability. The new version makes the computation highly efficient by automating load balancing, significantly reducing excessive file operations by more than 95%, providing outputs that conform to industry standard sd-file format, and providing a general wrapper-script interface for rescoring of docked ligands. The new DOVIS 2.0 package is freely available to the public under the GNU General Public License.
DOVIS 2.0: an efficient and easy to use parallel virtual screening tool based on AutoDock 4.0.
Jiang, Xiaohui; Kumar, Kamal; Hu, Xin; Wallqvist, Anders; Reifman, Jaques
2008-09-08
Small-molecule docking is an important tool in studying receptor-ligand interactions and in identifying potential drug candidates. Previously, we developed a software tool (DOVIS) to perform large-scale virtual screening of small molecules in parallel on Linux clusters, using AutoDock 3.05 as the docking engine. DOVIS enables the seamless screening of millions of compounds on high-performance computing platforms. In this paper, we report significant advances in the software implementation of DOVIS 2.0, including enhanced screening capability, improved file system efficiency, and extended usability. To keep DOVIS up-to-date, we upgraded the software's docking engine to the more accurate AutoDock 4.0 code. We developed a new parallelization scheme to improve runtime efficiency and modified the AutoDock code to reduce excessive file operations during large-scale virtual screening jobs. We also implemented an algorithm to output docked ligands in an industry standard format, sd-file format, which can be easily interfaced with other modeling programs. Finally, we constructed a wrapper-script interface to enable automatic rescoring of docked ligands by arbitrarily selected third-party scoring programs. The significance of the new DOVIS 2.0 software compared with the previous version lies in its improved performance and usability. The new version makes the computation highly efficient by automating load balancing, significantly reducing excessive file operations by more than 95%, providing outputs that conform to industry standard sd-file format, and providing a general wrapper-script interface for rescoring of docked ligands. The new DOVIS 2.0 package is freely available to the public under the GNU General Public License.
Parallel Atomistic Simulations
Energy Technology Data Exchange (ETDEWEB)
HEFFELFINGER,GRANT S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Parallel processing of two-dimensional Sn transport calculations
International Nuclear Information System (INIS)
Uematsu, M.
1997-01-01
A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation.
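The two speedup figures quoted above can be turned into parallel efficiencies with one division each. The following is a minimal sketch, assuming the usual definition of efficiency as speedup divided by the number of processing elements; the function name is hypothetical, and the second figure is measured against the original DOT3.5 run rather than a one-element parallel run, so it is only loosely comparable to the first.

```python
def parallel_efficiency(speedup: float, n_processors: int) -> float:
    """Return speedup divided by processor count (1.0 would be ideal scaling)."""
    if n_processors <= 0:
        raise ValueError("n_processors must be positive")
    return speedup / n_processors


# Figures quoted in the abstract, both obtained on 16 processing elements.
first_quoted = parallel_efficiency(9.0, 16)    # "nine times as fast"
second_quoted = parallel_efficiency(5.98, 16)  # versus the original DOT3.5 calculation
print(f"{first_quoted:.4f} {second_quoted:.4f}")
```

Under this definition both cases sit well below ideal scaling, which is consistent with the communication and convergence penalties the abstract describes.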
ITER Neutral Beam Injection System
International Nuclear Information System (INIS)
Ohara, Yoshihiro; Tanaka, Shigeru; Akiba, Masato
1991-03-01
A Japanese design proposal of the ITER Neutral Beam Injection System (NBS) which is consistent with the ITER common design requirements is described. The injection system is required to deliver a neutral deuterium beam of 75MW at 1.3MeV to the reactor plasma and is utilized not only for plasma heating but also for current drive and current profile control. The injection system is composed of 9 modules, each of which is designed so as to inject a 1.3MeV, 10MW neutral beam. The most important point in the design is that the injection system is based on the utilization of a cesium-seeded volume negative ion source which can produce an intense negative ion beam with high current density at a low source operating pressure. The design value of the source is based on the experimental values achieved at JAERI. The utilization of the cesium-seeded volume source is essential to the design of an efficient and compact neutral beam injection system which satisfies the ITER common design requirements. The critical components to realize this design are the 1.3MeV, 17A electrostatic accelerator and the high voltage DC acceleration power supply, whose performances must be demonstrated prior to the construction of the ITER NBI system. (author)
Li, Yiming; Ishitsuka, Yuji; Hedde, Per Niklas; Nienhaus, G Ulrich
2013-06-25
In localization-based super-resolution microscopy, individual fluorescent markers are stochastically photoactivated and subsequently localized within a series of camera frames, yielding a final image with a resolution far beyond the diffraction limit. Yet, before localization can be performed, the subregions within the frames where the individual molecules are present have to be identified, oftentimes in the presence of high background. In this work, we address the importance of reliable molecule identification for the quality of the final reconstructed super-resolution image. We present a fast and robust algorithm (a-livePALM) that vastly improves the molecule detection efficiency while minimizing false assignments that can lead to image artifacts.
International Nuclear Information System (INIS)
Abdou, M.; Baker, C.; Casini, G.
1991-01-01
ITER has been designed to operate in two phases. The first phase, which lasts for 6 years, is devoted to machine checkout and physics testing. The second phase lasts for 8 years and is devoted primarily to technology testing. This report describes the technology test program development for ITER, the ancillary equipment outside the torus necessary to support the test modules, the international collaboration aspects of conducting the test program on ITER, the requirements on the machine major parameters and the R and D program required to develop the test modules for testing in ITER. 15 refs, figs and tabs
Parallel External Memory Graph Algorithms
DEFF Research Database (Denmark)
Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari
2010-01-01
In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....
Agile parallel bioinformatics workflow management using Pwrake.
Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro
2011-09-08
In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles
Agile parallel bioinformatics workflow management using Pwrake
2011-01-01
Background: In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings: We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions: Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability
Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun
2015-12-01
Deep learning algorithms are widely used for various pattern recognition applications such as text recognition, object recognition and action recognition because of their best-in-class recognition accuracy compared to hand-crafted and shallow-learning-based algorithms. The long learning time caused by their complex structure, however, has so far limited their use to high-cost servers or many-core GPU platforms. On the other hand, the demand for customized pattern recognition within personal devices will grow gradually as more deep learning applications are developed. This paper presents an SoC implementation that enables deep learning applications to run on low-cost platforms such as mobile or portable devices. Unlike conventional works, which have adopted massively parallel architectures, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, one of the popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.
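The quoted energy efficiency can be cross-checked against the other figures in the abstract. The short sketch below assumes (the abstract does not say so explicitly) that the 1.93 TOPS/W value is peak performance divided by peak power; the helper name is hypothetical.

```python
def tops_per_watt(gops: float, milliwatts: float) -> float:
    """Convert a throughput in GOPS and a power in mW to TOPS/W."""
    tera_ops = gops / 1000.0      # GOPS -> TOPS
    watts = milliwatts / 1000.0   # mW -> W
    return tera_ops / watts


# 411.3 GOPS peak performance at 213.1 mW peak power (values from the abstract).
peak_efficiency = tops_per_watt(411.3, 213.1)
print(round(peak_efficiency, 2))
```

Rounded to two decimals, the result agrees with the stated 1.93 TOPS/W, which suggests the peak power was used rather than the 185 mW average.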
International Nuclear Information System (INIS)
Rosa, M.; Warsa, J. S.; Chang, J. H.
2007-01-01
A Fourier analysis is conducted in two-dimensional (2D) Cartesian geometry for the discrete-ordinates (SN) approximation of the neutron transport problem solved with Richardson iteration (Source Iteration) and Richardson iteration preconditioned with Transport Synthetic Acceleration (TSA), using the Parallel Block-Jacobi (PBJ) algorithm. The results for the un-accelerated algorithm show that convergence of PBJ can degrade, leading in particular to stagnation of GMRES(m) in problems containing optically thin sub-domains. The results for the accelerated algorithm indicate that TSA can be used to efficiently precondition an iterative method in the optically thin case when implemented in the 'modified' version MTSA, in which only the scattering in the low order equations is reduced by some non-negative factor β<1. (authors)
Distributed Parallel Endmember Extraction of Hyperspectral Data Based on Spark
Directory of Open Access Journals (Sweden)
Zebin Wu
2016-01-01
Due to the increasing dimensionality and volume of remotely sensed hyperspectral data, the development of acceleration techniques for massive hyperspectral image analysis approaches is a very important challenge. Cloud computing offers many possibilities of distributed processing of hyperspectral datasets. This paper proposes a novel distributed parallel endmember extraction method based on iterative error analysis that utilizes cloud computing principles to efficiently process massive hyperspectral data. The proposed method takes advantage of technologies including the MapReduce programming model, Hadoop Distributed File System (HDFS), and Apache Spark to realize distributed parallel implementation for hyperspectral endmember extraction, which significantly accelerates the computation of hyperspectral processing and provides high throughput access to large hyperspectral data. The experimental results, which are obtained by extracting endmembers of hyperspectral datasets on a cloud computing platform built on a cluster, demonstrate the effectiveness and computational efficiency of the proposed method.
Fuel cycle design for ITER and its extrapolation to DEMO
International Nuclear Information System (INIS)
Konishi, Satoshi; Glugla, Manfred; Hayashi, Takumi
2008-01-01
ITER is the first fusion device that continuously processes DT plasma exhaust and supplies recycled fuel in a closed loop. All the tritium and deuterium in the exhaust are recovered, purified and returned to the tokamak with minimal delay, so that extended burn can be sustained with limited inventory. To maintain the safety of the entire facility, plant scale detritiation systems will also continuously run to remove tritium from the effluents at the maximum efficiency. In this entire tritium plant system, an extremely high decontamination factor, that is the ratio of the tritium loss to the processing flow rate, is required for fuel economy and minimized tritium emissions, and the system design based on the state-of-the-art technology is expected to satisfy all the requirements without significant technical challenges. A considerable part of the fusion tritium system will be verified with ITER and its decades of operation experience. Toward the DEMO plant that will actually generate energy and operate its closed fuel cycle, the breeding blanket and the power train that carries high temperature and pressure media from the fusion device to the generation system will be the major additions. For tritium confinement, safety and environmental emission, the blanket, its coolant, and generation systems such as the heat exchanger, steam generator and turbine will be the critical systems, because handling tritium permeation from the breeder and a large amount of high temperature, high pressure coolant will be even more difficult than what is required for ITER. Detritiation of solid waste such as used blanket and divertor will be another issue for both tritium economy and safety. Unlike ITER, which is regarded as an experimental facility, DEMO will be expected to demonstrate safety, reliability and social acceptance, even if the economic aspect is excluded. Fuel and environmental issue to be tested in the DEMO will determine the viability of the fusion as a
Wu, Tao; Wu, Zhensen; Linghu, Longxiang
2017-10-01
The study of the characteristics of sea clutter is very important for radar signal processing, detection of targets on the sea surface and remote sensing. The sea state is complex at low grazing angle (LGA), and simulation is difficult because of the large irradiated area and the large number of simulation facets. A practical and efficient model for obtaining the radar clutter of a dynamic sea under different sea conditions is proposed, based on the physical mechanism of the interaction between electromagnetic waves and sea waves. The classical analysis method for sea clutter is based on amplitude and spectrum distributions and treats the clutter as a random process, which leaves its physical mechanism unclear. To obtain the electromagnetic field from the sea surface, a modified phase from the facets is considered, and the backscattering coefficient is calculated by Wu's improved two-scale model, which can solve the statistical sea backscattering problem at less than 5 degrees, considering the effects of the joint probability density of the surface slopes, the shadowing function, the skewness of sea waves and the curvature of the surface on the backscattering from the ocean surface. We assume that the scattering contribution of each facet is independent, so the total field is the superposition of the facet contributions in the receiving direction. Data of this kind are well suited to computation on GPU threads, which lets us make the best use of GPU resources. We have achieved a speedup of 155-fold for S band and 162-fold for Ku/X band on the Tesla K80 GPU as compared with an Intel® Core™ CPU. In this paper, we mainly study high-resolution data with millisecond time resolution, so we may have 10,00 time points, and we analyze the amplitude probability density distribution of the radar clutter.
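The superposition assumption stated above is what makes the problem map well onto GPU threads: each facet's contribution can be computed independently and then summed. The following is a minimal CPU-side sketch of that idea only, assuming each facet is reduced to a hypothetical amplitude and phase; it does not reproduce the two-scale model or the modified phase term of the paper.

```python
import cmath


def total_field(amplitudes, phases):
    """Coherently sum per-facet complex contributions a_k * exp(j * phi_k)."""
    if len(amplitudes) != len(phases):
        raise ValueError("amplitudes and phases must have the same length")
    # Each term depends on one facet only, so the terms could be evaluated in parallel.
    return sum(a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases))


# Two equal facets in phase reinforce; two in antiphase cancel.
in_phase = total_field([1.0, 1.0], [0.0, 0.0])
anti_phase = total_field([1.0, 1.0], [0.0, cmath.pi])
print(abs(in_phase), abs(anti_phase))
```

On a GPU the per-facet terms would typically be computed by separate threads and combined with a parallel reduction; that mapping is an assumption here, not a detail given in the abstract.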
International Nuclear Information System (INIS)
Roberts, M.
2003-01-01
Upon pressure from the United States Congress, the US Department of Energy had to withdraw from further American participation in the ITER Engineering Design Activities after the end of its commitment to the EDA in July 1998. In the years since that time, changes have taken place in both the ITER activity and the US fusion community's position on burning plasma physics. Reflecting the interest in the United States in pursuing burning plasma physics, the DOE's Office of Science commissioned three studies as part of its examination of the option of entering the Negotiations on the Agreement on the Establishment of the International Fusion Energy Organization for the Joint Implementation of the ITER Project. These were a National Academy Review Panel Report supporting the burning plasma mission; a Fusion Energy Sciences Advisory Committee (FESAC) report confirming the role of ITER in achieving fusion power production, and The Lehman Review of the ITER project costing and project management processes (for the latter one, see ITER CTA Newsletter, no. 15, December 2002). All three studies have endorsed the US return to the ITER activities. This historical decision was announced by DOE Secretary Abraham during his remarks to employees of the Department's Princeton Plasma Physics Laboratory. The United States will be working with the other Participants in the ITER Negotiations on the Agreement and is preparing to participate in the ITA
ITER at Cadarache; ITER a Cadarache
Energy Technology Data Exchange (ETDEWEB)
NONE
2005-06-15
This public information document presents the ITER project (International Thermonuclear Experimental Reactor), the definition of fusion, the international cooperation and the advantages of the project. It also presents the Cadarache site, which offers an appropriate scientific and economic environment. The last part of the document recalls the history of the project and the current mobilization of all partners. (A.L.B.)
ITER council proceedings: 1992
International Nuclear Information System (INIS)
1994-01-01
At the signing of the ITER EDA Agreement on July, 1992, each of the Parties presented to the Director General the names of their designated members of the ITER Council. Upon receiving those names, the Director General stated that the ITER Engineering Design Activities were ''ready to begin''. The next step in this process was the convening of the first meeting of the ITER Council. The first meeting of the Council, held in Vienna, was opened by Director General Hans Blix. The second meeting was held in Moscow, the formal seat of the Council. This volume presents records of these first two Council meetings and, together with the previous volumes on the text of the Agreement and Protocol 1 and the preparations for their signing respectively, represents essential information on the evolution of the ITER EDA
International Nuclear Information System (INIS)
2001-11-01
This ITER CTA newsletter comprises reports by Dr. P. Barnard, ITER Canada Chairman and CEO, about the progress of the first formal ITER negotiations and about the demonstration of details of Canada's bid on ITER workshops, and by Dr. V. Vlasenkov, Project Board Secretary, about the meeting of the ITER CTA project board.
LHCD and coupling experiments with an ITER-like PAM launcher on the FTU tokamak
International Nuclear Information System (INIS)
Pericoli Ridolfini, V.; Apicella, M.L.; Barbato, E.; Buratti, P.; Calabro, G.; Cardinali, A.; Mirizzi, F.; Panaccione, L.; Podda, S.; Tuccillo, A.A.; Bibet, Ph.; Granucci, G.; Sozzi, C.
2005-01-01
Successful experimental tests on a PAM (passive active multijunction) prototype antenna for the Lower Hybrid (LH) waves similar to that foreseen for ITER have been carried out on FTU. The power level routinely achieved without any fault in the transmission lines for the maximum time allowed by the LH power plant, i.e. 0.9 s, is 250 kW versus a design value of 270. It corresponds to 50 MW/m² through the ITER antenna active area if it is scaled for the different LH frequencies (5 GHz in ITER, 8 GHz in FTU) and it is more than 1.4 times the goal of the ITER design (33 MW/m²). The test results validate the main features indicated by the simulation codes, concerning the power handling, the coupling and the launched N∥ spectrum. The power reflection coefficient Rc is always ≤ 2.5%, once the PAM launcher has been properly conditioned, even with the grill mouth retracted 2 mm inside the port shadow, with density in front of the launcher very close or even lower than the cut-off value. The current drive efficiency is comparable to a conventional grill in similar conditions, once the lower directivity is taken into account. The flexibility in the N∥ spectrum is confirmed by the HXR and ECE spectra. Conditioning the PAM to operate at the ITER equivalent power level has required only one day of RF operation, without a previous baking of the waveguides. (author)
ITER Fast Ion Collective Thomson Scattering
DEFF Research Database (Denmark)
Bindslev, Henrik; Larsen, Axel Wright; Meo, Fernando
2005-01-01
The EFDA Contract 04-1213 with Risø National Laboratory concerning a detailed integrated design of a Fast Ion Collective Thomson Scattering (CTS) diagnostic for ITER was signed on 31 December 2004. In 2003 the Risø CTS group finished a feasibility study and a conceptual design of an ITER Fast Ion...... Collective Thomson Scattering System (Contract 01.654) [1, 2]. The purpose of the CTS diagnostic is to measure the distribution function of fast ions in the plasma. The feasibility study demonstrated that the only system that can fully meet the ITER measurement requirements for confined fusion alphas is a 60...... GHz system. The study showed that with two powerful microwave sources of this frequency (gyrotron) and two antenna systems, one on the low field side (LFS) and one on the high field side (HFS), it should be possible to resolve the distribution function of fast ions both for perpendicular and parallel...
Design iteration in construction projects – Review and directions
Directory of Open Access Journals (Sweden)
Purva Mujumdar
2018-03-01
The design phase of any construction project involves several designers who exchange information with each other, most often in an unstructured manner, throughout the design phase. When these information exchanges occur in cycles/loops, it is termed design iteration. Iteration is an inherent and unavoidable aspect of any design phase which requires proper planning. To date, very few researchers have explored design iteration (“complexity”) in the construction sector. Hence, the objective of this paper was to document and review the complexities of iteration during the design phase of construction projects for efficient design planning. To achieve this objective, an exhaustive literature review on design iteration was done for four sectors – construction, manufacturing, aerospace, and software development. In addition, semi-structured interviews and discussions were done with a few design experts to verify the different dimensions of iteration. Finally, a design iteration framework was presented in this study that facilitates successful planning.
Keywords: Design iteration, Types of iteration, Causes and impact of iteration, Models of iteration, Execution strategies of iteration
Lovelace, Kay A; Shah, Gulzar H
2016-01-01
The objective of this case study was to describe the process and outcomes of a small local health department's (LHD's) strategy to build and use information systems. The case study is based on a review of documents and semi-structured interviews with key informants in the Pomperaug District Health Department. Interviews were recorded, transcribed, coded, and analyzed. The case study here suggests that small LHDs can use a low-resource, incremental strategy to build information systems for improving departmental effectiveness and efficiency. Specifically, we suggest that the elements for this department's success were simple information systems, clear vision, consistent leadership, and the involvement, training, and support of staff.
Non-iterative Voltage Stability
Energy Technology Data Exchange (ETDEWEB)
Makarov, Yuri V. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Vyakaranam, Bharat [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Hou, Zhangshuan [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Wu, Di [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Meng, Da [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Wang, Shaobu [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Elbert, Stephen T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Miller, Laurie E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Huang, Zhenyu [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
2014-09-30
This report demonstrates promising capabilities and performance characteristics of the proposed method using several power systems models. The new method will help to develop a new generation of highly efficient tools suitable for real-time parallel implementation. The ultimate benefit obtained will be early detection of system instability and prevention of system blackouts in real time.
Fast iterative censoring CFAR algorithm for ship detection from SAR images
Gu, Dandan; Yue, Hui; Zhang, Yuan; Gao, Pengcheng
2017-11-01
Ship detection is one of the essential techniques for ship recognition from synthetic aperture radar (SAR) images. This paper presents a fast iterative detection procedure to eliminate the influence of target returns on the estimation of local sea clutter distributions for constant false alarm rate (CFAR) detectors. A fast block detector is first employed to extract potential target sub-images; then, an iterative censoring CFAR algorithm is used to detect ship candidates from each target block adaptively and efficiently. Parallel detection is available, and the statistical parameters of the G0 distribution, which fits local sea clutter well, can be quickly estimated based on an integral image operator. Experimental results on TerraSAR-X images demonstrate the effectiveness of the proposed technique.
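The integral image operator mentioned above is what makes local clutter statistics cheap: once the table is built, the sum over any rectangular window costs four lookups regardless of window size. The following is a minimal sketch of that operator alone, with hypothetical function names; how the paper turns such sums into G0 distribution parameters is not described in the abstract and is not attempted here.

```python
def integral_image(img):
    """Return a (rows+1) x (cols+1) summed-area table with a zero border."""
    rows, cols = len(img), len(img[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            # Sum of everything above plus the running sum of the current row.
            sat[r + 1][c + 1] = sat[r][c + 1] + row_sum
    return sat


def window_sum(sat, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 using four lookups."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


pixels = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
table = integral_image(pixels)
local_mean = window_sum(table, 1, 1, 3, 3) / 4  # mean of the lower-right 2x2 block
print(local_mean)
```

Applying the same table construction to the squared pixel values would give local second moments in the same way, which is one common route to local variance estimates.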
Parallelization of pressure equation solver for incompressible N-S equations
International Nuclear Information System (INIS)
Ichihara, Kiyoshi; Yokokawa, Mitsuo; Kaburaki, Hideo.
1996-03-01
A pressure equation solver in a code for 3-dimensional incompressible flow analysis has been parallelized using the red-black SOR method and the PCG method on the Fujitsu VPP500, a vector parallel computer with distributed memory. For comparison of scalability, the solver using the red-black SOR method has also been parallelized on the Intel Paragon, a scalar parallel computer with distributed memory. The scalability of the red-black SOR method was lost on both the VPP500 and the Paragon as the number of processor elements was increased. The reason for the non-scalability on both systems is the increasing communication time between processor elements. In addition, parallelization by DO-loop division lowers the vectorizing efficiency on the VPP500. For an effective implementation on the VPP500, a large-scale problem that keeps very long vectorized DO-loops in the parallel program should be solved. The PCG method with the red-black SOR method applied to incomplete LU factorization (red-black PCG) needs more iteration steps than the normal PCG method with forward and backward substitution, in spite of the same number of floating point operations in a DO-loop of incomplete LU factorization. The parallelized red-black PCG method has fewer merits than the parallelized red-black SOR method when the computational region has fewer grid points, because the vectorization efficiency of the red-black PCG method is low. (author)
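Red-black ordering is attractive for parallel and vector machines because, on a 5-point stencil, every "red" point depends only on "black" neighbours and vice versa, so all points of one colour can be updated at once. The sketch below illustrates the method under simplifying assumptions that are not from the report: a 2D grid instead of 3D, the model problem -∇²u = f with fixed boundary values, and serial Python loops standing in for the vectorized DO-loops.

```python
def red_black_sor(u, f, h, omega, sweeps):
    """Run red-black SOR sweeps in place on the interior points of a 2D grid."""
    n_rows, n_cols = len(u), len(u[0])
    for _ in range(sweeps):
        for color in (0, 1):  # 0: "red" points, 1: "black" points
            for i in range(1, n_rows - 1):
                for j in range(1, n_cols - 1):
                    if (i + j) % 2 != color:
                        continue
                    # Gauss-Seidel value from the 5-point stencil of -laplace(u) = f.
                    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                           + u[i][j - 1] + u[i][j + 1]
                                           + h * h * f[i][j])
                    # Over-relax toward that value with factor omega.
                    u[i][j] += omega * (gauss_seidel - u[i][j])
    return u


# Laplace problem (f = 0) with boundary value 1: the interior should relax to 1.
n = 6
grid = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
        for i in range(n)]
source = [[0.0] * n for _ in range(n)]
red_black_sor(grid, source, h=1.0, omega=1.5, sweeps=200)
print(max(abs(v - 1.0) for row in grid for v in row))
```

In a distributed-memory setting each colour sweep would be followed by an exchange of boundary values between neighbouring subdomains, which is the communication cost the abstract identifies as the limit on scalability.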
Iterative solution of high order compact systems
Energy Technology Data Exchange (ETDEWEB)
Spotz, W.F.; Carey, G.F. [Univ. of Texas, Austin, TX (United States)
1996-12-31
We have recently developed a class of finite difference methods which provide higher accuracy and greater stability than standard central or upwind difference methods, but still reside on a compact patch of grid cells. In the present study we investigate the performance of several gradient-type iterative methods for solving the associated sparse systems. Both serial and parallel performance studies have been made. Representative examples are taken from elliptic PDEs for diffusion, convection-diffusion, and viscous flow applications.
Block quasi-minimal residual iterations for non-Hermitian linear systems
Energy Technology Data Exchange (ETDEWEB)
Freund, R.W. [AT&T Bell Labs., Murray Hill, NJ (United States)
1994-12-31
Many applications require the solution of multiple linear systems that have the same coefficient matrix, but differ only in their right-hand sides. Instead of applying an iterative method to each of these systems individually, it is usually more efficient to employ a block version of the method that generates blocks of iterates for all the systems simultaneously. An example of such an iteration is the block conjugate gradient algorithm, which was first studied by Underwood and O'Leary. On parallel architectures, block versions of conjugate gradient-type methods are attractive even for the solution of single linear systems, since they have fewer synchronization points than the standard versions of these algorithms. In this talk, the author presents a block version of Freund and Nachtigal's quasi-minimal residual (QMR) method for the iterative solution of non-Hermitian linear systems. He describes two different implementations of the block-QMR method, one based on a block version of the three-term Lanczos algorithm and one based on coupled two-term block recurrences. In both cases, the underlying block-Lanczos process still allows arbitrary normalizations of the vectors within each block, and the author discusses different normalization strategies. To maintain linear independence within each block, it is usually necessary to reduce the block size in the course of the iteration, and the author describes a deflation technique for performing this reduction. He also presents some convergence results and reports results of numerical experiments with the block-QMR method. Finally, the author discusses possible block versions of transpose-free Lanczos-based iterations such as the TFQMR method.
International Nuclear Information System (INIS)
Golubchikov, L.
2001-01-01
In connection with the successful completion of the Engineering Design of the International Thermonuclear Reactor (ITER) and the 50th anniversary of fusion research in the USSR, the Ministry of the Russian Federation for Atomic Energy (Minatom) with the participation of the Russian Academy of Sciences, organized the International Symposium 'ITER days in Moscow' on 7-8 June 2001. About 250 people from more than 20 states took part in the Meeting. The participants welcomed the R and D results of the ITER project and considered it as a necessary step to establish a basis for a fusion energy source. There were also some scientific presentations on the following topics: ITER physics basis; Effect of fusion research on general physics; Fusion power reactors; US interests in burning plasma
International Nuclear Information System (INIS)
1989-01-01
The International Thermonuclear Experimental Reactor (ITER) is envisioned as a fusion device which would demonstrate the scientific and technological feasibility of fusion power. As a first step towards achieving this goal, the European Community, Japan, the Soviet Union, and the United States of America have entered into joint conceptual design activities under the auspices of the International Atomic Energy Agency. A brief summary of the Definition Phase of ITER activities is contained in this report. Included in this report are the background, objectives, organization, definition phase activities, and research and development plan of this endeavor in international scientific collaboration. A more extended technical summary is contained in the two-volume report, "ITER Concept Definition," IAEA/ITER/DS/3. 2 figs, 2 tabs
Benfatto, I
2006-01-01
The International Thermonuclear Experimental Reactor (ITER) is a thermonuclear fusion experiment designed to provide long deuterium–tritium burning plasma operation. After a short description of ITER objectives, the main design parameters and the construction schedule, the paper describes the electrical characteristics of the French 400 kV grid at Cadarache: the European site proposed for ITER. Moreover, the paper describes the main requirements and features of the power converters designed for the ITER coil and additional heating power supplies, characterized by a total installed power of about 1.8 GVA, modular design with basic units up to 90 MVA continuous duty, dc currents up to 68 kA, and voltages from 1 kV to 1 MV dc.
Approximate iterative algorithms
Almudevar, Anthony Louis
2014-01-01
Iterative algorithms often rely on approximate evaluation techniques, which may include statistical estimation, computer simulation or functional approximation. This volume presents methods for the study of approximate iterative algorithms, providing tools for the derivation of error bounds and convergence rates, and for the optimal design of such algorithms. Techniques of functional analysis are used to derive analytical relationships between approximation methods and convergence properties for general classes of algorithms. This work provides the necessary background in functional analysis a
International Nuclear Information System (INIS)
Baker, C.C.
2001-01-01
The year 1998 was the culmination of the six-year Engineering Design Activities (EDA) of the International Thermonuclear Experimental Reactor (ITER) Project. The EDA results in design and validating technology R and D, plus the associated effort in voluntary physics research, are a significant achievement and major milestone in the history of magnetic fusion energy development. Consequently, the ITER EDA was a major theme at this Conference, contributing almost 40 papers
International Nuclear Information System (INIS)
Golubchikov, L.
2000-01-01
Opening this first Explorers' Meeting, Minister Adamov welcomed the participants, thanked the ITER parties for their positive response to his invitation and expressed the desire of the Russian Federation to see ITER realized, stressing the importance of continued progress with the project as an outstanding example of international scientific co-operation. During the meeting, the exploration tasks were discussed and agreed upon, as well as the work plan and schedule
Distributed Video Coding: Iterative Improvements
DEFF Research Database (Denmark)
Luong, Huynh Van
at the decoder side offering such benefits for these applications. Although there have been some advanced improvement techniques, improving the DVC coding efficiency is still challenging. The thesis addresses this challenge by proposing several iterative algorithms at different working levels, e.g. bitplane...... and noise modeling and also learn from the previous decoded Wyner-Ziv (WZ) frames, side information and noise learning (SING) is proposed. The SING scheme introduces an optical flow technique to compensate the weaknesses of the block based SI generation and also utilizes clustering of DCT blocks to capture...
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Greenfield, Charles M.
2017-10-01
The US Burning Plasma Organization is pleased to welcome Dr. Bernard Bigot, who will give an update on progress in the ITER Project. Dr. Bigot took over as Director General of the ITER Organization in early 2015 following a distinguished career that included serving as Chairman and CEO of the French Alternative Energies and Atomic Energy Commission and as High Commissioner for ITER in France. During his tenure at ITER the project has moved into high gear, with rapid progress evident on the construction site and preparation of a staged schedule and a research plan leading from where we are today all the way to full DT operation. In an unprecedented international effort, seven partners (China, the European Union, India, Japan, Korea, Russia and the United States) have pooled their financial and scientific resources to build the biggest fusion reactor in history. ITER will open the way to the next step: a demonstration fusion power plant. All DPP attendees are welcome to attend this ITER town meeting.
DEFF Research Database (Denmark)
Bendtsen, Claus; Nielsen, Ole Holm; Hansen, Lars Bruno
2001-01-01
The quantum mechanical ground state of electrons is described by Density Functional Theory, which leads to large minimization problems. An efficient minimization method uses a self-consistent field (SCF) solution of large eigenvalue problems. The iterative Davidson algorithm is often used, and we propose a new algorithm of this kind which is well suited for the SCF method, since the accuracy of the eigensolution is gradually improved along with the outer SCF-iterations. Best efficiency is obtained for small-block-size iterations, and the algorithm is highly memory efficient. The implementation works well on both serial and parallel computers, and good scalability of the algorithm is obtained. (C) 2001 IMACS. Published by Elsevier Science B.V. All rights reserved.
1982-01-01
Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn
International Nuclear Information System (INIS)
Neese, Frank; Wennmohs, Frank; Hansen, Andreas; Becker, Ute
2009-01-01
In this paper, the possibility is explored to speed up Hartree-Fock and hybrid density functional calculations by forming the Coulomb and exchange parts of the Fock matrix by different approximations. For the Coulomb part the previously introduced Split-RI-J variant (F. Neese, J. Comput. Chem. 24 (2003) 1740) of the well-known 'density fitting' approximation is used. The exchange part is formed by semi-numerical integration techniques that are closely related to Friesner's pioneering pseudo-spectral approach. Our potentially linear scaling realization of this algorithm is called the 'chain-of-spheres exchange' (COSX). A combination of semi-numerical integration and density fitting is also proposed. Both Split-RI-J and COSX scale very well with the highest angular momentum in the basis sets. It is shown that for extended basis sets speed-ups of up to two orders of magnitude compared to traditional implementations can be obtained in this way. Total energies are reproduced with an average error of <0.3 kcal/mol as determined from extended test calculations with various basis sets on a set of 26 molecules with 20-200 atoms and up to 2000 basis functions. Reaction energies agree to within 0.2 kcal/mol (Hartree-Fock) or 0.05 kcal/mol (hybrid DFT) with the canonical values. The COSX algorithm parallelizes with a speedup of 8.6 observed for 10 processes. Minimum energy geometries differ by less than 0.3 pm in the bond distances and 0.5 deg. in the bond angles from their canonical values. These developments enable highly efficient and accurate self-consistent field calculations including nonlocal Hartree-Fock exchange for large molecules. In combination with the RI-MP2 method and large basis sets, second-order many body perturbation energies can be obtained for medium sized molecules with unprecedented efficiency. The algorithms are implemented into the ORCA electronic structure system
Improving computational efficiency of Monte Carlo simulations with variance reduction
International Nuclear Information System (INIS)
Turner, A.; Davis, A.
2013-01-01
CCFE perform Monte-Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging since streaming and deep penetration effects are equally important. In order to make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that the application of VR techniques in such models significantly reduces the efficiency of parallel computation due to 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows. The weight window represents an 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise to extreme amounts of splitting being performed and a long history. When running on parallel clusters, a long history can have a detrimental effect on the parallel efficiency - if one process is computing the long history, the other CPUs complete their batch of histories and wait idle. Furthermore some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP which dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing the VR performance but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and will be beneficial for any geometry combining streaming and deep penetration effects. (authors)
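The idle-time effect of a long history can be made concrete with a small calculation. This is a sketch only, not part of the CCFE adaptation of MCNP: the function name `parallel_efficiency` and the batch times are hypothetical, and the model simply assumes that all processes wait at the end of each batch for the slowest one.

```python
def parallel_efficiency(batch_times):
    """Efficiency of one synchronised batch.

    batch_times holds the time each process spends on its share of
    histories. Wall time is set by the slowest process, so efficiency is
    useful work divided by (number of processes * wall time).
    """
    wall_time = max(batch_times)
    return sum(batch_times) / (len(batch_times) * wall_time)

# Hypothetical timings (arbitrary units) for 8 processes.
balanced = [10.0] * 8              # every process finishes together
one_long = [10.0] * 7 + [100.0]    # one process is stuck on a long history

eff_balanced = parallel_efficiency(balanced)   # 1.0
eff_one_long = parallel_efficiency(one_long)   # 170 / 800 = 0.2125
```

In this toy case a single long history lowers the efficiency of the batch from 1.0 to about 0.21, which is the kind of loss that trading some variance-reduction performance for shorter histories is meant to recover.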
ITER Cryoplant Status and Economics of the LHe plants
Monneret, E.; Chalifour, M.; Bonneton, M.; Fauve, E.; Voigt, T.; Badgujar, S.; Chang, H.-S.; Vincent, G.
The ITER cryoplant is composed of helium and nitrogen refrigerators and generator combined with 80 K helium loop plants and external purification systems. Storage and recovery of the helium inventory is provided in warm and cold (80 K and 4.5 K) helium tanks. The conceptual design of the ITER cryoplant has been completed, the technical requirements defined for industrial procurement and contracts signed with industry. Each contract covers the design, manufacturing, installation and commissioning. Design is under finalization and manufacturing has started. First deliveries are scheduled by end of 2015. The various cryoplant systems are designed based on recognized codes and international standards to meet the availability, the reliability and the time between maintenance imposed by the long-term uninterrupted operation of the ITER Tokamak. In addition, ITER has to consider the constraint of a nuclear installation. ITER Organization (IO) is responsible for the liquid helium (LHe) Plants contract signed end of 2012 with industry. It is composed of three LHe Plants, working in parallel and able to provide a total average cooling capacity of 75 kW at 4.5 K. Based on concept designs developed with industry and on the procurement phase, ITER has accumulated data to broaden the scaling laws for costing such systems. After describing the status of ITER cryoplant part of the cryogenic system, we shall present the economics of the ITER LHe Plants based on key design requirements, choice and challenges of this ITER Organization procurement.
Advances in iterative methods for nonlinear equations
Busquier, Sonia
2016-01-01
This book focuses on the approximation of nonlinear equations using iterative methods. Nine contributions are presented on the construction and analysis of these methods, the coverage encompassing convergence, efficiency, robustness, dynamics, and applications. Many problems are stated in the form of nonlinear equations, using mathematical modeling. In particular, a wide range of problems in Applied Mathematics and in Engineering can be solved by finding the solutions to these equations. The book reveals the importance of studying convergence aspects in iterative methods and shows that selection of the most efficient and robust iterative method for a given problem is crucial to guaranteeing a good approximation. A number of sample criteria for selecting the optimal method are presented, including those regarding the order of convergence, the computational cost, and the stability, including the dynamics. This book will appeal to researchers whose field of interest is related to nonlinear problems and equations...
SPARSE ELECTROMAGNETIC IMAGING USING NONLINEAR LANDWEBER ITERATIONS
Desmal, Abdulla
2015-07-29
A scheme for efficiently solving the nonlinear electromagnetic inverse scattering problem on sparse investigation domains is described. The proposed scheme reconstructs the (complex) dielectric permittivity of an investigation domain from fields measured away from the domain itself. Least-squares data misfit between the computed scattered fields, which are expressed as a nonlinear function of the permittivity, and the measured fields is constrained by the L0/L1-norm of the solution. The resulting minimization problem is solved using nonlinear Landweber iterations, where at each iteration a thresholding function is applied to enforce the sparseness-promoting L0/L1-norm constraint. The thresholded nonlinear Landweber iterations are applied to several two-dimensional problems, where the "measured" fields are synthetically generated or obtained from actual experiments. These numerical experiments demonstrate the accuracy, efficiency, and applicability of the proposed scheme in reconstructing sparse profiles with high permittivity values.
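The structure of a thresholded Landweber iteration can be sketched for a much simpler setting. The example below is an illustration under stated assumptions, not the scheme of the thesis: it replaces the nonlinear scattering operator by a real linear matrix `A`, uses soft thresholding (the L1 case only), and all names (`soft_threshold`, `thresholded_landweber`, `lam`) and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink each entry of x toward zero by t (promotes sparsity, L1 case)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(A, b, x, lam):
    """Least-squares misfit plus L1 penalty."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def thresholded_landweber(A, b, lam, n_iter=500):
    """Linear thresholded Landweber iteration.

    Each step is a Landweber (gradient) update on the data misfit
    followed by thresholding. The step size 1/||A||_2^2 keeps the
    objective from increasing.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + step * (A.T @ (b - A @ x)), step * lam)
    return x

# Hypothetical synthetic problem: 40 measurements, 100 unknowns, 3 nonzeros.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [2.0, -1.5, 3.0]
b = A @ x_true
lam = 0.1
x_hat = thresholded_landweber(A, b, lam)
```

In the nonlinear setting of the abstract, the matrix-vector products would be replaced by the forward scattering operator and the adjoint of its linearization at the current iterate.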
International Nuclear Information System (INIS)
2001-10-01
This ITER CTA newsletter contains results of the ITER toroidal field model coil project presented by ITER EU Home Team (Garching) and an article in commemoration of the late Dr. Charles Maisonnier, one of the former leaders of ITER who made significant contributions to its development
Development of a parallelization strategy for the VARIANT code
International Nuclear Information System (INIS)
Hanebutte, U.R.; Khalil, H.S.; Palmiotti, G.; Tatsumi, M.
1996-01-01
The VARIANT code solves the multigroup steady-state neutron diffusion and transport equation in three-dimensional Cartesian and hexagonal geometries using the variational nodal method. VARIANT consists of four major parts that must be executed sequentially: input handling, calculation of response matrices, solution algorithm (i.e. inner-outer iteration), and output of results. The objective of the parallelization effort was to reduce the overall computing time by distributing the work of the two computationally intensive (sequential) tasks, the coupling coefficient calculation and the iterative solver, equally among a group of processors. This report describes the code's calculations and gives performance results on one of the benchmark problems used to test the code. The performance analysis in the IBM SPx system shows good efficiency for well-load-balanced programs. Even for relatively small problem sizes, respectable efficiencies are seen for the SPx. An extension to achieve a higher degree of parallelism will be addressed in future work. 7 refs., 1 tab
Massively parallel multicanonical simulations
Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard
2018-03-01
Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10^4 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
International Nuclear Information System (INIS)
Doggett, J.; Salpietro, E.; Shatalov, G.
1991-01-01
The results of the Conceptual Design Activities for the International Thermonuclear Experimental Reactor (ITER) are summarized. These activities, carried out between April 1988 and December 1990, produced a consistent set of technical characteristics and preliminary plans for co-ordinated research and development support of ITER; and a conceptual design, a description of design requirements and a preliminary construction schedule and cost estimate. After a description of the design basis, an overview is given of the tokamak device, its auxiliary systems, facility and maintenance. The interrelation and integration of the various subsystems that form the ITER tokamak concept are discussed. The 16 ITER equatorial port allocations, used for nuclear testing, diagnostics, fuelling, maintenance, and heating and current drive, are given, as well as a layout of the reactor building. Finally, brief descriptions are given of the major ITER sub-systems, i.e., (i) magnet systems (toroidal and poloidal field coils and cryogenic systems), (ii) containment structures (vacuum and cryostat vessels, machine gravity supports, attaching locks, passive loops and active coils), (iii) first wall, (iv) divertor plate (design and materials, performance and lifetime, a.o.), (v) blanket/shield system, (vi) maintenance equipment, (vii) current drive and heating, (viii) fuel cycle system, and (ix) diagnostics. 11 refs, figs and tabs
Twelfth ITER negotiation meeting
International Nuclear Information System (INIS)
2006-01-01
Delegations from China, European Union, Japan, the Republic of Korea, the Russian Federation and the United States of America gathered on Jeju Island, Korea, on 6 December 2005, to complete their negotiations on an Agreement on the joint implementation of the ITER international fusion energy project. At the start of the Meeting, the Delegations unanimously and enthusiastically welcomed India as a full Party to the ITER venture. A Delegation from India then joined the Meeting and participated fully in the discussions that followed. The seven ITER Delegations also welcomed to the Meeting the newly designated Nominee Director-General for the prospective ITER Organization, Ambassador Kaname Ikeda, who is to take up his duties as leader of the project. Based on the results of intensive working level meetings held throughout the previous week, the Delegations have succeeded in clearing the remaining key issues such as decision-making, intellectual property and management within the prospective ITER Organization and adjustments to the sharing of resources as a result of India's participation, including in particular cost sharing and in-kind contributions, leaving only a few legal points requiring resolution during the final lawyers' meeting to review the text for coherence and internal consistency
F.N. Kepper (Nick); R. Ettig (Ramona); F. Dickmann (Frank); R. Stehr (Rene); F.G. Grosveld (Frank); G. Wedemann (Gero); T.A. Knoch (Tobias)
2010-01-01
The hardware and software requirements for parallel applications depend on the problem size, type and the number particles / parameters, the degree of parallelization possible, the load balancing over different processors / memory, the calculation type and the input / output and
Study of wall conditioning in tokamaks with application to ITER
International Nuclear Information System (INIS)
Kogut, Dmitri
2014-01-01
This thesis is devoted to studies of the performance and efficiency of wall conditioning techniques in fusion reactors, such as ITER. Conditioning is necessary to control the state of the surface of plasma facing components to ensure plasma initiation and performance. Conditioning and operation of the JET tokamak with ITER-relevant material mix is extensively studied. A 2D model of glow conditioning discharges is developed and validated; it predicts reasonably uniform discharges in ITER. In the nuclear phase of ITER operation conditioning will be needed to control tritium inventory. It is shown here that isotopic exchange is an efficient means to eliminate tritium from the walls by replacing it with deuterium. Extrapolations for tritium removal are comparable with the expected retention per nominal plasma pulse in ITER. A 1D model of hydrogen isotopic exchange in beryllium is developed and validated. It shows that fluence and temperature of the surface influence the efficiency of the isotopic exchange. (author) [fr
STICS: surface-tethered iterative carbohydrate synthesis.
Pornsuriyasak, Papapida; Ranade, Sneha C; Li, Aixiao; Parlato, M Cristina; Sims, Charles R; Shulga, Olga V; Stine, Keith J; Demchenko, Alexei V
2009-04-14
A new surface-tethered iterative carbohydrate synthesis (STICS) technology is presented in which a surface functionalized 'stick' made of chemically stable high surface area porous gold allows one to perform cost efficient and simple synthesis of oligosaccharide chains; at the end of the synthesis, the oligosaccharide can be cleaved off and the stick reused for subsequent syntheses.
Iterative linear focal-plane wavefront correction
Smith, C.S.; Marinica, R.M.; Den Dekker, A.J.; Verhaegen, M.H.G.; Korkiakoski, V.; Keller, C.U.; Doelman, N.
2013-01-01
We propose an efficient approximation to the nonlinear phase diversity (PD) method for wavefront reconstruction and correction from intensity measurements with potential of being used in real-time applications. The new iterative linear phase diversity (ILPD) method assumes that the residual phase
Casanova, Henri; Robert, Yves
2008-01-01
""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi
Parallelization method for three dimensional MOC calculation
International Nuclear Information System (INIS)
Zhang Zhizhu; Li Qing; Wang Kan
2013-01-01
A parallelization method based on angular decomposition for the three dimensional MOC was designed. To improve the parallel efficiency, the directions were pre-grouped and the groups were assembled to minimize the communication. The improved parallelization method was applied to the three dimensional MOC code TCM. The numerical results show that the results of the parallel calculation agree with those of the serial calculation. The parallel efficiency increases markedly after communication optimization and load balancing. (authors)
New algorithms for parallel MRI
International Nuclear Information System (INIS)
Anzengruber, S; Ramlau, R; Bauer, F; Leitao, A
2008-01-01
Magnetic Resonance Imaging with parallel data acquisition requires algorithms for reconstructing the patient's image from a small number of measured lines of the Fourier domain (k-space). In contrast to well-known algorithms like SENSE and GRAPPA and its flavors we consider the problem as a non-linear inverse problem. However, in order to avoid cost intensive derivatives we will use Landweber-Kaczmarz iteration and in order to improve the overall results some additional sparsity constraints.
International Nuclear Information System (INIS)
Pozdeyev, Mikhail
2002-01-01
Full text: Participating in the film are Academicians Velikhov and Glukhikh, Mr. Filatof, ITER Director from Russia, and Mr. Sannikov from Kurchatov Institute. The film tells about the starting point of the project (Mr. Lavrentyev), the pioneers of the project (Academicians Tamme, Sakharov, Artsimovich) and where the project stands now. Participating in ITER now are the US, Russia, Japan and the European Union. There are two associated members as well: Kazakhstan and Canada. By now the engineering design phase has been finished. Computer animation used in the video gives us an idea of how the first thermonuclear reactor, based on the famous Russian tokamak, works. (author)
Iterated multidimensional wave conversion
International Nuclear Information System (INIS)
Brizard, A. J.; Tracy, E. R.; Johnston, D.; Kaufman, A. N.; Richardson, A. S.; Zobin, N.
2011-01-01
Mode conversion can occur repeatedly in a two-dimensional cavity (e.g., the poloidal cross section of an axisymmetric tokamak). We report on two novel concepts that allow for a complete and global visualization of the ray evolution under iterated conversions. First, iterated conversion is discussed in terms of ray-induced maps from the two-dimensional conversion surface to itself (which can be visualized in terms of three-dimensional rooms). Second, the two-dimensional conversion surface is shown to possess a symplectic structure derived from Dirac constraints associated with the two dispersion surfaces of the interacting waves.
International Nuclear Information System (INIS)
Rosenbluth, M.N.
1999-01-01
The design of an experimental thermonuclear reactor requires both cutting-edge technology and physics predictions precise enough to carry forward the design. The past few years of worldwide physics studies have seen great progress in understanding, innovation and integration. We will discuss this progress and the remaining issues in several key physics areas. (1) Transport and plasma confinement. A worldwide database has led to an 'empirical scaling law' for tokamaks which predicts adequate confinement for the ITER fusion mission, albeit with considerable but acceptable uncertainty. The ongoing revolution in computer capabilities has given rise to new gyrofluid and gyrokinetic simulations of microphysics which may be expected in the near future to attain predictive accuracy. Important databases on H-mode characteristics and helium retention have also been assembled. (2) Divertors, heat removal and fuelling. A novel concept for heat removal - the radiative, baffled, partially detached divertor - has been designed for ITER. Extensive two-dimensional (2D) calculations have been performed and agree qualitatively with recent experiments. Preliminary studies of the interaction of this configuration with core confinement are encouraging and the success of inside pellet launch provides an attractive alternative fuelling method. (3) Macrostability. The ITER mission can be accomplished well within ideal magnetohydrodynamic (MHD) stability limits, except for internal kink modes. Comparisons with JET, as well as a theoretical model including kinetic effects, predict such sawteeth will be benign in ITER. Alternative scenarios involving delayed current penetration or off-axis current drive may be employed if required. The recent discovery of neoclassical beta limits well below ideal MHD limits poses a threat to performance. Extrapolation to reactor scale is as yet unclear. In theory such modes are controllable by current drive profile control or feedback and experiments should
Various Newton-type iterative methods for solving nonlinear equations
Directory of Open Access Journals (Sweden)
Manoj Kumar
2013-10-01
The aim of the present paper is to introduce and investigate new ninth- and seventh-order convergent Newton-type iterative methods for solving nonlinear equations. The ninth-order convergent Newton-type iterative method is made derivative free to obtain a seventh-order convergent Newton-type iterative method. These new methods, with and without derivatives, have efficiency indices 1.5518 and 1.6266, respectively. The error equations are used to establish the order of convergence of these proposed iterative methods. Finally, various numerical comparisons are implemented by MATLAB to demonstrate the performance of the developed methods.
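The quoted efficiency indices can be checked against the usual definition, index = p^(1/n), where p is the order of convergence and n is the number of function (or derivative) evaluations per iteration. The abstract does not state n; the values 5 and 4 used below are assumptions, chosen because they reproduce the quoted figures, and the function name `efficiency_index` is hypothetical.

```python
def efficiency_index(order, evaluations):
    """Efficiency index p**(1/n) of an iterative root-finding method."""
    return order ** (1.0 / evaluations)

# Assumed evaluation counts (not given in the abstract): 5 and 4.
ninth_order = round(efficiency_index(9, 5), 4)     # 1.5518
seventh_order = round(efficiency_index(7, 4), 4)   # 1.6266

# Newton's method for comparison: order 2, two evaluations per step.
newton = round(efficiency_index(2, 2), 4)          # 1.4142
```

Under these assumptions the derivative-free seventh-order variant has the higher index despite its lower order, because it needs fewer evaluations per iteration.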
A novel iterative scheme and its application to differential equations.
Khan, Yasir; Naeem, F; Šmarda, Zdeněk
2014-01-01
The purpose of this paper is to employ an alternative approach to reconstruct the standard variational iteration algorithm II proposed by He, including Lagrange multiplier, and to give a simpler formulation of Adomian decomposition and modified Adomian decomposition method in terms of newly proposed variational iteration method-II (VIM). Through careful investigation of the earlier variational iteration algorithm and Adomian decomposition method, we find unnecessary calculations for Lagrange multiplier and also repeated calculations involved in each iteration, respectively. Several examples are given to verify the reliability and efficiency of the method.
Iterative algorithms to approximate canonical Gabor windows: Computational aspects
DEFF Research Database (Denmark)
Janssen, A.J.E.M; Søndergaard, Peter Lempel
In this paper we investigate the computational aspects of some recently proposed iterative methods for approximating the canonical tight and canonical dual window of a Gabor frame (g,a,b). The iterations start with the window g while the iteration steps comprise the window g, the k^th iterand...... convergence constants. The iterations, initially formulated for time-continuous Gabor systems, are considered and tested in a discrete setting in which one passes to the appropriately sampled-and-periodized windows and frame operators. Furthermore, they are compared with respect to accuracy and efficiency...
Parallel Pascal - An extended Pascal for parallel computers
Reeves, A. P.
1984-01-01
Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.
ITER and the fusion reactor: status and challenge to technology
International Nuclear Information System (INIS)
Lackner, K.
2001-01-01
Fusion has a high potential but requires an integrated physics and technology effort without precedent in non-military R and D. The basic physics feasibility demonstration will be concluded with ITER, although R and D for efficiency improvement will continue. The essential technological issues remaining at the start of ITER operation concern materials questions: first wall components and radiation-tolerant (low-activation) materials. This paper consists solely of a copy of the presentation slides, covering the following subjects: magnetic confinement fusion, the Tokamak, progress in Tokamak performance, ITER: its genealogy, physics basis-critical issues, cutaway of ITER-FEAT, R and D - divertor cassette (L-5), differences power plant-ITER, challenges for ITER and fusion plants, main technological problems (plasma facing materials), structural and functional materials for fusion power plants, ferritic steels, EUROFER development, improvements beyond ferritic steels, and costing, among others. (nevyjel)
Parallel particle swarm optimization algorithm in nuclear problems
Energy Technology Data Exchange (ETDEWEB)
Waintraub, Marcel; Pereira, Claudio M.N.A. [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil)], e-mail: marcel@ien.gov.br, e-mail: cmnap@ien.gov.br; Schirru, Roberto [Coordenacao dos Programas de Pos-graduacao de Engenharia (COPPE/UFRJ), Rio de Janeiro, RJ (Brazil). Lab. de Monitoracao de Processos], e-mail: schirru@lmp.ufrj.br
2009-07-01
Particle Swarm Optimization (PSO) is a population-based metaheuristic (PBM) in which solution candidates evolve through simulation of a simplified social adaptation model. Combining robustness, efficiency and simplicity, PSO has gained great popularity. Many successful applications of PSO have been reported in which PSO showed advantages over other well-established PBMs. However, computational cost is still a major constraint for PSO, as for all other PBMs, especially in optimization problems with time-consuming objective functions. To overcome this difficulty, parallel computation has been used. The default advantage of parallel PSO (PPSO) is the reduction of computational time. Master-slave approaches exploiting this characteristic are the most investigated. However, much more should be expected. It is known that PSO may be improved by more elaborate neighborhood topologies. Hence, in this work, we develop several different PPSO algorithms exploring the advantages of enhanced neighborhood topologies implemented by communication strategies in multiprocessor architectures. The proposed PPSOs have been applied to two complex and time-consuming nuclear engineering problems: reactor core design and fuel reload optimization. After exhaustive experiments, it has been concluded that PPSO still improves solutions after many thousands of iterations, which makes efficient use of serial (non-parallel) PSO prohibitive in this kind of real-world problem, and that PPSO with more elaborate communication strategies is more efficient and robust than the master-slave model. Advantages and peculiarities of each model are carefully discussed in this work. (author)
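To make the idea of a neighborhood topology concrete, here is a minimal serial sketch of PSO with a ring neighborhood, in which each particle is attracted to the best position found by itself and its two ring neighbours rather than by the whole swarm. It is not the authors' PPSO: the sphere objective, the coefficient values and the name `pso_ring` are assumptions for illustration. In a parallel setting, a ring of this kind would only require each processor to exchange information with its neighbours.

```python
import random


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso_ring(objective, dim=2, n_particles=20, iters=200, seed=0,
             w=0.729, c1=1.494, c2=1.494, bound=5.0):
    """Local-best PSO: each particle follows the best of its ring neighbours."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    initial_best = min(pbest_val)

    for _ in range(iters):
        for i in range(n_particles):
            # Ring neighbourhood: the particle itself and its two neighbours.
            hood = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            leader = min(hood, key=lambda j: pbest_val[j])
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (pbest[leader][d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val

    best = min(range(n_particles), key=lambda j: pbest_val[j])
    return pbest[best], pbest_val[best], initial_best


best_x, best_val, initial_best = pso_ring(sphere)
print(best_x, best_val)
```

The objective evaluations inside the particle loop are the part a master-slave scheme would distribute; the ring only changes which personal bests each particle can see.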
International Nuclear Information System (INIS)
Walters, W.J.; Haghighat, A.
2013-01-01
A new collision source method has been developed to solve the Linear Boltzmann Equation (LBE) more efficiently by adaptation of the angular quadrature order. The angular adaptation method is unique in that the flux from each scattering source iteration is obtained separately, with potentially a different quadrature order. This allows for an optimal use of processing power, by using a high order quadrature for the first few iterations that need it, before shifting to lower order quadratures for the remaining iterations. This is essentially an extension of the first collision source method, and we call it the adaptive collision source method (ACS). The ACS methodology has been implemented in the TITAN discrete ordinates code, and has shown a speedup of 2-3 on a test problem, with very little loss of accuracy (within a provided adaptive tolerance). Further, the code has been extended to work in parallel environments by angular decomposition. Although the method requires increased parallel communication, tests have shown excellent scale adaptation, with parallel fractions of up to 99%. (authors)
International Nuclear Information System (INIS)
Tomabechi, K.; Gilleland, J.R.; Sokolov, Yu.A.; Toschi, R.
1991-01-01
The Conceptual Design Activities of the International Thermonuclear Experimental Reactor (ITER) were carried out jointly by the European Community, Japan, the Soviet Union and the United States of America, under the auspices of the International Atomic Energy Agency. The European Community provided the site for joint work sessions at the Max-Planck-Institut fuer Plasmaphysik in Garching, Germany. The Conceptual Design Activities began in the spring of 1988 and ended in December 1990. The objectives of the activities were to develop the design of ITER, to perform a safety and environmental analysis, to define the site requirements as well as the future research and development needs, to estimate the cost and manpower, and to prepare a schedule for detailed engineering design, construction and operation. On the basis of the investigation and analysis performed, a concept of ITER was developed which incorporated maximum flexibility of the performance of the device and allowed a variety of operating scenarios to be adopted. The heart of the machine is a tokamak having a plasma major radius of 6 m, a plasma minor radius of 2.15 m, a nominal plasma current of 22 MA and a nominal fusion power of 1 GW. The conceptual design can meet the technical objectives of the ITER programme. Because of the success of the Conceptual Design Activities, the Parties are now considering the implementation of the next phase, called the Engineering Design Activities. (author). Refs, figs and tabs
ITER power electrical networks
International Nuclear Information System (INIS)
Sejas Portela, S.
2011-01-01
The ITER project (International Thermonuclear Experimental Reactor) is an international research and development effort to design, build and operate an experimental facility to demonstrate the scientific and technological feasibility of obtaining useful energy from the physical phenomenon known as nuclear fusion.
International Nuclear Information System (INIS)
Shimomura, Y.; Huget, M.; Mizoguchi, T.; Murakami, Y.; Polevoi, A.; Shimada, M.; Aymar, R.; Chuyanov, V.; Matsumoto, H.
2001-01-01
ITER is planned to be the first fusion experimental reactor in the world operating for research in physics and engineering. The first 10 years' operation will be devoted primarily to physics issues at low neutron fluence and the following 10 years' operation to engineering testing at higher fluence. ITER can accommodate various plasma configurations and plasma operation modes such as inductive high Q modes, long pulse hybrid modes, non-inductive steady-state modes, with large ranges of plasma current, density, beta and fusion power, and with various heating and current drive methods. This flexibility will provide an advantage for coping with uncertainties in the physics database, in studying burning plasmas, in introducing advanced features and in optimizing the plasma performance for the different programme objectives. Remote sites will be able to participate in the ITER experiment. This concept will provide an advantage not only in operating ITER for 24 hours per day but also in involving the world-wide fusion communities and in promoting scientific competition among the Parties. (author)
International Nuclear Information System (INIS)
1991-01-01
Results of the International Thermonuclear Experimental Reactor (ITER) Conceptual Design Activity (CDA) are reported. This report covers the Terms of Reference for the project: defining the technical specifications, defining future research needs, defining site requirements, and carrying out a coordinated research effort coincident with the CDA. Refs, figs and tabs
International Nuclear Information System (INIS)
1991-12-01
This US ITER Management Plan is the plan for conducting the Engineering Design Activities within the US. The plan applies to all design, analyses, and associated physics and technology research and development (R&D) required to support the program. The plan defines the management considerations associated with these activities. The plan also defines the management controls that the project participants will follow to establish, implement, monitor, and report these activities. The activities are to be conducted by the project in accordance with this plan. The plan will be updated to reflect the then-current management approach required to meet the project objectives. The plan will be reviewed at least annually for possible revision. Section 2 presents the ITER objectives, a brief description of the ITER concept as developed during the Conceptual Design Activities, and comments on the Engineering Design Activities. Section 3 discusses the planned international organization for the Engineering Design Activities, from which the tasks will flow to the US Home Team. Section 4 describes the US ITER management organization and responsibilities during the Engineering Design Activities. Section 5 describes the project management and control to be used to perform the assigned tasks during the Engineering Design Activities. Section 6 presents the references. Several appendices are provided that contain detailed information related to the front material.
DEFF Research Database (Denmark)
Justesen, Jørn; Høholdt, Tom; Hjaltason, Johan
2005-01-01
We analyze the relation between iterative decoding and the extended parity check matrix. By considering a modified version of bit flipping, which produces a list of decoded words, we derive several relations between decodable error patterns and the parameters of the code. By developing a tree...
International Nuclear Information System (INIS)
2005-06-01
This public information document presents the ITER project (International Thermonuclear Experimental Reactor), the definition of fusion, the international cooperation and the advantages of the project. It also presents the Cadarache site, an appropriate scientific and economic environment. The last part of the document recalls the history of the project and the current mobilization of all partners. (A.L.B.)
Energy Technology Data Exchange (ETDEWEB)
Duff, I.
1994-12-31
This workshop focuses on kernels for iterative software packages. Specifically, the three speakers discuss various aspects of sparse BLAS kernels. Their topics are: 'Current status of user level sparse BLAS'; 'Current status of the sparse BLAS toolkit'; and 'Adding matrix-matrix and matrix-matrix-matrix multiply to the sparse BLAS toolkit'.
International Nuclear Information System (INIS)
Aymar, R.
2002-01-01
At the end of engineering design activities (EDA) in July 2001, all the essential elements became available to make a decision on construction of ITER. A sufficiently detailed and integrated engineering design now exists for a generic site, has been assessed for feasibility, and costed, and essential physics and technology R and D has been carried out to underpin the design choices. Formal negotiations have now begun between the current participants--Canada, Euratom, Japan, and the Russian Federation--on a Joint Implementation Agreement for ITER which also establishes the legal entity to run ITER. These negotiations are supported on technical aspects by Coordinated Technical Activities (CTA), which maintain the integrity of the project, for the good of all participants, and concentrate on preparing for procurement by industry of the longest lead items, and for formal application for a construction license with the host country. This paper highlights the main features of the ITER design. With cryogenically-cooled magnets close to neutron-generating plasma, the design of shielding with adequate access via port plugs for auxiliaries such as heating and diagnostics, and of remote replacement and refurbishing systems for in-vessel components, are particularly interesting nuclear technology challenges. Making a safety case for ITER to satisfy potential regulators and to demonstrate, as far as possible at this stage, the environmental attractiveness of fusion as an energy source, is also important. The paper gives illustrative details on this work, and an update on the progress of technical preparations for construction, as well as the status of the above negotiations
Migration of vectorized iterative solvers to distributed memory architectures
Energy Technology Data Exchange (ETDEWEB)
Pommerell, C. [AT&T Bell Labs., Murray Hill, NJ (United States)]; Ruehl, R. [CSCS-ETH, Manno (Switzerland)]
1994-12-31
Both necessity and opportunity motivate the use of high-performance computers for iterative linear solvers. Necessity results from the size of the problems being solved; smaller problems are often better handled by direct methods. Opportunity arises from the formulation of the iterative methods in terms of simple linear algebra operations, even if this "natural" parallelism is not easy to exploit in irregularly structured sparse matrices and with good preconditioners. As a result, high-performance implementations of iterative solvers have attracted a lot of interest in recent years. Most efforts are geared to vectorize or parallelize the dominating operation (structured or unstructured sparse matrix-vector multiplication), or to increase locality and parallelism by reformulating the algorithm (reducing global synchronization in inner products or local data exchange in preconditioners). Target architectures for iterative solvers currently include mostly vector supercomputers and architectures with one or few optimized (e.g., super-scalar and/or super-pipelined RISC) processors and hierarchical memory systems. More recently, parallel computers with physically distributed memory and a better price/performance ratio have been offered by vendors as a very interesting alternative to vector supercomputers. However, programming comfort on such distributed memory parallel processors (DMPPs) still lags behind. Here the authors are concerned with iterative solvers and their changing computing environment. In particular, they are considering migration from traditional vector supercomputers to DMPPs. Application requirements force one to use flexible and portable libraries. They want to extend the portability of iterative solvers rather than reimplementing everything for each new machine, or even for each new architecture.
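The sparse matrix-vector multiplication named in this abstract as the dominating operation can be sketched for a matrix in compressed sparse row (CSR) storage. The storage format, the function name `csr_matvec` and the small example matrix are assumptions for illustration; the abstract does not say which data structure the authors use.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x for A stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# 3x3 example matrix (illustrative):
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)  # [7.0, 6.0, 17.0]
```

Each row can be computed independently, which is the "natural" parallelism. The indirect reads `x[col_idx[k]]` are what make irregular matrices awkward on distributed memory, since the needed entries of `x` may live on other processors.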
iHadoop: Asynchronous Iterations Support for MapReduce
Elnikety, Eslam
2011-08-01
MapReduce is a distributed programming framework designed to ease the development of scalable data-intensive applications for large clusters of commodity machines. Most machine learning and data mining applications involve iterative computations over large datasets, such as the Web hyperlink structures and social network graphs. Yet, the MapReduce model does not efficiently support this important class of applications. The architecture of MapReduce, most critically its dataflow techniques and task scheduling, is completely unaware of the nature of iterative applications; tasks are scheduled according to a policy that optimizes the execution for a single iteration, which wastes bandwidth, I/O, and CPU cycles when compared with an optimal execution for a consecutive set of iterations. This work presents iHadoop, a modified MapReduce model, and an associated implementation, optimized for iterative computations. The iHadoop model schedules iterations asynchronously. It connects the output of one iteration to the next, allowing both to process their data concurrently. iHadoop's task scheduler exploits inter-iteration data locality by scheduling tasks that exhibit a producer/consumer relation on the same physical machine, allowing a fast local data transfer. For those iterative applications that require satisfying certain criteria before termination, iHadoop runs the check concurrently during the execution of the subsequent iteration to further reduce the application's latency. This thesis also describes our implementation of the iHadoop model, and evaluates its performance against Hadoop, the widely used open source implementation of MapReduce. Experiments using different data analysis applications over real-world and synthetic datasets show that iHadoop performs better than Hadoop for iterative algorithms, reducing execution time of iterative applications by 25% on average. Furthermore, integrating iHadoop with HaLoop, a variant Hadoop implementation that caches
Re-starting an Arnoldi iteration
Energy Technology Data Exchange (ETDEWEB)
Lehoucq, R.B. [Argonne National Lab., IL (United States)
1996-12-31
The Arnoldi iteration is an efficient procedure for approximating a subset of the eigensystem of a large sparse n x n matrix A. The iteration produces a partial orthogonal reduction of A into an upper Hessenberg matrix H_m of order m. The eigenvalues of this small matrix H_m are used to approximate a subset of the eigenvalues of the large matrix A. The eigenvalues of H_m improve as estimates to those of A as m increases. Unfortunately, so do the cost and storage of the reduction. The idea of re-starting the Arnoldi iteration is motivated by the prohibitive cost associated with building a large factorization.
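The partial reduction described in this abstract can be sketched as m steps of the Arnoldi process with modified Gram-Schmidt orthogonalization. This is a plain, unrestarted sketch on a small dense matrix; the example matrix, the starting vector and the helper names are assumptions for illustration, and the restarting strategy itself is not shown.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def arnoldi(A, v0, m):
    """m Arnoldi steps with modified Gram-Schmidt.

    Returns Q (orthonormal basis vectors, m + 1 of them unless the process
    breaks down early) and H ((m + 1) x m upper Hessenberg) such that
    A q_j = sum_i H[i][j] q_i for each completed step j.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    Q = [[x / norm0 for x in v0]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, Q[j])
        for i in range(j + 1):
            # Remove the component of w along each earlier basis vector.
            H[i][j] = sum(a * b for a, b in zip(Q[i], w))
            w = [a - H[i][j] * b for a, b in zip(w, Q[i])]
        H[j + 1][j] = math.sqrt(sum(x * x for x in w))
        if H[j + 1][j] < 1e-12:  # invariant subspace found; stop early
            break
        Q.append([x / H[j + 1][j] for x in w])
    return Q, H


# Illustrative 4x4 matrix and starting vector.
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 5.0]]
m = 3
Q, H = arnoldi(A, [1.0, 0.0, 0.0, 0.0], m)
print(len(Q), [row[:] for row in H])
```

Storage for Q grows as n times m and the orthogonalization work grows with m squared, which is the cost that motivates restarting with a small m.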
ITER ITA newsletter. No. 24, July 2005
International Nuclear Information System (INIS)
2005-08-01
stimulant for international co-operation on science and technology in the twenty first century, and taking a broader view of the situation, Japan has decided that they will let the EU host the ITER site. Dr. J. Potocnik, European Commissioner for Science and Research, thanked Minister Nakayama for the highly constructive spirit with which he and his colleagues had conducted the bilateral discussions. He expressed his respect for the honourable manner in which the most sensitive stages were handled. He pointed out that the EU was well aware of the important task it had in front of it as the Host of ITER. The action taken had implications beyond that of establishing fusion energy. It was also an expression of mutual confidence to face the scientific, technical and political challenges that will occur in the course of this first-of-a-kind true international science cooperation among the leading nations of the world. ITER was establishing a model of global co-operation to address the increasingly global nature of the challenges confronting today's society. The Chinese Minister of Science and Technology, Mr. Xu Guanhua, expressed his pleasure that agreement on the site had been found within the six-Party framework. China considered that a sustainable solution to the world's energy source problem required multilateral international collaboration on fusion, so that participants could complement each other's skills and pool resources in the shared challenge. Mr. S. Choi, Vice-Minister of Science and Technology, Republic of Korea, reminded the delegates that the eyes of the world were on ITER as one of the most significant projects of the century, with a view to it being a peaceful and affluent one. Having just crossed the barrier of the site decision, there was still more to be done ahead, particularly by concluding the ITER Joint Implementation Agreement as soon as possible. 
He quoted a Korean proverb, literally translated as 'After rain ground hardens', which parallels with the
Practical parallel programming
Bauer, Barr E
2014-01-01
This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.
Iterative Algorithms for Nonexpansive Mappings
Directory of Open Access Journals (Sweden)
Yao Yonghong
2008-01-01
We suggest and analyze two new iterative algorithms for a nonexpansive mapping in Banach spaces. We prove that the proposed iterative algorithms converge strongly to some fixed point of the mapping.
Resource Usage Protocols for Iterators
Huisman, Marieke; Haack, C.; Müller, P.; Hurlin, C.
We discuss usage protocols for iterator objects that prevent concurrent modifications of the underlying collection while iterators are in progress. We formalize these protocols in Java-like object interfaces, enriched with separation logic contracts. We present examples of iterator clients and
International Nuclear Information System (INIS)
2001-12-01
This ITER CTA Newsletter contains information about the organization of the ITER Co-ordinated Technical Activities (CTA) International Team as the follow-up of the ITER CTA project board meeting in Toronto on 7 November 2001. It also includes a summary on the start of the international tokamak physics activity by Dr. D. Campbell, Chair of the ITPA Co-ordinating Committee
International Nuclear Information System (INIS)
Aymar, R.
2000-01-01
This article summarizes progress made in the ITER Engineering Design Activities in the period between the ITER Meeting in Tokyo (January 2000) and June 2000. Topics: Termination of EDA, Joint Central Team and Support, Task Assignments, ITER Physics, Urgent and High Priority Physics Research Areas
Iterative supervirtual refraction interferometry
Al-Hagan, Ola
2014-05-02
In refraction tomography, the low signal-to-noise ratio (S/N) can be a major obstacle in picking the first-break arrivals at the far-offset receivers. To increase the S/N, we evaluated iterative supervirtual refraction interferometry (ISVI), which is an extension of the supervirtual refraction interferometry method. In this method, supervirtual traces are computed and then iteratively reused to generate supervirtual traces with a higher S/N. Our empirical results with both synthetic and field data revealed that ISVI can significantly boost the S/N of far-offset traces. The drawback is that using refraction events from more than one refractor can introduce unacceptable artifacts into the final traveltime versus offset curve. This problem can be avoided by careful windowing of refraction events.
International Nuclear Information System (INIS)
2002-01-01
Following on from the Final Report of the EDA (DS/21), and the summary of the ITER Final Design report (DS/22), the technical basis gives further details of the design of ITER. It is in two parts. The first, the Plant Design Specification, summarises the main constraints on the plant design and operation from the viewpoint of engineering and physics assumptions, compliance with safety regulations, and siting requirements and assumptions. The second, the Plant Description Document, describes the physics performance and engineering characteristics of the plant design, illustrates the potential operational consequences for the locality of a generic site, gives the construction, commissioning, exploitation and decommissioning schedule, and reports the estimated lifetime costing based on data from the industry of the EDA parties.
Iterative participatory design
DEFF Research Database (Denmark)
Simonsen, Jesper; Hertzum, Morten
2010-01-01
The theoretical background in this chapter is information systems development in an organizational context. This includes theories from participatory design, human-computer interaction, and ethnographically inspired studies of work practices. The concept of design is defined as an experimental...... iterative process of mutual learning by designers and domain experts (users), who aim to change the users’ work practices through the introduction of information systems. We provide an illustrative case example with an ethnographic study of clinicians experimenting with a new electronic patient record...... system, focussing on emergent and opportunity-based change enabled by appropriating the system into real work. The contribution to a general core of design research is a reconstruction of the iterative prototyping approach into a general model for sustained participatory design....
Cell verification of parallel burnup calculation program MCBMPI based on MPI
International Nuclear Information System (INIS)
Yang Wankui; Liu Yaoguang; Ma Jimin; Wang Guanbo; Yang Xin; She Ding
2014-01-01
The parallel burnup calculation program MCBMPI was developed. The program is modularized. The parallel MCNP5 program MCNP5MPI is employed as the neutron transport calculation module, and a composite of three solution methods is used to solve the burnup equation: the matrix exponential technique, the TTA analytical solution, and Gauss-Seidel iteration. An MPI parallel zone decomposition strategy is included in the program. The program system consists only of MCNP5MPI and a burnup subroutine. The latter performs three main functions: zone decomposition, nuclide transfer and decay, and data exchange with MCNP5MPI. The program was verified with the pressurized water reactor (PWR) cell burnup benchmark. The results show that the program can be applied to burnup calculations of multiple zones, and that computation efficiency could be significantly improved with the development of computer hardware. (authors)
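Of the three solution methods listed in this abstract, Gauss-Seidel iteration is the simplest to illustrate. The sketch below applies it to a generic linear system A x = b. How MCBMPI formulates the burnup equation for this solver is not described in the abstract, so the small diagonally dominant system used here is purely an assumption for illustration.

```python
def gauss_seidel(A, b, tol=1e-12, max_iter=500):
    """Solve A x = b by Gauss-Seidel sweeps.

    Convergence is guaranteed, for example, when A is strictly
    diagonally dominant, as in the example below.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        change = 0.0
        for i in range(n):
            # Entries x[j] with j < i were already updated in this sweep.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new = (b[i] - s) / A[i][i]
            change = max(change, abs(new - x[i]))
            x[i] = new
        if change < tol:
            break
    return x


# Illustrative system whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
b = [6.0, 17.0, 22.0]
x = gauss_seidel(A, b)
print(x)
```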
Parallelization of the preconditioned IDR solver for modern multicore computer systems
Bessonov, O. A.; Fedoseyev, A. I.
2012-10-01
This paper presents the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK on modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs the iterative IDR algorithm with ILU preconditioning (with a user-chosen ILU preconditioning order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, processor architectures and computer system organization have changed dramatically in recent years. Because of this, performance criteria and methods have been revisited, and the solver and preconditioner have been parallelized using the OpenMP environment. Results of the efficient parallel implementation are presented for the most advanced computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization
Energy Technology Data Exchange (ETDEWEB)
Clark, M. A. [NVIDIA Corp., Santa Clara; Joó, Bálint [Jefferson Lab; Strelchenko, Alexei [Fermilab; Cheng, Michael [Boston U., Ctr. Comp. Sci.; Gambhir, Arjun [William-Mary Coll.; Brower, Richard [Boston U.
2016-12-22
The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due to both significant progress in accelerating the iterative linear solvers using multi-grid algorithms, and due to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the lack of parallelism on the coarse grids, and as such, these advances have not proved multiplicative. Using the QUDA library, we demonstrate that by exposing all sources of parallelism that the underlying stencil problem possesses, and through appropriate mapping of this parallelism to the GPU architecture, we can achieve high efficiency even for the coarsest of grids. Results are presented for the Wilson-Clover discretization, where we demonstrate up to 10x speedup over present state-of-the-art GPU-accelerated methods on Titan. Finally, we look to the future, and consider the software implications of our findings.
Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation
Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.
1996-01-01
We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.
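In Newton-Krylov methods, the Krylov solver needs only products of the Jacobian with a vector, and these can be approximated by differencing the nonlinear residual without ever forming the Jacobian. The sketch below shows that one ingredient. It is an assumption that this matches what the abstract calls an inexact finite-difference Newton method, and the two-equation residual `F` is a toy problem, not the full potential equation.

```python
def fd_jacobian_vector(F, u, v, eps=1e-7):
    """Approximate J(u) v by a forward difference of the residual F."""
    Fu = F(u)
    Fp = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(Fp, Fu)]


def F(u):
    """Toy nonlinear residual (assumption, not the full potential equation)."""
    x, y = u
    return [x * x + y - 3.0, x + y * y - 5.0]


u = [1.0, 2.0]
v = [1.0, -1.0]
approx = fd_jacobian_vector(F, u, v)
# Exact Jacobian at u is [[2x, 1], [1, 2y]] = [[2, 1], [1, 4]], so J v = [1, -3].
print(approx)
```

Each product costs one extra residual evaluation, so the cost per Krylov iteration is set by the residual and the preconditioner, which in NKS is the Schwarz method.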
International Nuclear Information System (INIS)
Johnson, L.C.; Barnes, C.W.; Batistoni, P.
1998-01-01
Neutron cameras with horizontal and vertical views have been designed for ITER, based on systems used on JET and TFTR. The cameras consist of fan-shaped arrays of collimated flight tubes, with suitably chosen detectors situated outside the biological shield. The sight lines view the ITER plasma through slots in the shield blanket and penetrate the vacuum vessel, cryostat, and biological shield through stainless steel windows. This paper analyzes the expected performance of several neutron camera arrangements for ITER. In addition to the reference designs, the authors examine proposed compact cameras, in which neutron fluxes are inferred from 16N decay gammas in dedicated flowing water loops, and conventional cameras with fewer sight lines and more limited fields of view than in the reference designs. It is shown that the spatial sampling provided by the reference designs is sufficient to satisfy target measurement requirements and that some reduction in field of view may be permissible. The accuracy of measurements with 16N-based compact cameras is not yet established, and they fail to satisfy requirements for parameter range and time resolution by large margins.
A kind of iteration algorithm for fast wave heating
International Nuclear Information System (INIS)
Zhu Xueguang; Kuang Guangli; Zhao Yanping; Li Youyi; Xie Jikang
1998-03-01
A standard normal distribution of particles in tokamak geometry is usually assumed in fast wave heating. In fact, owing to the quasi-linear diffusion effect, the parallel and perpendicular temperatures of the resonant particles are not equal, so this assumption introduces some error. To treat this case, the Fokker-Planck equation is introduced and an iterative algorithm is adopted to solve the problem.
Iterative solution of the Helmholtz equation
Energy Technology Data Exchange (ETDEWEB)
Larsson, E.; Otto, K. [Uppsala Univ. (Sweden)]
1996-12-31
We have shown that the numerical solution of the two-dimensional Helmholtz equation can be obtained in a very efficient way by using a preconditioned iterative method. We discretize the equation with second-order accurate finite difference operators and take special care to obtain non-reflecting boundary conditions. We solve the large, sparse system of equations that arises with the preconditioned restarted GMRES iteration. The preconditioner is of "fast Poisson type", and is derived as a direct solver for a modified PDE problem. The arithmetic complexity for the preconditioner is O(n log_2 n), where n is the number of grid points. As a test problem we use the propagation of sound waves in water in a duct with curved bottom. Numerical experiments show that the preconditioned iterative method is very efficient for this type of problem. The convergence rate does not decrease dramatically when the frequency increases. Compared to banded Gaussian elimination, which is a standard solution method for this type of problem, the iterative method shows significant gain in both storage requirement and arithmetic complexity. Furthermore, the relative gain increases when the frequency increases.
Furuichi, Mikito; Nishiura, Daisuke
2017-10-01
We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), Moving Particle Semi-implicit method (MPS), and Discrete Element method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in the column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load-balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domain frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using Earth simulator and K-computer supercomputer systems.
FAST ITERATIVE KILOVOLTAGE CONE BEAM TOMOGRAPHY
Directory of Open Access Journals (Sweden)
S. A. Zolotarev
2015-01-01
Creating fast parallel iterative tomographic algorithms that run on graphics accelerators and simultaneously minimize both the residual and the total variation of the reconstructed image is an important and urgent task of great scientific and practical importance. Such algorithms can be used, for example, in radiation therapy, because computed tomography of the patient is always performed beforehand in order to better identify the areas that can then be subjected to radiation exposure.
Cubic B-spline solution for two-point boundary value problem with AOR iterative method
Suardi, M. N.; Radzuan, N. Z. F. M.; Sulaiman, J.
2017-09-01
In this study, a cubic B-spline approximation equation is derived by using the cubic B-spline discretization scheme to solve two-point boundary value problems. A system of cubic B-spline approximation equations is then generated from this approximation equation in order to obtain numerical solutions, and the Accelerated Over Relaxation (AOR) iterative method is used to solve the resulting linear system. For comparison, the Gauss-Seidel (GS) iterative method serves as the control method against which the SOR and AOR iterative methods are assessed. Two example problems are considered to examine the efficiency of the proposed iterative methods in terms of three measures: number of iterations, computational time, and maximum absolute error. From the numerical results, it can be concluded that the AOR iterative method is slightly more efficient than the SOR iterative method.
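The AOR update named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the tridiagonal test matrix, the right-hand side, and the parameter values r and omega are all assumptions chosen for the example, and the B-spline system itself is not reproduced. With r = omega the update reduces to SOR, and with r = omega = 1 to Gauss-Seidel.

```python
def aor_solve(A, b, r, omega, tol=1e-10, max_iter=10_000):
    """Accelerated Over Relaxation for A x = b (dense list-of-lists A).

    Componentwise form of
        (D - r L) x_new = [(1 - omega) D + (omega - r) L + omega U] x + omega b
    with the splitting A = D - L - U.
    """
    n = len(b)
    x = [0.0] * n
    for k in range(1, max_iter + 1):
        x_new = x[:]
        for i in range(n):
            # Residual of row i evaluated at the previous iterate.
            residual_i = b[i] - sum(A[i][j] * x[j] for j in range(n))
            # Correction from the already-updated components (j < i).
            correction = sum(A[i][j] * (x_new[j] - x[j]) for j in range(i))
            x_new[i] = x[i] + (omega * residual_i - r * correction) / A[i][i]
        change = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if change < tol:
            return x, k
    return x, max_iter


# Hypothetical test system: 1D Laplacian-like matrix whose exact solution is all ones.
A = [
    [2.0, -1.0, 0.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0, 0.0],
    [0.0, -1.0, 2.0, -1.0, 0.0],
    [0.0, 0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, 0.0, -1.0, 2.0],
]
b = [1.0, 0.0, 0.0, 0.0, 1.0]

x_aor, iters_aor = aor_solve(A, b, r=1.2, omega=1.1)   # AOR (illustrative parameters)
x_gs, iters_gs = aor_solve(A, b, r=1.0, omega=1.0)     # Gauss-Seidel as control
print(iters_aor, iters_gs)
```

The three measures used in the abstract (iteration count, time, maximum absolute error) can be read off by comparing `iters_aor` with `iters_gs` and the returned vectors with the known solution.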
Full wave simulation of lower hybrid waves in ITER plasmas based on the finite element method
International Nuclear Information System (INIS)
Meneghini, Orso; Shiraiwa, Syun'ichi
2010-01-01
The first lower hybrid (LH) full wave simulation of an ITER-scale plasma is presented. LHEAF, an efficient LH full wave solver based on the Finite Element Method (FEM), was used. In this study the scalability of the LHEAF approach was investigated, and the possibility of using massively parallel computers for solving extremely large problems was shown. In reactor scale plasmas, LH waves having a typical n_‖ ≈ 2 are expected to be absorbed in the periphery of the plasma. In order to exploit the spatial localization of the LH waves, LHEAF is modified to consider only the region of plasma where the wave fields are non-zero. By this approach, the size of the computational domain was reduced by more than a factor of 10. In this simulation, the magnetic equilibrium and the density and temperature profiles proposed for the AT operation scenario on ITER are used. In addition, the wide SOL is supposed to play an important role in the propagation of the LH waves on ITER, and its presence was included in the simulation. For a Maxwellian plasma the power deposition profile is narrow and peaks at r/a ≈ 0.7. (author)
On the Convergence of Iterative Receiver Algorithms Utilizing Hard Decisions
Directory of Open Access Journals (Sweden)
Jürgen F. Rößler
2009-01-01
The convergence of receivers performing iterative hard decision interference cancellation (IHDIC) is analyzed in a general framework for ASK, PSK, and QAM constellations. We first give an overview of IHDIC algorithms known from the literature applied to linear modulation and DS-CDMA-based transmission systems and show the relation to Hopfield neural network theory. It is proven analytically that IHDIC with a serial update scheme always converges to a stable state in the estimated values in the course of the iterations and that IHDIC with a parallel update scheme converges to cycles of length 2. Additionally, we visualize the convergence behavior with the aid of convergence charts. Doing so, we give insight into possible errors occurring in IHDIC, which turn out to be caused by locked error situations. The derived results can directly be applied to those iterative soft decision interference cancellation (ISDIC) receivers whose soft decision functions approach hard decision functions in the course of the iterations.
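The difference between the serial and the parallel update scheme can be seen on a toy two-user example. The sketch below is an illustration under stated assumptions, not the paper's receiver: the cross-correlation matrix `R`, the matched-filter outputs `y`, the BPSK alphabet, and the tie rule (zero maps to +1) are all hypothetical choices made so that the Hopfield-style behaviour is visible.

```python
def hard_decision(v):
    """BPSK hard decision; a tie at zero is resolved to +1 (an assumption)."""
    return 1 if v >= 0 else -1


def cancelled_statistic(R, y, s, i):
    """Matched-filter output of user i minus the estimated interference."""
    return y[i] - sum(R[i][j] * s[j] for j in range(len(s)) if j != i)


def serial_update(R, y, s):
    """Update users one after another, reusing fresh decisions immediately."""
    s = list(s)
    for i in range(len(s)):
        s[i] = hard_decision(cancelled_statistic(R, y, s, i))
    return s


def parallel_update(R, y, s):
    """Update all users at once from the previous iteration's decisions."""
    return [hard_decision(cancelled_statistic(R, y, s, i)) for i in range(len(s))]


def trajectory(update, R, y, s0, steps):
    states = [list(s0)]
    for _ in range(steps):
        states.append(update(R, y, states[-1]))
    return states


# Hypothetical, strongly correlated two-user system.
R = [[1.0, 1.0],
     [1.0, 1.0]]
y = [0.2, 0.2]
s0 = [hard_decision(v) for v in y]          # initial decisions: [1, 1]

serial_states = trajectory(serial_update, R, y, s0, steps=6)
parallel_states = trajectory(parallel_update, R, y, s0, steps=6)
print(serial_states)    # settles on a fixed decision vector
print(parallel_states)  # alternates between two decision vectors
```

In this example the serial scheme reaches a stable state after one sweep, while the parallel scheme alternates between [1, 1] and [-1, -1], which is the length-2 cycle behaviour the abstract describes.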
International Nuclear Information System (INIS)
1989-01-01
Volume II of the two volumes describing the concept definition of the International Thermonuclear Experimental Reactor deals with the ITER concept in technical depth, and covers all areas of design of the ITER tokamak. Included are an assessment of the current database for design, scoping studies, rationale for concepts selection, performance flexibility, the ITER concept, the operations and experimental/testing program, ITER parameters and design phase schedule, and research and development specific to ITER. This latter includes a definition of specific research and development tasks, a division of tasks among members, specific milestones, required results, and schedules. Figs and tabs
International Nuclear Information System (INIS)
2002-07-01
This ITER CTA newsletter issue comprises the ITER backgrounder, which was approved as an official document by the participants in the Negotiations on the ITER Implementation agreement at their fourth meeting, held in Cadarache from 4-6 June 2002, and information about two ITER meetings: one is the third meeting of the ITER parties' designated Safety Representatives, which took place in Cadarache, France from 6-7 June 2002, and the other is the second meeting of the International Tokamak Physics Activity (ITPA) topical group on diagnostics, which was held at General Atomics, San Diego, USA, from 4-8 March 2002
Locality-Driven Parallel Static Analysis for Power Delivery Networks
Zeng, Zhiyu
2011-06-01
Large VLSI on-chip Power Delivery Networks (PDNs) are challenging to analyze due to the sheer network complexity. In this article, a novel parallel partitioning-based PDN analysis approach is presented. We use the boundary circuit responses of each partition to divide the full grid simulation problem into a set of independent subgrid simulation problems. Instead of solving exact boundary circuit responses, a more efficient scheme is proposed to provide near-exact approximation to the boundary circuit responses by exploiting the spatial locality of the flip-chip-type power grids. This scheme is also used in a block-based iterative error reduction process to achieve fast convergence. Detailed computational cost analysis and performance modeling is carried out to determine the optimal (or near-optimal) number of partitions for parallel implementation. Through the analysis of several large power grids, the proposed approach is shown to have excellent parallel efficiency, fast convergence, and favorable scalability. Our approach can solve a 16-million-node power grid in 18 seconds on an IBM p5-575 processing node with 16 Power5+ processors, which is 18.8X faster than a state-of-the-art direct solver. © 2011 ACM.
Parallelized direct execution simulation of message-passing parallel programs
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
ITER EDA newsletter. V. 7, no. 7
International Nuclear Information System (INIS)
1998-07-01
This newsletter contains the articles: 'Extraordinary ITER council meeting', 'ITER EDA final safety meeting' and 'Summary report of the 3rd combined workshop of the ITER confinement and transport and ITER confinement database and modeling expert groups'
PREFACE: Progress in the ITER Physics Basis
Ikeda, K.
2007-06-01
fundamental to its completion. I am pleased to witness the extensive collaborations, the excellent working relationships and the free exchange of views that have been developed among scientists working on magnetic fusion, and I would particularly like to acknowledge the importance which they assign to ITER in their research. This close collaboration and the spirit of free discussion will be essential to the success of ITER. Finally, the PIPB identifies issues which remain in the projection of burning plasma performance to the ITER scale and in the control of burning plasmas. Continued R&D is therefore called for to reduce the uncertainties associated with these issues and to ensure the efficient operation and exploitation of ITER. It is important that the international fusion community maintains a high level of collaboration in the future to address these issues and to prepare the physics basis for ITER operation. ITPA Coordination Committee R. Stambaugh (Chair of ITPA CC, General Atomics, USA) D.J. Campbell (Previous Chair of ITPA CC, European Fusion Development Agreement—Close Support Unit, ITER Organization) M. Shimada (Co-Chair of ITPA CC, ITER Organization) R. Aymar (ITER International Team, CERN) V. Chuyanov (ITER Organization) J.H. Han (Korea Basic Science Institute, Korea) Y. Huo (Zengzhou University, China) Y.S. Hwang (Seoul National University, Korea) N. Ivanov (Kurchatov Institute, Russia) Y. Kamada (Japan Atomic Energy Agency, Naka, Japan) P.K. Kaw (Institute for Plasma Research, India) S. Konovalov (Kurchatov Institute, Russia) M. Kwon (National Fusion Research Center, Korea) J. Li (Academy of Science, Institute of Plasma Physics, China) S. Mirnov (TRINITI, Russia) Y. Nakamura (National Institute for Fusion Studies, Japan) H. Ninomiya (Japan Atomic Energy Agency, Naka, Japan) E. Oktay (Department of Energy, USA) J. Pamela (European Fusion Development Agreement—Close Support Unit) C. Pan (Southwestern Institute of Physics, China) F. 
Romanelli (Ente per le
Dynamic traffic assignment on parallel computers
Energy Technology Data Exchange (ETDEWEB)
Nagel, K.; Frye, R.; Jakob, R.; Rickert, M.; Stretz, P.
1998-12-01
The authors describe part of the current framework of the TRANSIMS traffic research project at the Los Alamos National Laboratory. It includes parallel implementations of a route planner and a microscopic traffic simulation model. They present performance figures and results of an offline load-balancing scheme used in one of the iterative re-planning runs required for dynamic route assignment.
Vector and parallel processors in computational science
International Nuclear Information System (INIS)
Duff, I.S.; Reid, J.K.
1985-01-01
This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)
Parallelizing More Loops with Compiler Guided Refactoring
DEFF Research Database (Denmark)
Larsen, Per; Ladelsky, Razya; Lidman, Jacob
2012-01-01
an interactive compilation feedback system that guides programmers in iteratively modifying their application source code. This helps leverage the compiler’s ability to generate loop-parallel code. We employ our system to modify two sequential benchmarks dealing with image processing and edge detection...
Automatic Loop Parallelization via Compiler Guided Refactoring
DEFF Research Database (Denmark)
Larsen, Per; Ladelsky, Razya; Lidman, Jacob
for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler’s ability to generate loop-parallel code. We use this compilation system to modify two sequential...
ITER EDA newsletter. V. 10, special issue
International Nuclear Information System (INIS)
2001-07-01
This ITER EDA Newsletter includes summaries of the reports of ITER EDA JCT Physics unit about ITER physics R and D during the Engineering Design Activities (EDA), ITER EDA JCT Naka JWC ITER technology R and D during the EDA, and Safety, Environment and Health group of ITER EDA JCT, Garching JWS on EDA activities related to safety
Directory of Open Access Journals (Sweden)
Yixue Chen
2017-01-01
ARES is a multidimensional parallel discrete ordinates particle transport code with arbitrary order anisotropic scattering. It can be applied to a wide variety of radiation shielding calculations and reactor physics analysis. ARES uses state-of-the-art solution methods to obtain accurate solutions to the linear Boltzmann transport equation. A multigroup discretization is applied in energy. The code allows multiple spatial discretization schemes and solution methodologies. ARES currently provides diamond difference with or without linear-zero flux fixup, theta weighted, directional theta weighted, exponential directional weighted, and linear discontinuous finite element spatial differencing schemes. Discrete ordinates differencing in angle and spherical harmonics expansion of the scattering source are adopted. The first collision source method is used to eliminate or mitigate ray effects. Traditional source iteration and a Krylov iterative method preconditioned with diffusion synthetic acceleration are applied to solve the linear system of equations. ARES uses the Koch-Baker-Alcouffe parallel sweep algorithm to obtain high parallel efficiency. Verification and validation of the ARES transport code system have been carried out against many benchmarks. In this paper, ARES solutions to the HBR-2 benchmark and C5G7 benchmarks are in excellent agreement with published results. Numerical results are presented which demonstrate the accuracy and efficiency of these methods.
Tilton, James C.; Plaza, Antonio J. (Editor); Chang, Chein-I. (Editor)
2008-01-01
The hierarchical image segmentation algorithm (referred to as HSEG) is a hybrid of hierarchical step-wise optimization (HSWO) and constrained spectral clustering that produces a hierarchical set of image segmentations. HSWO is an iterative approach to region growing segmentation in which the optimal image segmentation is found at N_R regions, given a segmentation at N_{R+1} regions. HSEG's addition of constrained spectral clustering makes it a computationally intensive algorithm for all but the smallest of images. To counteract this, a computationally efficient recursive approximation of HSEG (called RHSEG) has been devised. Further improvements in processing speed are obtained through a parallel implementation of RHSEG. This chapter describes this parallel implementation and demonstrates its computational efficiency on a Landsat Thematic Mapper test scene.
Elser, V; Rankenburg, I; Thibault, P
2007-01-09
In many problems that require extensive searching, the solution can be described as satisfying two competing constraints, where satisfying each independently does not pose a challenge. As an alternative to tree-based and stochastic searching, for these problems we propose using an iterated map built from the projections to the two constraint sets. Algorithms of this kind have been the method of choice in a large variety of signal-processing applications; we show here that the scope of these algorithms is surprisingly broad, with applications as diverse as protein folding and Sudoku.
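An iterated map built from two projections can be sketched on a small geometric problem. The abstract does not state the update rule, so the form used here, x -> x + P_A(2 P_B(x) - x) - P_B(x), is an assumption (it is one commonly quoted special case of a projection-based map), and the two constraint sets, a line and a disc, are hypothetical convex stand-ins chosen so that convergence is easy to check. They are far simpler than the protein folding or Sudoku constraints mentioned above.

```python
import math


def project_line(p):
    """Constraint A (hypothetical): the line x + y = 2."""
    shift = (p[0] + p[1] - 2.0) / 2.0
    return (p[0] - shift, p[1] - shift)


def project_disc(p, radius=2.0):
    """Constraint B (hypothetical): the disc of given radius about the origin."""
    norm = math.hypot(p[0], p[1])
    if norm <= radius:
        return p
    scale = radius / norm
    return (p[0] * scale, p[1] * scale)


def iterate_projection_map(x, steps=500, tol=1e-12):
    """Iterate x -> x + P_A(2 P_B(x) - x) - P_B(x) until the iterate stops moving."""
    for _ in range(steps):
        pb = project_disc(x)
        reflected = (2.0 * pb[0] - x[0], 2.0 * pb[1] - x[1])
        pa = project_line(reflected)
        x_next = (x[0] + pa[0] - pb[0], x[1] + pa[1] - pb[1])
        moved = math.hypot(x_next[0] - x[0], x_next[1] - x[1])
        x = x_next
        if moved < tol:
            break
    # At a fixed point, P_B(x) satisfies both constraints.
    return project_disc(x)


solution = iterate_projection_map((5.0, -3.0))
print(solution)
```

Each projection is trivial on its own; the map only combines them, which mirrors the setting in the abstract where satisfying each constraint independently is easy.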
Parallelization of the FLAPW method
International Nuclear Information System (INIS)
Canning, A.; Mannstadt, W.; Freeman, A.J.
1999-01-01
The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about one hundred atoms due to a lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer
Oliker, Leonid; Heber, Gerd; Biswas, Rupak
2000-01-01
The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
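To make the role of the sparse matrix-vector multiply concrete, the following is a minimal serial sketch of CG with the matrix stored in compressed sparse row (CSR) form. The CSR layout, the helper names, and the 1D Laplacian test matrix are assumptions for illustration; the ordering, partitioning, and parallel programming paradigms studied in the paper are not represented.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector multiply y = A x with A in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite CSR matrix."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the zero initial guess
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = csr_matvec(values, col_idx, row_ptr, p)   # the one SpMV per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_1d_csr(n):
    """Hypothetical test matrix: tridiagonal (-1, 2, -1) in CSR form."""
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


n = 8
values, col_idx, row_ptr = laplacian_1d_csr(n)
b = csr_matvec(values, col_idx, row_ptr, [1.0] * n)   # right-hand side for x = ones
x = conjugate_gradient(values, col_idx, row_ptr, b)
print(x)
```

The indirect access `x[col_idx[k]]` in `csr_matvec` is where matrix ordering affects cache reuse, which is the effect the abstract reports as significant.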
HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization
Energy Technology Data Exchange (ETDEWEB)
2016-08-22
NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for $W$ and $H$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). Unlike previous implementations, our algorithm is also flexible: it performs well for both dense and sparse matrices, and allows the user to choose any one of multiple algorithms for solving the updates to the low-rank factors $W$ and $H$ within the alternating iterations.
McCallum, Ethan
2011-01-01
It's tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.
Concurrent computation of attribute filters on shared memory parallel machines
Wilkinson, Michael H.F.; Gao, Hui; Hesselink, Wim H.; Jonker, Jan-Eppo; Meijster, Arnold
2008-01-01
Morphological attribute filters have not previously been parallelized mainly because they are both global and nonseparable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings, and thickenings,
Asynchronous Parallelization of a CFD Solver
Directory of Open Access Journals (Sweden)
Daniel S. Abdi
2015-01-01
A Navier-Stokes equations solver is parallelized to run on a cluster of computers using the domain decomposition method. Two approaches to communication and computation are investigated, namely, synchronous and asynchronous methods. Asynchronous communication between subdomains is not commonly used in CFD codes; however, it has the potential to alleviate scaling bottlenecks incurred when processors have to wait for each other at designated synchronization points. A common way to avoid this idle time is to overlap asynchronous communication with computation. For this to work, however, there must be something useful and independent a processor can do while waiting for messages to arrive. We investigate an alternative approach to computation, namely, conducting asynchronous iterations to improve the local subdomain solution while communication is in progress. An in-house CFD code is parallelized using the message passing interface (MPI), and scalability tests are conducted that suggest asynchronous iterations are a viable way of parallelizing CFD code.
Measuring performance of parallel computers. Final report
Energy Technology Data Exchange (ETDEWEB)
Sullivan, F.
1994-07-01
Performance Measurement - the authors have developed a taxonomy of parallel algorithms based on data motion and example applications have been coded for each class of the taxonomy. Computational benchmark kernels have been extracted for several applications, and detailed measurements have been performed. Algorithms for Massively Parallel SIMD machines - measurement results and computational experiences indicate that top performance will be achieved by `iteration` type algorithms running on massively parallel SIMD machines. Reformulation as iteration may entail unorthodox approaches based on probabilistic methods. The authors have developed such methods for some applications. Here they discuss their approach to performance measurement, describe the taxonomy and measurements which have been made, and report on some general conclusions which can be drawn from the results of the measurements.
Analysis of the ITER cryoplant operational modes
International Nuclear Information System (INIS)
Henry, D.; Journeaux, J.Y.; Roussel, P.; Michel, F.; Poncet, J.M.; Girard, A.; Kalinin, V.; Chesny, P.
2007-01-01
In the framework of an EFDA task, CEA is carrying out an analysis of the various ITER cryoplant operational modes. According to the project integration document, ITER is designed to be operated 365 days per year in order to optimize the available time of the Tokamak. It is anticipated that operation will be performed in long periods separated by maintenance periods (e.g. 10 days continuous operation and 1 week break) with annual or bi-annual major shutdown periods of a few months for maintenance, further installation and commissioning. For this operation schedule, auxiliary subsystems like the cryoplant and the cryodistribution have to cope with different heat loads which depend on the different ITER operating states. The cryoplant consists of four identical 4.5 K refrigerators and two 80 K helium loops coupled with two LN2 modules. All of these cryogenic subsystems have to operate in parallel to remove the heat loads from the magnet, 80 K shields, cryopumps and other small users. After a brief recall of the main particularities of a cryogenic system operating in a Tokamak environment, the first part of this study is dedicated to the assessment of the main ITER operation states. A new design of the refrigeration loop for the HTS current leads, the updated layout of the cryodistribution system and a revised strategy for operation of the cryopumps have been taken into consideration. The relevant normal operating scenarios of the cryoplant are checked for the typical ITER operating states like plasma operation state, short term stand by, short term maintenance, or test and conditioning state. The second part of the paper is dedicated to the abnormal operating modes coming from the magnets and to those generated by the cryoplant itself. The occurrence of a fast discharge or a quench of the magnets generates large heat load disturbances and produces exceptionally high mass flow rates which have to be managed by the cryoplant, while a failure of a cryogenic component induces
Rokkasho: Japanese site for ITER
International Nuclear Information System (INIS)
Ohtake, S.; Yamaguchi, V.; Matsuda, S.; Kishimoto, H.
2003-01-01
The Atomic Energy Commission of Japan authorized ITER as the core machine of the Third Phase Basic Program of Fusion Energy Development. After a series of discussions in the Atomic Energy Commission and the Council of Science and Technology Policy, the Japanese Government formally concluded, with the Cabinet Agreement of 31 May 2002, that Japan should participate in the ITER Project and offer the Rokkasho-Mura site for construction of ITER to the Negotiations among Canada (CA), the European Union (EU), Japan (JA), and the Russian Federation (RF). The JA site proposal is now under international assessment in the framework of the ITER Negotiations. (author)
International Nuclear Information System (INIS)
Bergamaschi, Luca; Pini, Giorgio; Sartoretto, Flavio
2003-01-01
The Jacobi-Davidson (JD) algorithm was recently proposed for evaluating a number of the eigenvalues of a matrix. JD goes beyond pure Krylov-space techniques; it cleverly expands its search space by solving the so-called correction equation, thus in principle providing a more powerful method. Preconditioning the Jacobi-Davidson correction equation is mandatory when large, sparse matrices are analyzed. We considered several preconditioners: classical block-Jacobi and IC(0), together with approximate inverse (AINV or FSAI) preconditioners. The rationale for using approximate inverse preconditioners is their high parallelization potential, combined with their efficiency in accelerating the iterative solution of the correction equation. We analyzed the sequential performance of preconditioned JD for the spectral decomposition of large, sparse matrices, which originate in the numerical integration of partial differential equations arising in physical and engineering problems. It was found that JD is highly sensitive to preconditioning, and it can display an irregular convergence behavior. We parallelized JD by data-splitting techniques, combining them with techniques to reduce the amount of communication data. Our own parallel, preconditioned code was executed on a dedicated parallel machine, and we present the results of our experiments. Our JD code achieves an appreciable degree of parallelism. Its performance was also compared with those of PARPACK and parallel DACG.
Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie
2014-01-01
It is very time consuming to solve fractional differential equations. The computational complexity of the two-dimensional fractional differential equation (2D-TFDE) with the iterative implicit finite difference method is O(M_x M_y N^2). In this paper, we present a parallel algorithm for 2D-TFDE and give an in-depth discussion about this algorithm. A task distribution model and data layout with virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We do think that parallel computing technology will become a very basic method for computationally intensive fractional applications in the near future.
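The efficiency figure quoted relative to a 9-process baseline can be unpacked with a short calculation. The sketch below assumes the usual definition of relative efficiency, E = (T_9 · 9) / (T_81 · 81); the abstract does not state the definition used, and the timings here are hypothetical values chosen only so that the ratio reproduces the quoted 88.24%.

```python
def relative_speedup(t_base, t_par):
    """Speedup of the larger run over the baseline run."""
    return t_base / t_par

def relative_efficiency(t_base, p_base, t_par, p_par):
    """Efficiency of p_par processes measured against a p_base-process baseline."""
    return (t_base * p_base) / (t_par * p_par)

# Hypothetical wall-clock times (seconds); only their ratio matters here.
t9 = 100.0
t81 = t9 * 9 / (81 * 0.8824)

eff = relative_efficiency(t9, 9, t81, 81)
speedup = relative_speedup(t9, t81)
print(f"efficiency = {eff:.4f}, speedup over the 9-process run = {speedup:.4f}")
```

Under this definition, going from 9 to 81 processes at 88.24% efficiency corresponds to a speedup of about 7.94 out of an ideal 9.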
Iterative algorithms to approximate canonical Gabor windows: Computational aspects
DEFF Research Database (Denmark)
Janssen, A. J. E. M.; Søndergaard, Peter Lempel
2007-01-01
In this article we investigate the computational aspects of some recently proposed iterative methods for approximating the canonical tight and canonical dual window of a Gabor frame (g, a, b). The iterations start with the window g while the iteration steps comprise the window g, the k-th iterand...... convergence constants. The iterations, initially formulated for time-continuous Gabor systems, are considered and tested in a discrete setting in which one passes to the appropriately sampled-and-periodized windows and frame operators. Furthermore, they are compared with respect to accuracy and efficiency...
3-Dimensional Iterative Forward Model for Microwave Imaging
DEFF Research Database (Denmark)
Kim, Oleksiy S.; Meincke, Peter
2006-01-01
The efficient solution of a forward scattering problem is the key point in nonlinear inversion schemes associated with microwave imaging. In this paper the solution is presented for the volume integral equation based on the method of moments (MoM) and accelerated with the adaptive integral method...... in each iteration of the forward solution. Thus, the presented technique allows us to avoid the time-consuming procedure of the MoM matrix filling in each inversion iteration. Furthermore, the forward solution from the previous inversion iteration can be utilized in the next one as an initial guess, thus...... reducing the solution time for the forward model....
Parallel Simulation of Chip-Multiprocessor Architectures
National Research Council Canada - National Science Library
Chidester, Matthew C; George, Alan D
2002-01-01
Chip-multiprocessor (CMP) architectures present a challenge for efficient simulation, combining the requirements of a detailed microprocessor simulator with that of a tightly-coupled parallel system...
The PARTY parallel runtime system
Saltz, J. H.; Mirchandaney, Ravi; Smith, R. M.; Crowley, Kay; Nicol, D. M.
1989-01-01
In the present automated system for the organization of the data and computational operations entailed by parallel problems, in ways that optimize multiprocessor performance, general heuristics for partitioning program data and control are implemented by capturing and manipulating representations of a computation at run time. These heuristics are directed toward the dynamic identification and allocation of concurrent work in computations with irregular computational patterns. An optimized static-workload partitioning is computed for such repetitive-computation pattern problems as the iterative ones employed in scientific computation.
Parallel computation of rotating flows
DEFF Research Database (Denmark)
Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær
1999-01-01
This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process...... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. Darbe...
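For readers unfamiliar with CGLS, the sketch below shows the standard form of the iteration, which applies conjugate gradients to the normal equations AᵀA x = Aᵀb without ever forming AᵀA. It is a minimal dense NumPy illustration, not the authors' software: the function name `cgls`, the stopping rule, and the random full-rank test system are assumptions, whereas the paper's system is large, sparse and singular.

```python
import numpy as np

def cgls(A, b, n_iter=50, tol=1e-10):
    """CGLS for min ||A x - b||_2, started from x = 0."""
    x = np.zeros(A.shape[1])
    r = b - A @ x              # residual of the original system
    s = A.T @ r                # residual of the normal equations
    p = s.copy()
    gamma = s @ s
    gamma0 = gamma
    for _ in range(n_iter):
        if gamma <= (tol ** 2) * gamma0:
            break
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

# Assumed over-determined test system: 30 equations, 5 unknowns.
rng = np.random.default_rng(0)
A_demo = rng.standard_normal((30, 5))
b_demo = rng.standard_normal(30)
x_demo = cgls(A_demo, b_demo)
print("normal-equation residual:", np.linalg.norm(A_demo.T @ (A_demo @ x_demo - b_demo)))
```

Each iteration needs only one product with A and one with Aᵀ, which is why the method suits sparse systems distributed over many processors.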
Reactor structure and superconducting magnet system of ITER
International Nuclear Information System (INIS)
Tada, Eisuke; Yoshida, Kiyoshi; Shibanuma, Kiyoshi; Okuno, Kiyoshi; Tsuji, Hiroshi; Shimamoto, Susumu
1993-01-01
Fusion Experimental Reactors are one of the major steps toward realization of fusion energy, and their key objective is to demonstrate scientific and technological feasibility prior to the Demo Fusion Reactor. ITER (International Thermonuclear Experimental Reactor) is one such experimental reactor, and its conceptual design has been completed by the united efforts of the USA, USSR, EC and Japan. In parallel with the conceptual design, key technology development in various areas has been conducted. This paper describes the overall design concepts and the latest technological achievements of the ITER reactor structure and superconducting magnet system. (author)
Copper Mountain conference on iterative methods: Proceedings: Volume 2
Energy Technology Data Exchange (ETDEWEB)
NONE
1996-10-01
This volume (the second of two) contains information presented during the last two days of the Copper Mountain Conference on Iterative Methods held April 9-13, 1996 at Copper Mountain, Colorado. Topics of the sessions held these two days include domain decomposition, Krylov methods, computational fluid dynamics, Markov chains, sparse and parallel basic linear algebra subprograms, multigrid methods, applications of iterative methods, equation systems with multiple right-hand sides, projection methods, and the Helmholtz equation. Selected papers indexed separately for the Energy Science and Technology Database.
ITER CTA newsletter. No. 13, October 2002
International Nuclear Information System (INIS)
2002-11-01
This ITER CTA newsletter issue comprises concise information about an ITER related meeting concerning the joint implementation of ITER - the fifth ITER Negotiations Meeting - which was held in Toronto, Canada, 19-20 September, 2002, and information about assessment of the possible ITER site in Clarington, Ontario, Canada, which was the subject of the first official stage of the Joint Assessment of Specific Sites (JASS) for the ITER Project. This assessment was completed just before the Fifth ITER Negotiations Meeting
International Nuclear Information System (INIS)
Natalizio, A.; Hollies, R.E.; Sochaski, R.O.; Stubley, P.H.
1992-06-01
The ITER reference system uses low-temperature water for heat removal and high-temperature helium for bake-out. As these systems share common equipment, bake-out cannot be performed until the cooling system is drained and dried, and the reactor cannot be started until the helium has been purged from the cooling system. This study examines the feasibility of using a single high-temperature fluid to perform both heat removal and bake-out. The high temperature required for bake-out would also be in the range for power production. The study examines cost, operational benefits, and impact on reactor safety of two options: a high-pressure water system, and a low-pressure organic system. It was concluded that the cost savings and operational benefits are significant; there are no significant adverse safety impacts from operating either the water system or the organic system; and the capital costs of both systems are comparable
Iterated crowdsourcing dilemma game
Oishi, Koji; Cebrian, Manuel; Abeliuk, Andres; Masuda, Naoki
2014-02-01
The Internet has enabled the emergence of collective problem solving, also known as crowdsourcing, as a viable option for solving complex tasks. However, the openness of crowdsourcing presents a challenge because solutions obtained by it can be sabotaged, stolen, and manipulated at a low cost for the attacker. We extend a previously proposed crowdsourcing dilemma game to an iterated game to address this question. We enumerate pure evolutionarily stable strategies within the class of so-called reactive strategies, i.e., those depending on the last action of the opponent. Among the 4096 possible reactive strategies, we find 16 strategies each of which is stable in some parameter regions. Repeated encounters of the players can improve social welfare when the damage inflicted by an attack and the cost of attack are both small. Under the current framework, repeated interactions do not really ameliorate the crowdsourcing dilemma in a majority of the parameter space.
Full-scale calculation of the coupling losses in ITER size cable-in-conduit conductors
van Lanen, E. P. A.; van Nugteren, J.; Nijhuis, A.
2012-02-01
With the numerical cable model JackPot it is possible to calculate the interstrand coupling losses, generated by a time-changing background and self-field, between all strands in a cable-in-conduit conductor (CICC). For this, the model uses a system of equations in which the mutual inductances between all strand segments are calculated in advance. The model works well for analysing sub-size CICC sections. However, the exponential relationship between the model size and the computation time makes it impractical to simulate full-size ITER CICC sections. For this reason, the multi-level fast multipole method (MLFMM) is implemented to control the computation load. For additional efficiency, it is written in a code that runs on graphics processing units, thereby utilizing an efficient low-cost parallel computation technique. Good accuracy is obtained, along with fast computation of the mutually induced voltages between all strands. This allows parametric studies on the coupling loss of long lengths of ITER-size CICCs with the purpose of optimizing the cable design and accurately computing the coupling loss for any applied magnetic field scenario.
DGDFT: A massively parallel method for large scale density functional theory calculations
Energy Technology Data Exchange (ETDEWEB)
Hu, Wei, E-mail: whu@lbl.gov; Yang, Chao, E-mail: cyang@lbl.gov [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Lin, Lin, E-mail: linlin@math.berkeley.edu [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Department of Mathematics, University of California, Berkeley, California 94720 (United States)
2015-09-28
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^−4 Hartree/atom in terms of the error of energy and 6.2 × 10^−4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
DGDFT: A massively parallel method for large scale density functional theory calculations
International Nuclear Information System (INIS)
Hu, Wei; Yang, Chao; Lin, Lin
2015-01-01
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^−4 Hartree/atom in terms of the error of energy and 6.2 × 10^−4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
Directory of Open Access Journals (Sweden)
James G. Worner
2017-05-01
James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship. ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.
2014-09-24
which nature uses strong electron correlation for efficient energy transfer, particularly in photosynthesis and bioluminescence, (ii) providing an innovative paradigm...
Xu, Qiaofeng; Yang, Deshan; Tan, Jun; Sawatzky, Alex; Anastasio, Mark A.
2016-01-01
Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated
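The baseline that the accelerated variants are compared against can be sketched in a few lines. The code below is plain FISTA applied to an assumed toy problem, 0.5·||Ax − b||² + λ·||x||₁, with a soft-thresholding proximal step; it does not include the OS-SART subproblem, the weighted proximal problems, or the multi-GPU implementation described in the abstract, and the names `soft_threshold` and `fista_lasso` are hypothetical. For this standard form, the objective error is known to decay as O(1/k²) in the iteration count k.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_lasso(A, b, lam, n_iter=200):
    """Minimise 0.5*||A x - b||^2 + lam*||x||_1 with standard FISTA."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)           # gradient step is taken at the extrapolated point y
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x

# Assumed sanity check: with A = I the minimiser is the soft-thresholded data.
b_demo = np.array([3.0, -0.5, 1.5])
x_demo = fista_lasso(np.eye(3), b_demo, lam=1.0)
print(x_demo)
```

The abstract's acceleration replaces the gradient step inside this loop with an OS-SART subproblem, which is what changes the proximal step into a weighted one.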
International Nuclear Information System (INIS)
Xu, Qiaofeng; Sawatzky, Alex; Anastasio, Mark A.; Yang, Deshan; Tan, Jun
2016-01-01
Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated
ITER-FEAT - outline design report. Report by the ITER Director. ITER meeting, Tokyo, January 2000
International Nuclear Information System (INIS)
2001-01-01
It is now possible to define the key elements of ITER-FEAT. This report provides the results, to date, of the joint work of the Special Working Group in the form of an Outline Design Report on the ITER-FEAT design which, subject to the views of ITER Council and of the Parties, will be the focus of further detailed design work and analysis in order to provide to the Parties a complete and fully integrated engineering design within the framework of the ITER EDA extension
Towards the procurement of the ITER divertor
International Nuclear Information System (INIS)
Merola, M.; Tivey, R.; Martin, A.; Pick, M.
2006-01-01
The procurement of the ITER divertor is planned to start in 2009. On the basis of the present common understanding of the sharing of the ITER components, the Japanese Participating Team (JAPT) will supply the outer vertical target, the Russian Federation (RF) PT the dome liner and will perform the high heat flux testing, the EU PT will supply the inner vertical targets and the cassette bodies, including final assembly of the divertor plasma-facing components (PFCs). The manufacturing of the PFCs of the ITER divertor represents a challenging endeavor due to the high technologies which are involved, and due to the unprecedented series production. To mitigate the associated risks, special arrangements need to be put in place prior to and during procurement to ensure quality and to keep to the time schedule. Before procurement can start, an ITER review of the qualification and production capability of each candidate PT is planned. Well in advance of the assumed start of the procurement, each PT which would like to contribute to the divertor PFC procurement, should first demonstrate its technical qualification to carry out the procurement with the required quality, and in an efficient and timely manner. Appropriate precautions, like subdivision of the procurement into stages, are also to be adopted during the procurement phase to mitigate the consequences of possible unexpected manufacturing problems. In preparation for writing the procurement specification for the vertical targets, the topic of setting acceptance criteria is also being addressed. This activity has the objective of defining workable acceptance criteria for the PFC armour joints. A complete set of analyses is also in progress to assess the latest design modifications against the design requirements. This task includes neutronic, shielding, thermo-mechanical and electromagnetic analyses. More than half of the ITER plasma parameters that must be measured and the related diagnostics are located in the
Data-parallel DNS of turbulent flow
Verstappen, R.W.C.P.; Veldman, A.E.P.; Emerson, DR; Ecer, A; Periaux, J; Satofuka, N
1998-01-01
This contribution deals with direct numerical simulation (DNS) of incompressible turbulent flows on parallel computers. We make use of the data-parallel model on shared memory systems as well as on a distributed memory machine. The combination of fast parallel computers and efficient numerical
International Nuclear Information System (INIS)
2002-04-01
This issue of ITER CTA newsletter contains information about the meeting of the ITER CTA project board, which took place in Moscow, Russian Federation on 22 April 2002 on the occasion of the Third Negotiators Meeting (N3), and about the meeting 'EU divertor celebration day' organized on 16 January 2002 at Plansee AG, Reutte, Austria
Fast parallel algorithm for CT image reconstruction.
Flores, Liubov A; Vidal, Vicent; Mayo, Patricia; Rodenas, Francisco; Verdú, Gumersindo
2012-01-01
In X-ray computed tomography (CT), X-rays are used to obtain the projection data needed to generate an image of the inside of an object. The image can be generated with different techniques. Iterative methods are more suitable for the reconstruction of images with high contrast and precision in noisy conditions and from a small number of projections. Their use may be important in portable scanners for their functionality in emergency situations. However, in practice, these methods are not widely used due to the high computational cost of their implementation. In this work we analyze iterative parallel image reconstruction with the Portable, Extensible Toolkit for Scientific Computation (PETSc).
Expressing Parallelism with ROOT
Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.
2017-10-01
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Parallel hierarchical radiosity rendering
Energy Technology Data Exchange (ETDEWEB)
Carter, Michael [Iowa State Univ., Ames, IA (United States)
1993-07-01
In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Expressing Parallelism with ROOT
Energy Technology Data Exchange (ETDEWEB)
Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab
2017-11-22
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Plasma control concepts for ITER
International Nuclear Information System (INIS)
Lister, J.B.; Nieswand, C.
1997-01-01
This overview paper skims over a wide range of issues related to the control of ITER plasmas. Although operation of the ITER project will require extensive developmental work to achieve the degree of control required, there is no indication that any of the identified problems will present overwhelming difficulties compared with the operation of present tokamaks. However, the precision of control required and the degree of automation of the final ITER plasma control system will present a challenge which is somewhat greater than for present tokamaks. In order to operate ITER optimally, integrated use of a large amount of diagnostic information will be necessary, evaluated and interpreted automatically. This will challenge both the diagnostics themselves and their supporting interpretation codes. The intervening years will provide us with the opportunity to implement and evaluate most of the new features required for ITER on existing tokamaks, with the exception of the control of an ignited plasma. (author) 7 figs., 7 refs
ITER EDA Newsletter. V. 3, no. 8
International Nuclear Information System (INIS)
1994-08-01
This ITER EDA (Engineering Design Activities) Newsletter issue reports on the sixth ITER council meeting; introduces the newly appointed ITER director and reports on his address to the ITER council. The vacuum tank for the ITER model coil testing, installed at JAERI, Naka, Japan is also briefly described
ITER interim design report package documents
International Nuclear Information System (INIS)
1996-01-01
This publication contains the Excerpt from the ITER Council (IC-8), the ITER Interim Design Report, Cost Review and Safety Analysis, ITER Site Requirements and ITER Site Design Assumptions and the Excerpt from the ITER Council (IC-9). 8 figs, 2 tabs
ITER ITA newsletter. No. 8, September 2003
International Nuclear Information System (INIS)
2003-10-01
This issue of the ITER ITA (ITER Transitional Arrangements) newsletter contains concise information about ITER-related activities, including Robert Aymar's leaving ITER for CERN, ITER-related issues at the IAEA General Conference, the status and prospects of thermonuclear power, and activity during the ITA on materials for vessel and in-vessel components
Robust Cell Detection for Large-Scale 3D Microscopy Using GPU-Accelerated Iterative Voting
Directory of Open Access Journals (Sweden)
Leila Saadatifard
2018-04-01
High-throughput imaging techniques, such as Knife-Edge Scanning Microscopy (KESM), are capable of acquiring three-dimensional whole-organ images at sub-micrometer resolution. These images are challenging to segment since they can exceed several terabytes (TB) in size, requiring extremely fast and fully automated algorithms. Staining techniques are limited to contrast agents that can be applied to large samples and imaged in a single pass. This requires maximizing the number of structures labeled in a single channel, resulting in images that are densely packed with spatial features. In this paper, we propose a three-dimensional approach for locating cells based on iterative voting. Due to the computational complexity of this algorithm, a highly efficient GPU implementation is required to make it practical on large data sets. The proposed algorithm has a limited number of input parameters and is highly parallel.
International Nuclear Information System (INIS)
Bourque, R.F.; Wykes, M.E.P.
1995-01-01
The ITER cryostat is the vacuum chamber containing the tokamak reactor. Its functions are (1) to provide a high vacuum environment to limit thermal loads to the superconducting magnet system by gas conduction and convection; (2) to be part of the second radioactivity confinement boundary; and (3) to provide passive removal of decay heat for beyond design basis accidents. A separate thermal shield along the inside wall limits thermal radiation to the coils. An external concrete shield provides radiological protection. The cryostat consists of a cylindrical section bolted to torispherical heads at top and bottom. The vessel is made up of two concentric walls connected by horizontal and vertical ribs. The space between the walls can be filled with helium gas at slightly above one atmosphere for thermal coupling of the two walls, to block inbound air microleaks, and for leak detection. The cryostat has many penetrations, some as large as four meters in diameter, providing various types of access from the outside to the tokamak. These include heat transport system cooling pipes, cryogenic feeds, auxiliary heating, diagnostics, and blanket and divertor removal ports. Large bellows are used between the cryostat and the tokamak to accommodate differential thermal expansion.
International Nuclear Information System (INIS)
Kuroda, T.; Vieider, G.; Akiba, M.
1991-01-01
This document summarizes results of the Conceptual Design Activities (1988-1990) for the International Thermonuclear Experimental Reactor (ITER) project, namely those that pertain to the plasma facing components of the reactor vessel, of which the main components are the first wall and the divertor plates. After an introduction and an executive summary, the principal functions of the plasma-facing components are delineated, i.e., (i) define the low-impurity region within which the plasma is produced, (ii) absorb the electromagnetic radiation and charged-particle flux from the plasma, and (iii) protect the blanket/shield components from the plasma. A list of critical design issues for the divertor plates and the first wall is given, followed by discussions of the divertor plate design (including the issues of material selection, erosion lifetime, design concepts, thermal and mechanical analysis, operating limits and overall lifetime, tritium inventory, baking and conditioning, safety analysis, manufacture and testing, and advanced divertor concepts) and the first wall design (armor material and design, erosion lifetime, overall design concepts, thermal and mechanical analysis, lifetime and operating limits, tritium inventory, baking and conditioning, safety analysis, manufacture and testing, an alternative first wall design, and the limiters used instead of the divertor plates during start-up). Refs, figs and tabs
International Nuclear Information System (INIS)
Kveton, O.K.
1990-11-01
The present specification of the ITER cooling system does not permit its operation with water above 150 C. However, the first wall needs to be heated to higher temperatures during conditioning at 250 C and bake-out at 350 C. In order to use the cooling water for these operations the cooling system would have to operate during conditioning at 37 Bar and during bake-out at 164 Bar. This is undesirable from the safety analysis point of view, and alternative heating methods are to be found. This review suggests that superheated steam or gas heating can be used for both baking and conditioning. The blanket design must consider the use of dual heat transfer media, allowing for change from one to another in both directions. Transfer from water to gas or steam is the most intricate and risky part of the entire heating process. Superheated steam conditioning appears unfavorable. The use of inert gas is recommended, although alternative heating fluids such as organic coolant should be investigated
ITER EDA newsletter. V. 9, no. 2
International Nuclear Information System (INIS)
2000-02-01
This ITER EDA Newsletter reports on the seventh ITER technical meeting on safety and environment and contains the executive summary of the eleventh ITER scrape-off layer and divertor physics expert group meeting. Individual abstracts have been prepared
ITER EDA newsletter. V. 7, no. 6
International Nuclear Information System (INIS)
1998-06-01
This newsletter contains the articles: 'ITER representation at the 11th Pacific Basin Nuclear Conference', 'Summary of discussion points and further deliberations in the special committee on the ITER project in the Atomic Energy Commission', and 'ITER radio frequency systems'
Enhancing data parallel applications with task parallelism
Fernández, Jacqueline; Guerrero, Roberto A.; Piccoli, María Fabiana; Printista, Alicia Marcela; Villalobos, M.
2001-01-01
Most parallel applications contain data parallelism, and almost all discussion of its solutions has been limited to the simplest and least expressive form: flat data parallelism. Several generalizations of the flat data parallel model have been proposed because a large number of those applications need a combination of task and data parallelism to represent their natural computation structure and to achieve good performance. Their aim is to allow the capability of combining the easi...
ITER safety challenges and opportunities
International Nuclear Information System (INIS)
Piet, S.J.
1992-01-01
This paper reports on challenges and opportunities suggested by the results of the Conceptual Design Activity (CDA) for the International Thermonuclear Experimental Reactor (ITER). ITER is capable of meeting anticipated regulatory dose limits, but proof is difficult because of large radioactive inventories needing stringent radioactivity confinement. Much research and development (R&D) and design analysis is needed to establish that ITER meets regulatory requirements. There is a further opportunity to do more to prove more of fusion's potential safety and environmental advantages and to maximize the amount of ITER technology on the path toward fusion power plants. To fulfill these tasks, three programmatic challenges and three technical challenges must be overcome. The first programmatic step is to fund a comprehensive safety and environmental ITER R&D plan. The second is to strengthen safety and environment work and personnel in the international team. The third is to establish an external consultant group to advise the ITER Joint Team on designing ITER to meet safety requirements for siting by any of the Parties. The first of three key technical challenges is plasma engineering: burn control, plasma shutdown, disruptions, tritium burn fraction, and steady state operation. The second is the divertor, including tritium inventory, activation hazards, chemical reactions, and coolant disturbances. The third technical challenge is optimization of design requirements considering safety risk, technical risk, and cost.
Is Monte Carlo embarrassingly parallel?
International Nuclear Information System (INIS)
Hoogenboom, J. E.
2012-01-01
Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup, and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendezvous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
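The observation that execution time can increase beyond a certain number of processors can be illustrated with a toy cost model. The sketch below is not taken from the paper: it assumes that the tracking work divides perfectly among processors and that each end-of-cycle rendezvous costs a fixed amount plus a term growing linearly with the processor count. The function names and all parameter values are hypothetical.

```python
def run_time(procs, work=1000.0, cycles=100, sync_base=0.01, sync_per_proc=0.005):
    """Toy wall-clock model (assumed, not measured).

    work          -- total particle-tracking time on one processor
    cycles        -- number of fission source cycles, one rendezvous each
    sync_base     -- fixed cost of one rendezvous
    sync_per_proc -- extra rendezvous cost per participating processor
    """
    compute = work / procs                                # perfectly divisible part
    sync = cycles * (sync_base + sync_per_proc * procs)   # grows with processor count
    return compute + sync


def speedup(procs, **model):
    """Speedup relative to a single processor under the same toy model."""
    return run_time(1, **model) / run_time(procs, **model)


def best_processor_count(max_procs=512, **model):
    """Processor count with the smallest modelled run time."""
    return min(range(1, max_procs + 1), key=lambda p: run_time(p, **model))


if __name__ == "__main__":
    for p in (1, 8, 45, 256):
        print(p, round(run_time(p), 2), round(speedup(p), 2))
```

Under these assumed parameters the modelled run time falls up to a few tens of processors and rises afterwards, which is the qualitative behaviour the abstract describes; the real causes and magnitudes are those analyzed in the paper, not this model.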
Fainberg, J.; Schaefer, W.
2015-06-01
A new algorithm for heat exchange between thermally coupled diffusely radiating interfaces is presented, which can be applied to closed and half-open transparent radiating cavities. Interfaces between opaque and transparent materials are automatically detected and subdivided into elementary radiation surfaces named tiles. Contrary to the classical view factor method, the fixed unit sphere area subdivision oriented along the normal tile direction is projected onto the surrounding radiation mesh and not vice versa. Then, the total incident radiating flux of the receiver is approximated as a direct sum of radiation intensities of representative “senders” with the same weight factor. A hierarchical scheme for the space angle subdivision is selected in order to minimize the total memory and the computational demands during thermal calculations. Direct visibility is tested by means of a voxel-based ray tracing method accelerated by the anisotropic Chebyshev distance method, which reuses the computational grid as a Chebyshev one. The ray tracing algorithm is fully parallelized using MPI and takes advantage of the balanced distribution of all available tiles among all CPUs. This approach allows tracing of each particular ray without any communication. The algorithm has been implemented in a commercial casting process simulation software. The accuracy and computational performance of the new radiation model for heat treatment, investment and ingot casting applications are illustrated using industrial examples.
Implementation of a parallel algorithm for spherical SN calculations on the IBM 3090
International Nuclear Information System (INIS)
Haghighat, A.; Lawrence, R.D.
1989-01-01
Parallel S_N algorithms based on domain decomposition in angle are straightforward to develop in Cartesian geometry because the computation of the angular fluxes for a specific discrete ordinate can be performed independently of all other angles. This is not the case for curvilinear geometries, where the angular redistribution component of the discretized streaming operator results in coupling between angular fluxes along adjacent discrete ordinates. Previously, the authors developed a parallel algorithm for S_N calculations in spherical geometry and examined its iterative convergence for criticality and detector problems with differing scattering/absorption ratios. In this paper, the authors describe the implementation of the algorithm on an IBM 3090 Model 400 (four processors) and present computational results illustrating the efficiency of the algorithm relative to serial execution.
ITER EDA Newsletter. V. 10, no. 7
International Nuclear Information System (INIS)
2001-07-01
This ITER EDA Newsletter presents an overview of meetings held at IAEA Headquarters in Vienna during the week 16-20 July 2001 related to the successful completion of the ITER Engineering Design Activities (EDA). Among them were the final meeting of the ITER Council, the closing ceremony to commemorate the EDA completion, the final meeting of the ITER Management Advisory Committee, a briefing of issues related to ITER developments, and discussions on the possible joint implementation of ITER
Iterative optimization in inverse problems
Byrne, Charles L
2014-01-01
Iterative Optimization in Inverse Problems brings together a number of important iterative algorithms for medical imaging, optimization, and statistical estimation. It incorporates recent work that has not appeared in other books and draws on the author's considerable research in the field, including his recently developed class of SUMMA algorithms. Related to sequential unconstrained minimization methods, the SUMMA class includes a wide range of iterative algorithms well known to researchers in various areas, such as statistics and image processing. Organizing the topics from general to more
Remote maintenance development for ITER
Energy Technology Data Exchange (ETDEWEB)
Tada, Eisuke [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Shibanuma, Kiyoshi
1998-04-01
This paper describes the overall ITER remote maintenance design concept developed mainly for in-vessel components such as divertors and blankets, and outlines the ITER R&D program to develop remote handling equipment and radiation-hard components. Reactor structures inside the ITER cryostat must be maintained remotely due to DT operation, making remote handling technology basic to reactor design. The overall maintenance scenario and design concepts have been developed, and maintenance design feasibility, including fabrication and testing of full-scale in-vessel remote maintenance handling equipment and tools, is being verified. (author)
Iterative nonlinear unfolding code: TWOGO
International Nuclear Information System (INIS)
Hajnal, F.
1981-03-01
A new iterative unfolding code, TWOGO, was developed to analyze Bonner sphere neutron measurements. The code includes two different unfolding schemes which alternate on successive iterations. The iterative process can be terminated either when the ratio of the coefficients of variation of the measured and calculated responses is unity, or when the percentage difference between the measured and evaluated sphere responses is less than the average measurement error. The code was extensively tested with various known spectra and real multisphere neutron measurements which were performed inside the containments of pressurized water reactors.
Recent ASDEX Upgrade research in support of ITER and DEMO
DEFF Research Database (Denmark)
Zohm, H.; Ahn, J.; Aho-Mantila, L.
2015-01-01
Recent experiments on the ASDEX Upgrade tokamak aim at improving the physics base for ITER and DEMO to aid the machine design and prepare efficient operation. Type I edge localized mode (ELM) mitigation using resonant magnetic perturbations (RMPs) has been shown at low pedestal collisionality ... of radiated power during MGI mitigation. Concerning power exhaust, the partially detached ITER divertor scenario has been demonstrated at Psep/R = 10 MW m−1 in ASDEX Upgrade, with a peak time-averaged target load around 5 MW m−2, well consistent with the component limits for ITER. Developing this towards DEMO ... to be the decisive element for the L–H power threshold. A physics-based scaling of the density at which the minimum PLH occurs indicates that ITER could take advantage of it to initiate H-mode at lower density than that of the final Q = 10 operational point. Core density fluctuation measurements resolved in radius...
Parallelizing Monte Carlo with PMC
International Nuclear Information System (INIS)
Rathkopf, J.A.; Jones, T.R.; Nessett, D.M.; Stanberry, L.C.
1994-11-01
PMC (Parallel Monte Carlo) is a system of generic interface routines that allows easy porting of Monte Carlo packages of large-scale physics simulation codes to Massively Parallel Processor (MPP) computers. By loading various versions of PMC, simulation code developers can configure their codes to run in several modes: serial, Monte Carlo runs on the same processor as the rest of the code; parallel, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on other MPP processor(s); distributed, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on a different machine. This multi-mode approach allows maintenance of a single simulation code source regardless of the target machine. PMC handles passing of messages between nodes on the MPP, passing of messages between a different machine and the MPP, distributing work between nodes, and providing independent, reproducible sequences of random numbers. Several production codes have been parallelized under the PMC system. Excellent parallel efficiency in both the distributed and parallel modes results if sufficient workload is available per processor. Experiences with a Monte Carlo photonics demonstration code and a Monte Carlo neutronics package are described
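One way to obtain random number sequences that are reproducible regardless of how work is distributed is to tie each sequence to the unit of work rather than to the processor. The sketch below illustrates that idea only; it is not PMC's interface. The function names, the seed-mixing constant, and the stand-in "history" are hypothetical, the workers are simulated by a sequential loop, and Python's general-purpose generator is used without any guarantee of statistical independence between streams, which a production code would need to establish.

```python
import random


def history_score(base_seed, index):
    """Score one particle history using a generator seeded only by
    (base_seed, index), so the result does not depend on which worker runs it."""
    rng = random.Random(base_seed * 1_000_003 + index)  # hypothetical seed mixing
    # Stand-in for particle tracking: sum of three exponential path lengths.
    return sum(rng.expovariate(1.0) for _ in range(3))


def simulate(n_histories, n_workers, base_seed=12345):
    """Distribute histories round-robin over n_workers (run sequentially here)
    and return the per-history scores in history order."""
    scores = [0.0] * n_histories
    for worker in range(n_workers):
        for index in range(worker, n_histories, n_workers):
            scores[index] = history_score(base_seed, index)
    return scores


if __name__ == "__main__":
    serial = simulate(50, n_workers=1)
    parallel = simulate(50, n_workers=7)
    print(serial == parallel)
```

Because every history owns its seed, the serial and the seven-worker decompositions produce identical per-history scores, which is the property that makes debugging across machine configurations tractable.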
International Nuclear Information System (INIS)
Sadakov, S.; Fauser, F.; Nelson, B.
1991-01-01
This document describes the results and recommendations of the Containment Structures Design Unit (CSDU) on the containment structures for ITER, made in the context of the Conceptual Design Phase. The document describes the following subsystems: (1) the primary vacuum vessel (VV), (2) the attaching locks (AL) of the invessel components, (3) the plasma passive and active stabilizers, (4) the cryostat vessel, and (5) the machine gravity supports. Although for most components reference designs were selected, for some of these alternative design options were described, because unresolved problems necessitate further research and development. Conclusions and future needs are summarized for each of the above subsystems: (1) a reference VV design was selected, while most critical VV future needs are the feasibility studies of manufacturing, assembly, and the repair/disassembly/reassembly by remote handling. Alternative, thin-wall options appear attractive and should be studied further during the Engineering Design Activities; (2) no reference design solution was selected for the AL system, as AL design requirements are extremely difficult and internally contradictory, while there is no existing tokamak precedent, but instead, five different approaches will be further researched early in the Engineering Design Phase; (3) significant progress is reported on passive loops, for which the ''twin-loops'' concept is ready to be advanced into the Engineering Design Phase, and on active coils, where a new coil positioning prevents interference with the blanket removal paths, and the current joints are located in a secondary vacuum or in the atmosphere of the reactor hall, repairable by remote handling; (4) a full metallic welded cryostat design with increased toroidal resistance was chosen, but with a design based on concrete with a thin inner metallic liner as a back-up in case detailed nuclear shielding requirements would force the cryostat to act as biological shield; (5) out
Perturbation resilience and superiorization of iterative algorithms
International Nuclear Information System (INIS)
Censor, Y; Davidi, R; Herman, G T
2010-01-01
Iterative algorithms aimed at solving some problems are discussed. For certain problems, such as finding a common point in the intersection of a finite number of convex sets, there often exist iterative algorithms that impose very little demand on computer resources. For other problems, such as finding that point in the intersection at which the value of a given function is optimal, algorithms tend to need more computer memory and longer execution time. A methodology is presented whose aim is to produce automatically for an iterative algorithm of the first kind a 'superiorized version' of it that retains its computational efficiency but nevertheless goes a long way toward solving an optimization problem. This is possible to do if the original algorithm is 'perturbation resilient', which is shown to be the case for various projection algorithms for solving the consistent convex feasibility problem. The superiorized versions of such algorithms use perturbations that steer the process in the direction of a superior feasible point, which is not necessarily optimal, with respect to the given function. After presenting these intuitive ideas in a precise mathematical form, they are illustrated in image reconstruction from projections for two different projection algorithms superiorized for the function whose value is the total variation of the image
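The superiorization idea can be sketched on a two-dimensional toy problem. This is an illustration under stated assumptions, not the authors' implementation: the two convex sets, the target function (squared distance from the origin, standing in for total variation), the step sizes, and all function names are hypothetical. The basic algorithm alternates projections onto the two sets; the superiorized version first perturbs the iterate along a direction that does not increase the target function, using step sizes whose sum is finite.

```python
import math


def project_halfplane(p):
    """Project onto A = {(x, y): x + y >= 2}."""
    x, y = p
    gap = 2.0 - (x + y)
    if gap <= 0.0:
        return (x, y)
    return (x + gap / 2.0, y + gap / 2.0)


def project_quadrant(p):
    """Project onto B = {(x, y): x >= 0, y >= 0}."""
    return (max(p[0], 0.0), max(p[1], 0.0))


def phi(p):
    """Target function to be reduced (stand-in for total variation)."""
    return p[0] ** 2 + p[1] ** 2


def basic_step(p):
    """One sweep of the basic feasibility-seeking projection algorithm."""
    return project_quadrant(project_halfplane(p))


def run(start, iterations=200, superiorize=False, beta0=1.0, decay=0.9):
    """Run the basic algorithm, optionally with superiorizing perturbations."""
    p = start
    beta = beta0
    for _ in range(iterations):
        if superiorize:
            norm = math.hypot(p[0], p[1])
            if norm > 0.0:
                # Step of length beta along -grad(phi)/|grad(phi)|.
                p = (p[0] - beta * p[0] / norm, p[1] - beta * p[1] / norm)
            beta *= decay  # geometric decay keeps the step sizes summable
        p = basic_step(p)
    return p


if __name__ == "__main__":
    start = (5.0, 0.0)
    print(run(start), run(start, superiorize=True))
```

Starting from a point that is already feasible, the basic algorithm stays where it is, while the superiorized run ends at a feasible point with a smaller value of the target function. As in the abstract, that point is superior but not necessarily optimal.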
Toward Generalization of Iterative Small Molecule Synthesis.
Lehmann, Jonathan W; Blair, Daniel J; Burke, Martin D
2018-02-01
Small molecules have extensive untapped potential to benefit society, but access to this potential is too often restricted by limitations inherent to the customized approach currently used to synthesize this class of chemical matter. In contrast, the "building block approach", i.e., generalized iterative assembly of interchangeable parts, has now proven to be a highly efficient and flexible way to construct things ranging all the way from skyscrapers to macromolecules to artificial intelligence algorithms. The structural redundancy found in many small molecules suggests that they possess a similar capacity for generalized building block-based construction. It is also encouraging that many customized iterative synthesis methods have been developed that improve access to specific classes of small molecules. There has also been substantial recent progress toward the iterative assembly of many different types of small molecules, including complex natural products, pharmaceuticals, biological probes, and materials, using common building blocks and coupling chemistry. Collectively, these advances suggest that a generalized building block approach for small molecule synthesis may be within reach.
Design of the ITER Neutral Beam injectors
International Nuclear Information System (INIS)
Hemsworth, R.S.; Feist, J.; Hanada, M.; Heinemann, B.; Inoue, T.; Kuessel, E.; Kulygin, V.; Krylov, A.; Lotte, P.; Miyamoto, K.; Miyamoto, N.; Murdoch, D.; Nagase, A.; Ohara, Y.; Okumura, Y.; Pamela, J.; Panasenkov, A.; Shibata, K.; Tanii, M.
1996-01-01
This paper describes the Neutral Beam Injection system which is presently being designed in Europe, Japan and Russia, with co-ordination by the Joint Central Team of ITER at Naka, Japan. The proposed system consists of three negative-ion-based neutral injectors, delivering a total of 50 MW of 1 MeV D^0 to the ITER plasma for a pulse length of ≥1000 s. The injectors each use a single caesiated volume arc discharge negative ion source, and a multi-grid, multi-aperture accelerator, to produce about 40 A of 1 MeV D^-. This will be neutralized in a sub-divided gas neutralizer, which has a conversion efficiency of about 60%. The charged fraction of the beam emerging from the neutralizer is dumped in an electrostatic residual ion dump. A water-cooled calorimeter can be moved into the beam path to intercept the neutral beam, allowing commissioning of the injector independent of ITER. © 1996 American Institute of Physics
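The figures quoted in the abstract can be cross-checked with a back-of-the-envelope power balance. The calculation below uses only the stated values (40 A, 1 MeV, 60% neutralization, three injectors, 50 MW total); the final ratio is an inference about combined downstream losses and is not stated in the abstract.

```latex
\begin{align*}
P_{\text{ion}} &\approx 40\,\mathrm{A} \times 1\,\mathrm{MV} = 40\,\mathrm{MW}
  && \text{(accelerated D$^-$ beam, per injector)} \\
P_{\text{neutral}} &\approx 0.60 \times 40\,\mathrm{MW} = 24\,\mathrm{MW}
  && \text{(after the neutralizer, per injector)} \\
3 \times P_{\text{neutral}} &\approx 72\,\mathrm{MW},
  \qquad \frac{50\,\mathrm{MW}}{72\,\mathrm{MW}} \approx 0.69
  && \text{(implied fraction reaching the plasma)}
\end{align*}
```

On these numbers each injector would deliver roughly 17 MW, so about 30% of the neutralized power would have to be accounted for by losses between the neutralizer and the plasma, assuming the quoted values all refer to the same operating point.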
Directory of Open Access Journals (Sweden)
Edelheit Oded
2009-06-01
Background: In protein engineering, site-directed mutagenesis methods are used to generate DNA sequences with mutated codons, insertions or deletions. In a widely used method, mutations are generated by PCR using a pair of oligonucleotide primers designed with mismatching nucleotides at the center of the primers. In this method, primer-primer annealing may prevent cloning of mutant cDNAs. To circumvent this problem we developed an alternative procedure that does not use a forward-reverse primer pair in the same reaction. Results: In initial studies we used a double-primer PCR mutagenesis protocol, but sequencing of products showed tandem repeats of primer in cloned DNA. We developed an alternative method that starts with two Single-Primer Reactions IN Parallel using high-fidelity Pwo DNA polymerase. Thus, we call the method by the acronym SPRINP. The SPRINP reactions are then combined, denatured at 95°C, and slowly cooled, promoting random annealing of the parental DNA and the newly synthesized strands. The products are digested with DpnI, which digests methylated parental strands, and then transformed into E. coli. Using this method we generated >40 mutants in cDNAs coding for human Epithelial Na+ Channel (ENaC) subunits. The method has been tested for 1–3 bp codon mutation and insertion of a 27 bp epitope tag into cDNAs. Conclusion: The SPRINP mutagenesis protocol yields mutants reliably and with high fidelity. The use of a single primer in each amplification reaction increases the probability of success of primers relative to previous methods employing a forward and reverse primer pair in the same reaction.
Edelheit, Oded; Hanukoglu, Aaron; Hanukoglu, Israel
2009-06-30
In protein engineering, site-directed mutagenesis methods are used to generate DNA sequences with mutated codons, insertions or deletions. In a widely used method, mutations are generated by PCR using a pair of oligonucleotide primers designed with mismatching nucleotides at the center of the primers. In this method, primer-primer annealing may prevent cloning of mutant cDNAs. To circumvent this problem we developed an alternative procedure that does not use forward-reverse primer pair in the same reaction. In initial studies we used a double-primer PCR mutagenesis protocol, but sequencing of products showed tandem repeats of primer in cloned DNA. We developed an alternative method that starts with two Single-Primer Reactions IN Parallel using high-fidelity Pwo DNA polymerase. Thus, we call the method with the acronym SPRINP. The SPRINP reactions are then combined, denatured at 95 degrees C, and slowly cooled, promoting random annealing of the parental DNA and the newly synthesized strands. The products are digested with DpnI that digests methylated parental strands, and then transformed into E. coli. Using this method we generated >40 mutants in cDNAs coding for human Epithelial Na+ Channel (ENaC) subunits. The method has been tested for 1-3 bp codon mutation and insertion of a 27 bp epitope tag into cDNAs. The SPRINP mutagenesis protocol yields mutants reliably and with high fidelity. The use of a single primer in each amplification reaction increases the probability of success of primers relative to previous methods employing a forward and reverse primer pair in the same reaction.
International Nuclear Information System (INIS)
Gianluca, Longoni; Alireza, Haghighat
2003-01-01
In recent years, the SP_L (simplified spherical harmonics) equations have received renewed interest for the simulation of nuclear systems. We have derived the SP_L equations starting from the even-parity form of the S_N equations. The SP_L equations form a system of (L+1)/2 second-order partial differential equations that can be solved with standard iterative techniques such as the Conjugate Gradient (CG) method. We discretized the SP_L equations with the finite-volume approach in a 3-D Cartesian space. We developed a new 3-D general code, Pensp_L (Parallel Environment Neutral-particle SP_L). Pensp_L solves both fixed source and criticality eigenvalue problems. In order to optimize the memory management, we implemented a Compressed Diagonal Storage (CDS) to store the SP_L matrices. Pensp_L includes parallel algorithms for space and moment domain decomposition. The computational load is distributed on different processors, using a mapping function, which maps the 3-D Cartesian space and moments onto processors. The code is written in Fortran 90 using the Message Passing Interface (MPI) libraries for the parallel implementation of the algorithm. The code has been tested on the Pcpen cluster and the parallel performance has been assessed in terms of speed-up and parallel efficiency. (author)
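The combination of diagonal storage and Conjugate Gradient mentioned in the abstract can be sketched on a small model problem. This is an illustrative sketch in Python rather than the code's Fortran 90, and it makes several assumptions: the storage layout (one padded list per diagonal, indexed by row), the tridiagonal test matrix, and all function names are hypothetical and are not taken from the code described above.

```python
def num_spl_equations(order):
    """Number of coupled second-order equations for an odd SP_L order L."""
    if order % 2 == 0:
        raise ValueError("SP_L order is assumed to be odd here")
    return (order + 1) // 2


def cds_matvec(diagonals, x):
    """Compute y = A x for A stored by diagonals.

    diagonals maps an offset k to a list d with d[i] = A[i][i + k];
    entries that would fall outside the matrix are padding and are skipped.
    """
    n = len(x)
    y = [0.0] * n
    for offset, values in diagonals.items():
        for i in range(n):
            j = i + offset
            if 0 <= j < n:
                y[i] += values[i] * x[j]
    return y


def conjugate_gradient(diagonals, b, tol=1e-12, max_iter=1000):
    """Plain CG for a symmetric positive definite matrix in diagonal storage."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the zero initial guess
    p = list(r)                      # first search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = cds_matvec(diagonals, p)
        alpha = rs_old / sum(pi * ai for pi, ai in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Model problem: 1-D diffusion-like tridiagonal matrix with rows (-1, 2, -1).
n = 5
diagonals = {0: [2.0] * n, 1: [-1.0] * n, -1: [-1.0] * n}
b = cds_matvec(diagonals, [1.0] * n)   # right-hand side whose solution is all ones
x = conjugate_gradient(diagonals, b)
```

Storing only the nonzero diagonals keeps memory proportional to the number of diagonals times the number of unknowns, which suits the banded matrices that a finite-volume discretization on a Cartesian mesh typically produces.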
Iterative methods for 3D implicit finite-difference migration using the complex Padé approximation
International Nuclear Information System (INIS)
Costa, Carlos A N; Campos, Itamara S; Costa, Jessé C; Neto, Francisco A; Schleicher, Jörg; Novais, Amélia
2013-01-01
Conventional implementations of 3D finite-difference (FD) migration use splitting techniques to accelerate performance and save computational cost. However, such techniques are plagued with numerical anisotropy that jeopardises the correct positioning of dipping reflectors in the directions not used for the operator splitting. We implement 3D downward continuation FD migration without splitting using a complex Padé approximation. In this way, the numerical anisotropy is eliminated at the expense of a computationally more intensive solution of a large-band linear system. We compare the performance of the iterative stabilized biconjugate gradient (BICGSTAB) and that of the multifrontal massively parallel direct solver (MUMPS). It turns out that the use of the complex Padé approximation not only stabilizes the solution, but also acts as an effective preconditioner for the BICGSTAB algorithm, reducing the number of iterations as compared to the implementation using the real Padé expansion. As a consequence, the iterative BICGSTAB method is more efficient than the direct MUMPS method when solving a single term in the Padé expansion. The results of both algorithms, here evaluated by computing the migration impulse response in the SEG/EAGE salt model, are of comparable quality. (paper)
Development Of A Parallel Performance Model For The THOR Neutral Particle Transport Code
Energy Technology Data Exchange (ETDEWEB)
Yessayan, Raffi; Azmy, Yousry; Schunert, Sebastian
2017-02-01
The THOR neutral particle transport code enables simulation of complex geometries for various problems from reactor simulations to nuclear non-proliferation. It is undergoing a thorough V&V requiring computational efficiency. This has motivated various improvements including angular parallelization, outer iteration acceleration, and development of peripheral tools. For guiding future improvements to the code’s efficiency, better characterization of its parallel performance is useful. A parallel performance model (PPM) can be used to evaluate the benefits of modifications and to identify performance bottlenecks. Using INL’s Falcon HPC, the PPM development incorporates an evaluation of network communication behavior over heterogeneous links and a functional characterization of the per-cell/angle/group runtime of each major code component. After evaluating several possible sources of variability, this resulted in a communication model and a parallel portion model. The former’s accuracy is bounded by the variability of communication on Falcon while the latter has an error on the order of 1%.
3D dictionary learning based iterative cone beam CT reconstruction
Directory of Open Access Journals (Sweden)
Ti Bai
2014-03-01
Purpose: This work develops a 3D dictionary learning based cone beam CT (CBCT) reconstruction algorithm on graphics processing units (GPU) to improve the quality of sparse-view CBCT reconstruction with high efficiency. Methods: A 3D dictionary containing 256 small volumes (atoms) of 3 × 3 × 3 was trained from a large number of blocks extracted from a high quality volume image. On this basis, we utilized a Cholesky decomposition based orthogonal matching pursuit algorithm to find the sparse representation of each block. To accelerate the time-consuming sparse coding in the 3D case, we implemented the sparse coding in a parallel fashion by taking advantage of the tremendous computational power of the GPU. A conjugate gradient least squares algorithm was adopted to minimize the data fidelity term. Evaluations are performed based on a head-neck patient case. FDK reconstruction with the full dataset of 364 projections is used as the reference. We compared the proposed 3D dictionary learning based method with tight frame (TF) by performing reconstructions on a subset of 121 projections. Results: Compared to TF based CBCT reconstruction, which shows good overall performance, our experiments indicated that 3D dictionary learning based CBCT reconstruction is able to recover finer structures, remove more streaking artifacts and also induce fewer blocky artifacts. Conclusion: The 3D dictionary learning based CBCT reconstruction algorithm is able to sense the structural information while suppressing the noise, and hence achieves high quality reconstruction in the sparse-view case. The GPU realization of the whole algorithm offers a significant efficiency enhancement, making this algorithm more feasible for potential clinical application. Cite this article as: Bai T, Yan H, Shi F, Jia X, Lou Y, Xu Q, Jiang S, Mou X. 3D dictionary learning based iterative cone beam CT reconstruction. Int J Cancer Ther Oncol 2014; 2(2):020240. DOI: 10
Implementing hybrid MPI/OpenMP parallelism in Fluidity
Gorman, Gerard; Lange, Michael; Avdis, Alexandros; Guo, Xiaohu; Mitchell, Lawrence; Weiland, Michele
2014-05-01
Parallelising finite element codes using domain decomposition methods and MPI has nearly become routine at the application code level. This has been helped in no small part by the development of an eco-system of open source libraries to provide key functionality, for example SCOTCH for graph partitioning or PETSc for sparse iterative solvers. As we move to an era where pure MPI no longer suffices, application developers cannot focus only on the application code, but must consider the full software stack. In the case of Fluidity (an open source control volume/finite element general purpose fluid dynamics code), the decision to improve parallel efficiency by moving to a hybrid MPI/OpenMP programming model meant that it became necessary to get involved in extending third-party open source libraries, specifically PETSc, in addition to the application code itself. The effort involved in re-engineering a large application code highlights the fact that as computing platforms continue their advance towards low-power many-core processors, the software stack must also develop at a similar pace or application codes will suffer. In this presentation we will illustrate the steps required to re-engineer Fluidity to achieve good parallel efficiency when using MPI/OpenMP. We identify performance pitfalls when using Fortran features such as automatic arrays in a multi-threaded context, as well as poor data locality on NUMA platforms. A significant proportion of the computational cost is in the sparse iterative solvers. For this we collaborated with the development team at Argonne National Laboratory to add OpenMP support to PETSc. We will present performance results for the application as a whole, as well as for key individual components such as matrix assembly and the solvers. We also show that while we did not explicitly target I/O for optimisation here, its performance is nonetheless greatly improved because fewer processes access the file system. One of the main remaining
Cooperation between CERN and ITER
2008-01-01
CERN and the International Fusion Organisation ITER have just signed a first cooperation agreement. The Director-General of the International Fusion Energy Organization, Mr Kaname Ikeda, and CERN Director-General, Robert Aymar, signed a cooperation agreement at a meeting on the Meyrin site on Thursday 6 March. One of the main purposes of this agreement is for CERN to give ITER the benefit of its experience in the field of technology as well as in administrative domains such as finance, procurement, human resources and informatics through the provision of consultancy services. Currently in its start-up phase at its Cadarache site, 70 km from Marseilles (France), ITER will focus its research on the scientific and technical feasibility of using fusion energy as a fu...
Rollout sampling approximate policy iteration
Dimitrakakis, C.; Lagoudakis, M.G.
2008-01-01
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a
ITER Conceptual design: Interim report
International Nuclear Information System (INIS)
1990-01-01
This interim report describes the results of the International Thermonuclear Experimental Reactor (ITER) Conceptual Design Activities after the first year of design following the selection of the ITER concept in the autumn of 1988. Using the concept definition as the basis for conceptual design, the Design Phase has been underway since October 1988, and will be completed at the end of 1990, at which time a final report will be issued. This interim report includes an executive summary of ITER activities, a description of the ITER device and facility, an operation and research program summary, and a description of the physics and engineering design bases. Included are preliminary cost estimates and schedule for completion of the project
Plan of ITER remote experimentation center
Energy Technology Data Exchange (ETDEWEB)
Ozeki, T., E-mail: ozeki.takahisa@jaea.go.jp [Japan Atomic Energy Agency, 2-166 Obuchi Rokkasho, Kitakami-gun, Aomori 039-3212 (Japan); Clement, S.L. [Fusion for Energy, Torres Diagonal Litoral, B3, 13/03, 08019 Barcelona (Spain); Nakajima, N. [National Institute for Fusion Science and Project Leader of IFERC, 2-166 Obuchi, Rokkasho, Kamikita-gun, Aomori 039-3212 (Japan)
2014-05-15
The plan for the ITER remote experimentation center (REC), based on the broader approach (BA) activity of the joint program of Japan and Europe (EU), is described. The objectives of the REC activity are (1) to identify the functions and solve the technical issues for the construction of the REC for ITER at Rokkasho, (2) to develop the remote experiment system and verify the functions required for remote experiments by using the Satellite Tokamak (JT-60SA) facilities, so that future experiments on ITER and JT-60SA can be implemented effectively and efficiently, and (3) to test the functions of the REC and demonstrate the total system by using JT-60SA and other existing facilities in the EU. The items preliminarily identified for development are (1) functions of the remote experiment system, such as setting of experiment parameters, shot scheduling, real-time data streaming, and communication by video conference between the remote site and on-site, (2) an effective data transfer system capable of fast transfer of huge amounts of data between on-site and off-site, together with the network connecting the REC system, (3) a storage system that can store and access huge amounts of data, including database management, (4) data analysis software for viewing the diagnostic data on the storage system, and (5) numerical simulation for preparation and estimation of shot performance and for analysis of the plasma shot. Detailed specifications of the above items will be discussed and the system will be built over these four years in collaboration with the JT-60SA and EU tokamak facilities, experts in informatics, plasma simulation activities and ITER. Finally, the functions of the REC will be tested and the total system will be demonstrated by the middle of 2017.
Feder, Toni
2003-01-01
After successfully chairing an external review committee for CERN last year, Robert Aymar will leave ITER to become director general of the European particle physics laboratory from 2004. Before ITER he also successfully managed the startup of Tore Supra. He will attempt to ensure that the LHC begins operating in 2007 - two years late - and is paid for by 2010 and will also start the planning for life after the LHC (1 page)
International Nuclear Information System (INIS)
Aymar, R.
2000-01-01
Six years of joint work under the international thermonuclear experimental reactor (ITER) EDA agreement yielded a mature design for ITER which met the objectives set for it (ITER final design report (FDR)), together with a corpus of scientific and technological data, large/full scale models or prototypes of key components/systems and progress in understanding which both validated the specific design and are generally applicable to a next step, reactor-oriented tokamak on the road to the development of fusion as an energy source. In response to requests from the parties to explore the scope for addressing ITER's programmatic objective at reduced cost, the study of options for cost reduction has been the main feature of ITER work since summer 1998, using the advances in physics and technology databases, understandings, and tools arising out of the ITER collaboration to date. A joint concept improvement task force drawn from the joint central team and home teams has overseen and co-ordinated studies of the key issues in physics and technology which control the possibility of reducing the overall investment and simultaneously achieving the required objectives. The aim of this task force is to achieve common understandings of these issues and their consequences so as to inform and to influence the best cost-benefit choice, which will attract consensus between the ITER partners. A report to be submitted to the parties by the end of 1999 will present key elements of a specific design of minimum capital investment, with a target cost saving of about 50% the cost of the ITER FDR design, and a restricted number of design variants. Outline conclusions from the work of the task force are presented in terms of physics, operations, and design of the main tokamak systems. Possible implications for the way forward are discussed
ITER diagnostic system: Vacuum interface
International Nuclear Information System (INIS)
Patel, K.M.; Udintsev, V.S.; Hughes, S.; Walker, C.I.; Andrew, P.; Barnsley, R.; Bertalot, L.; Drevon, J.M.; Encheva, A.; Kashchuk, Y.; Maquet, Ph.; Pearce, R.; Taylor, N.; Vayakis, G.; Walsh, M.J.
2013-01-01
Diagnostics play an essential role in the successful operation of the ITER tokamak. They provide the means to observe, control and measure the plasma during the operation of the ITER tokamak. The components of the diagnostic system in the ITER tokamak will be installed in the vacuum vessel, in the cryostat, in the upper, equatorial and divertor ports, in the divertor cassettes and racks, as well as in various buildings. Diagnostic components that are placed in a high radiation environment are expected to operate for the life of ITER. There are approx. 45 diagnostic systems located on ITER. Some diagnostics incorporate direct or independently pumped extensions to maintain their necessary vacuum conditions. They require a base pressure of less than 10⁻⁷ Pa, irrespective of plasma operation, and a leak rate of less than 10⁻¹⁰ Pa m³ s⁻¹. In all cases it is essential to maintain the ITER closed fuel cycle. These directly coupled diagnostic systems are an integral part of the ITER vacuum containment and are therefore subject to the same design requirements for tritium and active gas confinement, for all normal and accidental conditions. All the diagnostics, whether or not pumped, incorporate penetration of the vacuum boundary (i.e. window assembly, vacuum feedthrough etc.) and demountable joints. Monitored guard volumes are provided for all elements of the vacuum boundary that are judged to be vulnerable by virtue of their construction, material, load specification etc. Standard arrangements are made for their construction and for the monitoring, evacuating and leak testing of these volumes. Diagnostic systems are incorporated at more than 20 ports on ITER. This paper will describe typical and particular arrangements of pumped diagnostic and monitored guard volume. The status of the diagnostic vacuum systems, which are at the start of their detailed design, will be outlined and the specific features of the vacuum systems in ports and extensions will be described
The ITER remote maintenance system
International Nuclear Information System (INIS)
Tesini, A.
2007-01-01
Full text of publication follows: ITER is a joint international research and development project that aims to demonstrate the scientific and technological feasibility of fusion power. As soon as the plasma operation begins using tritium, the replacement of the vacuum vessel internal components will need to be done with remote handling techniques. To accomplish these operations ITER has equipped itself with a Remote Maintenance System; this includes the Remote Handling equipment set and the Hot Cell facility. Both need to work in a cooperative way, with the aim of minimizing the machine shutdown periods and to maximize the machine availability. The ITER Remote Handling equipment set is required to be available, robust, reliable and retrievable. The machine components, to be remotely handle-able, are required to be designed simply so as to ease their maintenance. The baseline ITER Remote Handling equipment is described. The ITER Hot Cell Facility is required to provide a controlled and shielded area for the execution of repair operations (carried out using dedicated remote handling equipment) on those activated components which need to be returned to service, inside the vacuum vessel. The Hot Cell provides also the equipment and space for the processing and temporary storage of the operational and decommissioning rad-waste. A conceptual ITER Hot Cell Facility is described. (authors)
International Nuclear Information System (INIS)
1989-01-01
Under the auspices of the International Atomic Energy Agency (IAEA), an agreement among the four parties representing the world's major fusion programs resulted in a program for conceptual design of the next logical step in the fusion program, the International Thermonuclear Experimental Reactor (ITER). The definition phase, which ended in November, 1989, is summarized in two reports: a brief summary is contained in the ITER Definition Phase Report (IAEA/ITER/DS/2); the extended technical summary and technical details of ITER are contained in this two-volume report. The first volume of this report contains the Introduction and Summary, and the remainder will appear in Volume II. In the Conceptual Design Activities phase, ITER has been defined as being a tokamak device. The basic performance parameters of ITER are given in Volume I of this report. In addition, the rationale for selection of this concept, the performance flexibility, technical issues, operations, safety, reliability, cost, and research and development needed to proceed with the design are discussed. Figs and tabs
The ITER remote maintenance system
International Nuclear Information System (INIS)
Tesini, A.; Palmer, J.
2007-01-01
ITER is a joint international research and development project that aims to demonstrate the scientific and technological feasibility of fusion power. As soon as the plasma operation begins using tritium, the replacement of the vacuum vessel internal components will need to be done with remote handling techniques. To accomplish these operations ITER has equipped itself with a Remote Maintenance System; this includes the Remote Handling equipment set and the Hot Cell facility. Both need to work in a cooperative way, with the aim of minimizing the machine shutdown periods and to maximize the machine availability. The ITER Remote Handling equipment set is required to be available, robust, reliable and retrievable. The machine components, to be remotely handle-able, are required to be designed simply so as to ease their maintenance. The baseline ITER Remote Handling equipment is described. The ITER Hot Cell Facility is required to provide a controlled and shielded area for the execution of repair operations (carried out using dedicated remote handling equipment) on those activated components which need to be returned to service, inside the vacuum vessel. The Hot Cell provides also the equipment and space for the processing and temporary storage of the operational and decommissioning radwaste. A conceptual ITER Hot Cell Facility is described. (orig.)
Directory of Open Access Journals (Sweden)
Bin Yan
2015-01-01
Sparse-view imaging is a promising scanning method which can reduce the radiation dose in X-ray computed tomography (CT). The reconstruction algorithm for a sparse-view imaging system is of significant importance. The adoption of the spatial iterative algorithm for CT image reconstruction has a low operation efficiency and high computation requirement. A novel Fourier-based iterative reconstruction technique that utilizes nonuniform fast Fourier transform is presented in this study along with the advanced total variation (TV) regularization for sparse-view CT. Combined with the alternating direction method, the proposed approach shows excellent efficiency and rapid convergence property. Numerical simulations and real data experiments are performed on a parallel beam CT. Experimental results validate that the proposed method has higher computational efficiency and better reconstruction quality than the conventional algorithms, such as simultaneous algebraic reconstruction technique using TV method and the alternating direction total variation minimization approach, with the same time duration. The proposed method appears to have extensive applications in X-ray CT imaging.
Three-dimensional reconstruction from cone beam projection by a block iterative technique
Peyrin, Francoise; Goutte, Robert; Amiel, Michel
1991-07-01
This work is concerned with truly 3D X-ray tomography. The method consists in the acquisition of an object's radiographs for different positions of an X-ray cone beam source. The image is then obtained by solving a 3D reconstruction problem from cone beam projections. When considering a series expansion approach, the problem is equivalent to the resolution of a linear system, presenting very particular characteristics in size and sparseness. The authors investigate the use of block iterative techniques which allow an efficient implementation of the algorithm on a parallel computer. Three different block iterative reconstruction schemes are developed. They can be used with or without simple constraints on the solution (positivity, amplitude, support...). Results obtained on simulated images allow comparison of the convergence properties of the different methods. Contrary to the conventional case in truly 3D X-ray tomography, different trajectories of the cone beam source are considered and the first results obtained on simulated objects are discussed.
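To make the idea of a block iterative scheme concrete, here is a minimal sketch in pure Python. It is not one of the three schemes developed by the authors, which the abstract does not specify. It assumes a small dense system A x = b, splits the rows into consecutive blocks, and within each block averages the row projections (the rows of a block are independent of each other and could be processed in parallel). The function name, block layout and optional positivity clamp are illustrative assumptions.

```python
def block_iterative_solve(A, b, block_size, sweeps, positivity=False):
    """Illustrative block-iterative solver for A x = b (hypothetical helper).

    Rows are grouped into consecutive blocks. For each block, the projections
    of the current estimate onto the row hyperplanes are averaged, then the
    estimate is updated once per block.
    """
    n = len(A[0])
    x = [0.0] * n
    blocks = [range(start, min(start + block_size, len(A)))
              for start in range(0, len(A), block_size)]
    for _ in range(sweeps):
        for rows in blocks:
            update = [0.0] * n
            for i in rows:  # independent per row: parallelisable
                row = A[i]
                norm2 = sum(a * a for a in row)
                if norm2 == 0.0:
                    continue
                residual = b[i] - sum(a * xj for a, xj in zip(row, x))
                scale = residual / norm2
                for j, a in enumerate(row):
                    update[j] += scale * a
            x = [xj + uj / len(rows) for xj, uj in zip(x, update)]
            if positivity:
                # Simple constraint on the solution: clamp negatives to zero.
                x = [max(xj, 0.0) for xj in x]
    return x


# Tiny consistent system whose exact solution is x = (2, 1).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [2.0, 1.0, 3.0, 1.0]
solution = block_iterative_solve(A, b, block_size=2, sweeps=200, positivity=True)
```

In a real tomography setting A would be very large and sparse, so each block would be stored and processed in a sparse format rather than as dense lists.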
Template based parallel checkpointing in a massively parallel computer system
Energy Technology Data Exchange (ETDEWEB)
Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
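The comparison against a template checkpoint can be sketched as follows. This is a simplified illustration, not the patented method: it assumes fixed-size blocks compared at fixed offsets with SHA-256 checksums, whereas the rsync protocol uses rolling checksums, and it leaves out compression and network broadcast. The block size and all function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; deliberately tiny so the example is readable


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksum(block):
    return hashlib.sha256(block).hexdigest()


def make_delta(node_data, template):
    """Return {block index: block} for blocks that differ from the template."""
    template_sums = [checksum(blk) for blk in split_blocks(template)]
    delta = {}
    for index, block in enumerate(split_blocks(node_data)):
        if index >= len(template_sums) or checksum(block) != template_sums[index]:
            delta[index] = block
    return delta


def restore(template, delta, length):
    """Rebuild a node's checkpoint from the template plus its delta."""
    blocks = split_blocks(template)
    for index in sorted(delta):
        if index < len(blocks):
            blocks[index] = delta[index]
        else:
            blocks.append(delta[index])
    return b"".join(blocks)[:length]


template = b"AAAABBBBCCCCDDDD"
node_state = b"AAAAXXXXCCCCDDDD"
delta = make_delta(node_state, template)  # only the changed block is kept
rebuilt = restore(template, delta, len(node_state))
```

Only the differing blocks (here one of four) need to be transmitted and stored per node, which is the saving the abstract describes.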
Ergül, Özgür
2011-11-01
Fast and accurate solutions of large-scale electromagnetics problems involving homogeneous dielectric objects are considered. Problems are formulated with the electric and magnetic current combined-field integral equation and discretized with the Rao-Wilton-Glisson functions. Solutions are performed iteratively by using the multilevel fast multipole algorithm (MLFMA). For the solution of large-scale problems discretized with millions of unknowns, MLFMA is parallelized on distributed-memory architectures using a rigorous technique, namely, the hierarchical partitioning strategy. Efficiency and accuracy of the developed implementation are demonstrated on very large problems involving as many as 100 million unknowns.
Green, DG; Meacham, KE; vanHoesel, F; Hertzberger, B; Serazzi, G
1995-01-01
This paper describes the techniques and methodologies employed during parallelization of the Molecular Dynamics (MD) code GROMOS87, with the specific requirement that the program run efficiently on a range of distributed-memory parallel platforms. We discuss the preliminary results of our parallel
Parallel algorithms for continuum dynamics
International Nuclear Information System (INIS)
Hicks, D.L.; Liebrock, L.M.
1987-01-01
Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors
Li, Husheng; Betz, Sharon M.; Poor, H. Vincent
2006-01-01
This paper examines the performance of decision feedback based iterative channel estimation and multiuser detection in channel coded aperiodic DS-CDMA systems operating over multipath fading channels. First, explicit expressions describing the performance of channel estimation and parallel interference cancellation based multiuser detection are developed. These results are then combined to characterize the evolution of the performance of a system that iterates among channel estimation, multiu...
ITER EDA newsletter. V. 5, no. 9
International Nuclear Information System (INIS)
1996-09-01
This issue of the Newsletter on the Engineering Design Activities (EDA) for the ITER project contains an overview of one of the seven large ITER Research and Development Projects identified by the ITER Director, namely the Vacuum Vessel Sector, as well as an account of computer animation created for ITER
ITER EDA newsletter. V. 4, no. 9
International Nuclear Information System (INIS)
1995-09-01
This issue of the ITER EDA (Engineering Design Activities) Newsletter contains reports on the first meeting of the ITER Test Blanket Working Group held 19-21 July 1995 at the ITER Garching Joint Work Site, and on the second workshop of the ITER Expert Group on Confinement and Transport
ITER ITA newsletter. No. 22, May 2005
International Nuclear Information System (INIS)
2005-06-01
This issue of the ITER ITA (ITER Transitional Arrangements) newsletter contains concise information about the Japanese Participant Team's recent activities in the ITER Transitional Arrangements (ITA) phase and about an ITER-related meeting, the Fourth IAEA Technical Meeting (IAEA-TM) on Negative Ion Based Neutral Beam Injectors, which was held in Padova, Italy, from 9-11 May 2005
ITER EDA newsletter. V. 10, no. 6
International Nuclear Information System (INIS)
2001-06-01
This ITER EDA Newsletter issue includes information about the ITER Management Advisory Committee Meeting held in Vienna on 16 July 2001 and also a summary of the ninth ITER Technical Meeting on safety and environment held at the ITER Garching Joint Work site, 8 to 10 May, 2001
ITER EDA Newsletter. V. 4, no. 5
International Nuclear Information System (INIS)
1995-05-01
This issue of the ITER EDA (Engineering Design Activities) Newsletter contains comments on the ITER project by the Permanent Representative of the Russian Federation to the International Organizations in Vienna; a report on the ITER Magnet Technical Meeting held at the Joint Work Site at Naka, Japan, April 19-21, 1995; and a contribution entitled "ITER spouses cross the cultures"
ITER EDA newsletter. V. 8, no. 9
International Nuclear Information System (INIS)
1999-09-01
This edition of the ITER EDA Newsletter contains a contribution by the ITER Director, R. Aymar, on the subject of developments in ITER Physics R and D, and a report on the completion of the ITER central solenoid model coils installation by H. Tsuji, Head of the Superconducting Magnet Laboratory at JAERI in Naka, Japan. Individual abstracts are prepared for each of the two articles
ITER EDA newsletter. V. 7, no. 1
International Nuclear Information System (INIS)
1998-01-01
This issue of the ITER Newsletter contains a summary report on the Thirteenth meeting of the ITER Management Advisory Committee (MAC), a report on ITER at the International Conference on Fusion Reactor Materials and a report of a Russian scientist working at ITER Garching JWS
ITER ITA newsletter. No. 5, June 2003
International Nuclear Information System (INIS)
2003-08-01
This issue of the ITER ITA (ITER Transitional Arrangements) newsletter contains concise information about ITER-related activities, including the retirement of Dr. Michel Huguet, Deputy Director of the ITER Central Team and Head of the Naka Joint Work Site, and an account of his 10.5 years of activities at this site
ITER EDA Newsletter. V.3, no.3
International Nuclear Information System (INIS)
1994-03-01
This ITER EDA Newsletter issue contains reports on (i) the completion of the ITER EDA Protocol 1, (ii) the signing of ITER EDA Protocol 2, (iii) a technical meeting on pumping and fuelling and (iv) a technical meeting on the ITER Tritium Plant
ITER EDA newsletter. V. 8, no. 12
International Nuclear Information System (INIS)
1999-12-01
This ITER EDA Newsletter reports about the ITER Management Advisory Committee Meeting in Naka, the ITER Technical Advisory Committee Meeting in Naka and the meeting of the ITER SWG-P2 in Vienna. A separate abstract is prepared for each meeting
ITER safety challenges and opportunities
International Nuclear Information System (INIS)
Piet, S.J.
1991-01-01
Results of the Conceptual Design Activity (CDA) for the International Thermonuclear Experimental Reactor (ITER) suggest challenges and opportunities. "ITER is capable of meeting anticipated regulatory dose limits," but proof is difficult because of large radioactive inventories needing stringent radioactivity confinement. We need much research and development (R&D) and design analysis to establish that ITER meets regulatory requirements. We have a further opportunity to do more to prove more of fusion's potential safety and environmental advantages and maximize the amount of ITER technology on the path toward fusion power plants. To fulfill these tasks, we need to overcome three programmatic challenges and three technical challenges. The first programmatic challenge is to fund a comprehensive safety and environmental ITER R&D plan. Second is to strengthen safety and environment work and personnel in the international team. Third is to establish an external consultant group to advise the ITER Joint Team on designing ITER to meet safety requirements for siting by any of the Parties. The first of the three key technical challenges is plasma engineering -- burn control, plasma shutdown, disruptions, tritium burn fraction, and steady state operation. The second is the divertor, including tritium inventory, activation hazards, chemical reactions, and coolant disturbances. The third technical challenge is optimization of design requirements considering safety risk, technical risk, and cost. Some design requirements are now too strict; some are too lax. Fuel cycle design requirements are presently too strict, mandating inappropriate T separation from H and D. Heat sink requirements are presently too lax; they should be strengthened to ensure that maximum loss of coolant accident temperatures drop
Parallel plasma fluid turbulence calculations
International Nuclear Information System (INIS)
Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.
1994-01-01
The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated
Massively Parallel Finite Element Programming
Heister, Timo
2010-01-01
Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
A parallel algorithm for the non-symmetric eigenvalue problem
International Nuclear Information System (INIS)
Sidani, M.M.
1991-01-01
An algorithm is presented for the solution of the non-symmetric eigenvalue problem. The algorithm is based on a divide-and-conquer procedure that provides initial approximations to the eigenpairs, which are then refined using Newton iterations. Since the smaller subproblems can be solved independently, and since Newton iterations with different initial guesses can be started simultaneously, the algorithm - unlike the standard QR method - is ideal for parallel computers. The author also reports on his investigation of deflation methods designed to obtain further eigenpairs if needed. Numerical results from implementations on a host of parallel machines (distributed and shared-memory) are presented
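The refinement stage can be illustrated with a short pure-Python sketch. It assumes one common way of writing the eigenproblem as a nonlinear system, F(x, λ) = (A x − λ x, (xᵀx − 1)/2) = 0, and applies Newton's method to it. The abstract does not state which formulation the author uses, so this choice, the helper names and the small test matrix are all assumptions. The divide-and-conquer stage that would supply the initial guess is not shown.

```python
def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def newton_eigenpair(A, x, lam, steps=10):
    """Refine an approximate eigenpair (x, lam) of A with Newton steps."""
    n = len(A)
    for _ in range(steps):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Residual of the eigen-equation plus the normalisation condition.
        F = [Ax[i] - lam * x[i] for i in range(n)]
        F.append(0.5 * (sum(v * v for v in x) - 1.0))
        # Jacobian of F with respect to (x, lam): [[A - lam*I, -x], [x^T, 0]].
        J = [[A[i][j] - (lam if i == j else 0.0) for j in range(n)] + [-x[i]]
             for i in range(n)]
        J.append(list(x) + [0.0])
        delta = solve_linear(J, [-f for f in F])
        x = [x[i] + delta[i] for i in range(n)]
        lam += delta[n]
    return x, lam


# Non-symmetric test matrix with eigenvalues 2 and 3.
A = [[2.0, 1.0], [0.0, 3.0]]
vector, value = newton_eigenpair(A, x=[0.6, 0.8], lam=2.8)
```

Each call depends only on its own initial guess, so refinements for different eigenpairs can run simultaneously, which is the property the abstract highlights.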
A Framework on Causes and Effects of Design Iterations
Mujumdar, Purva; Matsagar, Vasant
2017-06-01
Design is an evolutionary process that involves several teams collaborating with each other to achieve the final design solution. Due to an intrinsic association of teams participating during design stages, frequent exchanges of information take place among them to execute various design activities. These information exchanges occur in cycles/loops where the design is improved/refined at each information exchange. This process is referred to as design iteration. Iteration is one of the most complex and unavoidable phenomena of a design process, and iterations last until the specifications and design requirements are met. Hence, design teams need to acquire a thorough understanding of the factors that cause iterations. As the causes of iterations have not been identified in a classified manner, a causes and effects framework of design iteration is presented in this paper. The methodology adopted to develop this framework was a detailed literature review and interactions with industry experts. This framework guides project planners to identify the possible causes of iterations and enables them to plan their design projects efficiently.
Fusion Power measurement at ITER
International Nuclear Information System (INIS)
Bertalot, L.; Barnsley, R.; Krasilnikov, V.; Stott, P.; Suarez, A.; Vayakis, G.; Walsh, M.
2015-01-01
Nuclear fusion research aims to provide energy for the future in a sustainable way and the ITER project scope is to demonstrate the feasibility of nuclear fusion energy. ITER is a nuclear experimental reactor based on a large scale fusion plasma (tokamak type) device generating Deuterium-Tritium (DT) fusion reactions with emission of 14 MeV neutrons producing up to 700 MW fusion power. The measurement of fusion power, i.e. total neutron emissivity, will play an important role for achieving ITER goals, in particular the fusion gain factor Q related to the reactor performance. Particular attention is given also to the development of the neutron calibration strategy whose main scope is to achieve the required accuracy of 10% for the measurement of fusion power. Neutron Flux Monitors located in diagnostic ports and inside the vacuum vessel will measure ITER total neutron emissivity, expected to range from 10¹⁴ n/s in Deuterium-Deuterium (DD) plasmas up to almost 10²¹ n/s in DT plasmas. The neutron detection systems, as well as all other ITER diagnostics, have to withstand high nuclear radiation and electromagnetic fields as well as ultrahigh vacuum and thermal loads. (authors)
ITER safety and licensing update
Energy Technology Data Exchange (ETDEWEB)
Taylor, Neill, E-mail: neill.taylor@iter.org [ITER Organization, Route de Vinon sur Verdon, 13115 Saint Paul Lez Durance (France); Ciattaglia, Sergio; Cortes, Pierre; Iseli, Markus; Rosanvallon, Sandrine; Topilski, Leonid [ITER Organization, Route de Vinon sur Verdon, 13115 Saint Paul Lez Durance (France)
2012-08-15
Highlights: ► The ITER preliminary safety report has been submitted to the French nuclear regulator. ► Safety analyses have shown how the key safety functions will be achieved. ► The design contains multiple provisions for the confinement of radioactive material. ► Analyses have addressed external hazards (e.g. earthquake) and loss of all power. - Abstract: Safety files were submitted by the ITER Organization to the French nuclear safety authorities in March 2010 as a part of the licensing process. These included the preliminary safety report (RPrS) which presents the extensive safety analyses performed for ITER. The report has been the subject of examination by the authorities and their advisors, and discussions with them have been held on many topics. In the light of this process, this paper discusses some of the topics that remain prominent in the safety analysis of ITER. In particular, the provision of the two safety functions, confinement of radioactive material and limitation of exposure to radiation, is explained and some of the potential challenges to them are identified. Amongst these are the risks of fire and explosion, and external events such as earthquake and loss of all electric power. Provisions in the ITER design, together with the characteristics of fusion, ensure that a very good safety performance will be achieved.
Diverse Power Iteration Embeddings and Its Applications
Energy Technology Data Exchange (ETDEWEB)
Huang H.; Yoo S.; Yu, D.; Qin, H.
2014-12-14
Spectral Embedding is one of the most effective dimension reduction algorithms in data mining. However, its computational complexity has to be mitigated in order to apply it to real-world large scale data analysis. Much research has focused on developing approximate spectral embeddings which are more efficient, but at the same time far less effective. This paper proposes Diverse Power Iteration Embeddings (DPIE), which not only retains an efficiency similar to that of power iteration methods but also produces a series of diverse and more effective embedding vectors. We test this novel method by applying it to various data mining applications (e.g. clustering, anomaly detection and feature selection) and evaluating their performance improvements. The experimental results show that our proposed DPIE is more effective than popular spectral approximation methods, and obtains a quality similar to that of classic spectral embedding derived from eigen-decompositions. Moreover, it is extremely fast on big data applications. For example, in terms of clustering results, DPIE achieves as much as 95% of the quality of classic spectral clustering on the complex datasets while being 4000+ times faster in a limited memory environment.
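As an illustration of the building block the abstract refers to, the following is a minimal sketch of plain power iteration, not of DPIE itself (the abstract does not give DPIE's details). The function name `power_iteration`, the toy matrix `similarity`, and the step count are all assumptions made for this example.

```python
import math


def power_iteration(matrix, num_steps=100):
    """Estimate the dominant eigenpair of a square matrix (list of lists)."""
    n = len(matrix)
    vec = [1.0] + [0.0] * (n - 1)  # arbitrary (hypothetical) starting vector
    for _ in range(num_steps):
        # Multiply by the matrix, then renormalise to unit length.
        prod = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(p * p for p in prod))
        vec = [p / norm for p in prod]
    # Rayleigh quotient of the (unit-length) vector gives the eigenvalue estimate.
    prod = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(vec[i] * prod[i] for i in range(n))
    return eigenvalue, vec


# Toy symmetric "similarity" matrix, chosen only for illustration.
similarity = [[2.0, 1.0], [1.0, 2.0]]
eigenvalue, embedding = power_iteration(similarity)
```

Each step costs one matrix-vector product, which is why power-iteration-style embeddings are attractive for large data; how DPIE derives several diverse vectors from this idea is described in the paper, not here.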
ITER ITA newsletter. No. 1, February 2003
International Nuclear Information System (INIS)
2003-04-01
This first issue of the ITER ITA (ITER transitional Arrangements) newsletter contains concise information about ITER related meetings, including the eighth ITER Negotiations meeting, held on 18-19 February 2003 in St. Petersburg, Russia; the first meeting of the ITER preparatory committee, held on 17 February 2003 in St. Petersburg, Russia; and the third meeting of the ITPA (International Tokamak Physics Activity) coordinating committee, held on 24-25 October 2002 at the Max-Planck-Institut fuer Plasmaphysik, Garching
ITER ITA newsletter. Special issue - December 2006
International Nuclear Information System (INIS)
2006-12-01
This issue of the ITER ITA (ITER transitional arrangements) newsletter contains information about the signing of the ITER Agreement, which took place on 21 November 2006 in Paris, France. It was a great day for fusion research as Ministers from the seven ITER Parties, in the presence of President Jacques Chirac, President of the European Commission Jose Barroso and some 400 invited guests, signed the Agreement setting up the ITER International Fusion Energy Organization. This issue contains the speeches, statements and remarks of Presidents and Ministers
ITER ITA newsletter. No. 11, December 2003
International Nuclear Information System (INIS)
2003-12-01
This issue of the ITER ITA (ITER transitional Arrangements) newsletter contains concise information about ITER, including information from the editor about the ITER update, about progress in ITER magnet design and preparation of procurement packages, and about the 25th anniversary of the First Steering Committee Meeting of the International Tokamak Reactor (INTOR) Workshop, which was organized under the auspices of the IAEA and took place at the IAEA Headquarters in Vienna
ITER EDA newsletter. V. 5, no. 7
International Nuclear Information System (INIS)
1996-07-01
This issue of the Newsletter on the Engineering Design Activities (EDA) for the ITER Tokamak project contains a report on the Tenth ITER Council Meeting, held July 24-25, 1996, in St. Petersburg, Russia; a description of the Status of the ITER EDA by the ITER Director, Dr. R. Aymar; and a report on the so-called Task Number One by the ITER Special Working Group (Basis for the Start of Explorations, presenting possible scenarios toward siting, licensing and host support)
ITER ITA newsletter No. 33, August-September-October 2006
International Nuclear Information System (INIS)
2006-11-01
This issue of the ITER ITA (ITER transitional arrangements) newsletter contains concise information about ITER related events such as the public debate on ITER in Provence and the fiftieth annual General Conference of the IAEA. Eight ITER related statements were made during the Conference
Iterative estimation of the background in noisy spectroscopic data
Energy Technology Data Exchange (ETDEWEB)
Zhu, M.H. [Space Exploration Laboratory, Macao University of Science and Technology, Taipa (Macao)], E-mail: peter_zu@163.com; Liu, L.G.; Cheng, Y.S.; Dong, T.K.; You, Z.; Xu, A.A. [Space Exploration Laboratory, Macao University of Science and Technology, Taipa (Macao)
2009-04-21
In this paper, we present an iterative filtering method to estimate the background of noisy spectroscopic data. The proposed method avoids the calculation of the average full width at half maximum (FWHM) of the whole spectrum and the peak regions, and it can estimate the background efficiently, especially for spectroscopic data with the Compton continuum.
A novel iterative energy calibration method for composite germanium detectors
International Nuclear Information System (INIS)
Pattabiraman, N.S.; Chintalapudi, S.N.; Ghugre, S.S.
2004-01-01
An automatic method for energy calibration of the observed experimental spectrum has been developed. The method presented is based on an iterative algorithm and presents an efficient way to perform energy calibrations after establishing the weights of the calibration data. An application of this novel technique for data acquired using composite detectors in an in-beam γ-ray spectroscopy experiment is presented
A novel iterative energy calibration method for composite germanium detectors
Energy Technology Data Exchange (ETDEWEB)
Pattabiraman, N.S.; Chintalapudi, S.N.; Ghugre, S.S. E-mail: ssg@alpha.iuc.res.in
2004-07-01
An automatic method for energy calibration of the observed experimental spectrum has been developed. The method presented is based on an iterative algorithm and presents an efficient way to perform energy calibrations after establishing the weights of the calibration data. An application of this novel technique for data acquired using composite detectors in an in-beam γ-ray spectroscopy experiment is presented.
Iterated unscented Kalman filter for phase unwrapping of interferometric fringes.
Xie, Xianming
2016-08-22
A new phase unwrapping algorithm based on the iterated unscented Kalman filter is proposed to estimate the unambiguous unwrapped phase of interferometric fringes. The method combines an iterated unscented Kalman filter with a robust phase gradient estimator based on an amended matrix pencil model and an efficient quality-guided strategy based on heap sort. The iterated unscented Kalman filter, one of the most robust methods so far within the Bayesian framework for non-linear signal processing, is applied for the first time to perform noise suppression and phase unwrapping of interferometric fringes simultaneously. This simplifies the pre-filtering procedure that is normally followed by the phase unwrapping procedure, and can even remove the pre-filtering procedure altogether. The robust phase gradient estimator is used to efficiently and accurately obtain phase gradient information from interferometric fringes, which is needed for the iterated unscented Kalman filtering phase unwrapping model. The efficient quality-guided strategy ensures that the proposed method quickly unwraps wrapped pixels along the path from the high-quality area to the low-quality area of wrapped phase images, which can greatly improve the efficiency of phase unwrapping. Results obtained from synthetic data and real data show that the proposed method can obtain better solutions with an acceptable time consumption, compared with some of the most widely used algorithms.
A Fast Iterative Bayesian Inference Algorithm for Sparse Channel Estimation
DEFF Research Database (Denmark)
Pedersen, Niels Lovmand; Manchón, Carles Navarro; Fleury, Bernard Henri
2013-01-01
representation of the Bessel K probability density function; a highly efficient, fast iterative Bayesian inference method is then applied to the proposed model. The resulting estimator outperforms other state-of-the-art Bayesian and non-Bayesian estimators, either by yielding lower mean squared estimation error...
Unified Lambert Tool for Massively Parallel Applications in Space Situational Awareness
Woollands, Robyn M.; Read, Julie; Hernandez, Kevin; Probe, Austin; Junkins, John L.
2017-11-01
This paper introduces a parallel-compiled tool that combines several of our recently developed methods for solving the perturbed Lambert problem using modified Chebyshev-Picard iteration. This tool (unified Lambert tool) consists of four individual algorithms, each of which is unique and better suited for solving a particular type of orbit transfer. The first is a Keplerian Lambert solver, which is used to provide a good initial guess (warm start) for solving the perturbed problem. It is also used to determine the appropriate algorithm to call for solving the perturbed problem. The arc length or true anomaly angle spanned by the transfer trajectory is the parameter that governs the automated selection of the appropriate perturbed algorithm, and is based on the respective algorithm convergence characteristics. The second algorithm solves the perturbed Lambert problem using the modified Chebyshev-Picard iteration two-point boundary value solver. This algorithm does not require a Newton-like shooting method and is the most efficient of the perturbed solvers presented herein, however the domain of convergence is limited to about a third of an orbit and is dependent on eccentricity. The third algorithm extends the domain of convergence of the modified Chebyshev-Picard iteration two-point boundary value solver to about 90% of an orbit, through regularization with the Kustaanheimo-Stiefel transformation. This is the second most efficient of the perturbed set of algorithms. The fourth algorithm uses the method of particular solutions and the modified Chebyshev-Picard iteration initial value solver for solving multiple revolution perturbed transfers. This method does require "shooting" but differs from Newton-like shooting methods in that it does not require propagation of a state transition matrix. The unified Lambert tool makes use of the General Mission Analysis Tool and we use it to compute thousands of perturbed Lambert trajectories in parallel on the Space Situational
Unified Lambert Tool for Massively Parallel Applications in Space Situational Awareness
Woollands, Robyn M.; Read, Julie; Hernandez, Kevin; Probe, Austin; Junkins, John L.
2018-03-01
This paper introduces a parallel-compiled tool that combines several of our recently developed methods for solving the perturbed Lambert problem using modified Chebyshev-Picard iteration. This tool (unified Lambert tool) consists of four individual algorithms, each of which is unique and better suited for solving a particular type of orbit transfer. The first is a Keplerian Lambert solver, which is used to provide a good initial guess (warm start) for solving the perturbed problem. It is also used to determine the appropriate algorithm to call for solving the perturbed problem. The arc length or true anomaly angle spanned by the transfer trajectory is the parameter that governs the automated selection of the appropriate perturbed algorithm, and is based on the respective algorithm convergence characteristics. The second algorithm solves the perturbed Lambert problem using the modified Chebyshev-Picard iteration two-point boundary value solver. This algorithm does not require a Newton-like shooting method and is the most efficient of the perturbed solvers presented herein, however the domain of convergence is limited to about a third of an orbit and is dependent on eccentricity. The third algorithm extends the domain of convergence of the modified Chebyshev-Picard iteration two-point boundary value solver to about 90% of an orbit, through regularization with the Kustaanheimo-Stiefel transformation. This is the second most efficient of the perturbed set of algorithms. The fourth algorithm uses the method of particular solutions and the modified Chebyshev-Picard iteration initial value solver for solving multiple revolution perturbed transfers. This method does require "shooting" but differs from Newton-like shooting methods in that it does not require propagation of a state transition matrix. The unified Lambert tool makes use of the General Mission Analysis Tool and we use it to compute thousands of perturbed Lambert trajectories in parallel on the Space Situational
Establishment of ITER: Relevant documents
International Nuclear Information System (INIS)
1988-01-01
At the Geneva Summit Meeting in November, 1985, a proposal was made by the Soviet Union to build a next-generation tokamak experiment on a collaborative basis involving the world's four major fusion blocks. In October, 1986, after consulting with Japan and the European Community, the United States responded with a proposal on how to implement such an activity. Ensuing diplomatic and technical discussions resulted in the establishment, under the auspices of the IAEA, of the International Thermonuclear Experimental Reactor Conceptual Design Activities. This tome represents a collection of all documents relating to the establishment of ITER, beginning with the initial meeting of the ITER Quadripartite Initiative Committee in Vienna on 15-16 March, 1987, through the meeting of the Provisional ITER Council, also in Vienna, on 8-9 February, 1988
Remote maintenance development for ITER
International Nuclear Information System (INIS)
Tada, Eisuke; Shibanuma, Kiyoshi
1997-01-01
This paper describes the overall design concept of the ITER remote maintenance system, which has been developed mainly for use with in-vessel components such as the divertor and blanket, and outlines the ITER R and D program, which has been established to develop remote handling equipment/tools and radiation hard components. In ITER, the reactor structures inside the cryostat have to be maintained remotely because of activation due to DT operation. Therefore, remote-handling technology is fundamental, and the reactor-structure design must be made consistent with remote maintainability. The overall maintenance scenario and design concepts of the required remote handling equipment/tools have been developed according to their maintenance classification. Technologies are also being developed to verify the feasibility of the maintenance design and include fabrication and testing of full-scale remote-handling equipment/tools for in-vessel maintenance. (author)
The ITER project technological challenges
CERN. Geneva; Lister, Joseph; Marquina, Miguel A; Todesco, Ezio
2005-01-01
The first lecture reminds us of the ITER challenges, presents hard engineering problems, typically due to mechanical forces and thermal loads and identifies where the physics uncertainties play a significant role in the engineering requirements. The second lecture presents soft engineering problems of measuring the plasma parameters, feedback control of the plasma and handling the physics data flow and slow controls data flow from a large experiment like ITER. The last three lectures focus on superconductors for fusion. The third lecture reviews the design criteria and manufacturing methods for 6 milestone-conductors of large fusion devices (T-7, T-15, Tore Supra, LHD, W-7X, ITER). The evolution of the designer approach and the available technologies are critically discussed. The fourth lecture is devoted to the issue of performance prediction, from a superconducting wire to a large size conductor. The role of scaling laws, self-field, current distribution, voltage-current characteristic and transposition are...
The danger of iteration methods
International Nuclear Information System (INIS)
Villain, J.; Semeria, B.
1983-01-01
When a Hamiltonian H depends on variables φ_i, the values of these variables which minimize H satisfy the equations δH/δφ_i = 0. If this set of equations is solved by iteration, there is no guarantee that the solution is the one which minimizes H. In the case of a harmonic system with a random potential periodic with respect to the φ_i's, the fluctuations have been calculated by Efetov and Larkin by means of the iteration method. The result is wrong in the case of a strong disorder. Even in the weak disorder case, it is wrong for a one-dimensional system and for a finite system of 2 particles. It is argued that the results obtained by iteration are always wrong, and that between 2 and 4 dimensions, spin-pair correlation functions decay like powers of the distance, as found by Aharony and Pytte for another model
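The general point that iterating on the stationarity condition need not find the minimizer can be seen on a toy one-variable example. The sketch below is not the disordered harmonic system of the paper; the double-well function `energy`, the step size, and the starting points are assumptions chosen only to make the effect visible.

```python
def energy(x):
    """Toy one-variable 'Hamiltonian' with two local minima (illustrative only)."""
    return 0.25 * x**4 - 0.5 * x**2 + 0.3 * x


def gradient(x):
    """Derivative of energy; its zeros are the stationary points."""
    return x**3 - x + 0.3


def iterate_to_stationary_point(x, step=0.1, num_steps=2000):
    """Simple fixed-point iteration x <- x - step * dH/dx."""
    for _ in range(num_steps):
        x -= step * gradient(x)
    return x


x_right = iterate_to_stationary_point(1.0)   # started in the shallow well
x_left = iterate_to_stationary_point(-1.0)   # started in the deep well
```

Both runs end at a point where the derivative vanishes, yet only `x_left` sits at the lower energy. The iteration started at 1.0 settles in the shallower well, so satisfying the stationarity equation alone says nothing about having found the minimum.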
Spectrally Compatible Iterative Water Filling
Verlinden, Jan; Bogaert, Etienne Vanden; Bostoen, Tom; Zanier, Francesca; Luise, Marco; Cendrillon, Raphael; Moonen, Marc
2006-12-01
Until now static spectrum management has ensured that DSL lines in the same cable are spectrally compatible under worst-case crosstalk conditions. Recently dynamic spectrum management (DSM) has been proposed aiming at an increased capacity utilization by adaptation of the transmit spectra of DSL lines to the actual crosstalk interference. In this paper, a new DSM method for downstream ADSL is derived from the well-known iterative water-filling (IWF) algorithm. The amount of boosting of this new DSM method is limited, such that it is spectrally compatible with ADSL. Hence it is referred to as spectrally compatible iterative water filling (SC-IWF). This paper focuses on the performance gains of SC-IWF. This method is an autonomous DSM method (DSM level 1) and it will be investigated together with two other DSM level-1 algorithms, under various noise conditions, namely, iterative water-filling algorithm, and flat power back-off (flat PBO).
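For orientation, here is a minimal sketch of the plain iterative water-filling (IWF) idea that the abstract builds on: each line in turn water-fills its power budget over the tones while treating the crosstalk from the other lines as noise. This is not SC-IWF, whose boosting limit is not specified in the abstract. All names (`water_fill`, `iterative_water_filling`) and all channel numbers are hypothetical.

```python
def water_fill(inv_gain, budget):
    """Allocate `budget` over tones; inv_gain[k] = (noise + interference) / gain."""
    levels = sorted(inv_gain)
    # Find the largest set of active tones whose water level exceeds their floor.
    for active in range(len(levels), 0, -1):
        water = (budget + sum(levels[:active])) / active
        if water > levels[active - 1]:
            break
    return [max(0.0, water - g) for g in inv_gain]


def iterative_water_filling(direct, cross, noise, budgets, rounds=50):
    """Each user repeatedly water-fills against the others' current spectra."""
    users, tones = len(direct), len(noise)
    power = [[0.0] * tones for _ in range(users)]
    for _ in range(rounds):
        for u in range(users):
            inv_gain = []
            for k in range(tones):
                interference = sum(
                    cross[u][v][k] * power[v][k] for v in range(users) if v != u
                )
                inv_gain.append((noise[k] + interference) / direct[u][k])
            power[u] = water_fill(inv_gain, budgets[u])
    return power


# Hypothetical two-line, three-tone channel (made-up numbers).
direct = [[1.0, 0.8, 0.5], [0.9, 1.0, 0.6]]
cross = [
    [[0.0, 0.0, 0.0], [0.1, 0.2, 0.1]],  # crosstalk into line 0
    [[0.2, 0.1, 0.1], [0.0, 0.0, 0.0]],  # crosstalk into line 1
]
noise = [0.1, 0.1, 0.1]
budgets = [1.0, 1.0]
spectra = iterative_water_filling(direct, cross, noise, budgets)
```

Because each line only needs its own measured noise-plus-interference, the scheme runs autonomously on every line, which matches the abstract's description of a DSM level 1 method.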
Spectrally Compatible Iterative Water Filling
Directory of Open Access Journals (Sweden)
Cendrillon Raphael
2006-01-01
Until now static spectrum management has ensured that DSL lines in the same cable are spectrally compatible under worst-case crosstalk conditions. Recently dynamic spectrum management (DSM) has been proposed aiming at an increased capacity utilization by adaptation of the transmit spectra of DSL lines to the actual crosstalk interference. In this paper, a new DSM method for downstream ADSL is derived from the well-known iterative water-filling (IWF) algorithm. The amount of boosting of this new DSM method is limited, such that it is spectrally compatible with ADSL. Hence it is referred to as spectrally compatible iterative water filling (SC-IWF). This paper focuses on the performance gains of SC-IWF. This method is an autonomous DSM method (DSM level 1) and it will be investigated together with two other DSM level-1 algorithms, under various noise conditions, namely, iterative water-filling algorithm, and flat power back-off (flat PBO).
ITER ITA newsletter No. 31, June 2006
International Nuclear Information System (INIS)
2006-07-01
This issue of the ITER ITA (ITER transitional Arrangements) newsletter contains concise information about the initialling of the ITER Agreement and its related instruments by the seven ITER parties, which took place in Brussels on 24 May 2006. The initialling constituted the final act of the ITER negotiations. It confirmed the Parties' common acceptance of the negotiated texts, ad referendum, and signalled their intentions to move forward towards the entry into force of the ITER Agreement as soon as possible. 'ITER - Uniting science today, global energy tomorrow' was the theme of a number of media events timed to accompany a remarkable day in the history of the ITER international venture, May 24th 2006, the initialling of the ITER international agreement
A survey of parallel multigrid algorithms
Chan, Tony F.; Tuminaro, Ray S.
1987-01-01
A typical multigrid algorithm applied to well-behaved linear-elliptic partial-differential equations (PDEs) is described. Criteria for designing and evaluating parallel algorithms are presented. Before evaluating the performance of some parallel multigrid algorithms, consideration is given to some theoretical complexity results for solving PDEs in parallel and for executing the multigrid algorithm. The effect of mapping and load imbalance on the partial efficiency of the algorithm is studied.
Experiences in Data-Parallel Programming
Directory of Open Access Journals (Sweden)
Terry W. Clark
1997-01-01
To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.
GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement
Energy Technology Data Exchange (ETDEWEB)
Anzt, Hartwig [Karlsruhe Inst. of Technology (KIT) (Germany); Luszczek, Piotr [Univ. of Tennessee, Knoxville, TN (United States); Dongarra, Jack [Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Manchester (United Kingdom); Heuveline, Vincent [Karlsruhe Inst. of Technology (KIT) (Germany)
2011-12-14
In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that are applied to leverage the computing power of SIMD accelerators like GPUs. Although they use a very different approach for this purpose, they share the basic idea of compensating the convergence behaviour of an inferior numerical algorithm by a more efficient usage of the provided computing power. In this paper, we want to analyze the potential of combining both techniques. Therefore, we implement a mixed precision iterative refinement algorithm using a block-asynchronous iteration as an error correction solver, and compare its performance with a pure implementation of a block-asynchronous iteration and an iterative refinement method using double precision for the error correction solver. For matrices from the University of Florida Matrix Collection, we report the convergence behaviour and provide the total solver runtime using different GPU architectures.
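The structure of mixed precision iterative refinement can be sketched in a few lines: residuals and solution updates are kept in double precision, while the error correction equation is solved only approximately in single precision. The sketch below uses ordinary synchronous Jacobi sweeps as a stand-in for the block-asynchronous GPU solver of the paper, and emulates single precision on the CPU by rounding through `struct`. The function names, the test matrix, and the sweep count are assumptions.

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest IEEE single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def jacobi_f32(A, r, sweeps):
    """Approximately solve A c = r with Jacobi sweeps, rounding to float32."""
    n = len(r)
    c = [0.0] * n
    for _ in range(sweeps):
        new = []
        for i in range(n):
            s = to_f32(r[i])
            for j in range(n):
                if j != i:
                    s = to_f32(s - to_f32(A[i][j]) * c[j])
            new.append(to_f32(s / to_f32(A[i][i])))
        c = new
    return c


def refine(A, b, tol=1e-12, max_outer=50, sweeps=5):
    """Outer loop in double precision, error correction in emulated float32."""
    n = len(b)
    x = [0.0] * n
    for outer in range(max_outer):
        # Residual computed in double precision.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) < tol:
            return x, outer
        c = jacobi_f32(A, r, sweeps)          # low precision correction
        x = [x[i] + c[i] for i in range(n)]   # update in double precision
    return x, max_outer


# Small diagonally dominant system, chosen so that Jacobi converges.
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
solution, outer_iterations = refine(A, b)
```

The inner solver never needs to be accurate, only to reduce the error by some factor per call; the double precision residual then recovers full accuracy over a handful of outer iterations.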
Sparse electromagnetic imaging using nonlinear iterative shrinkage thresholding
Desmal, Abdulla
2015-04-13
A sparse nonlinear electromagnetic imaging scheme is proposed for reconstructing dielectric contrast of investigation domains from measured fields. The proposed approach constructs the optimization problem by introducing the sparsity constraint to the data misfit between the scattered fields expressed as a nonlinear function of the contrast and the measured fields and solves it using the nonlinear iterative shrinkage thresholding algorithm. The thresholding is applied to the result of every nonlinear Landweber iteration to enforce the sparsity constraint. Numerical results demonstrate the accuracy and efficiency of the proposed method in reconstructing sparse dielectric profiles.
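The "Landweber step followed by thresholding" pattern can be illustrated on a linear problem. The sketch below is the standard linear iterative shrinkage thresholding scheme, used here as a simplified stand-in: the paper's scattered-field operator is nonlinear and is not reproduced. The names `soft_threshold` and `ista`, the step-size rule, and the toy data are assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink a scalar towards zero by `threshold` (the sparsity-enforcing step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(A, y, lam, num_steps=200):
    """Minimise 0.5 * ||A x - y||^2 + lam * ||x||_1 for a linear operator A."""
    rows, cols = len(A), len(A[0])
    # Conservative step: 1 / ||A||_F^2 never exceeds 1 / (largest eigenvalue of A^T A).
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * cols
    for _ in range(num_steps):
        # Landweber (gradient) step on the data misfit ...
        residual = [sum(A[i][j] * x[j] for j in range(cols)) - y[i] for i in range(rows)]
        grad = [sum(A[i][j] * residual[i] for i in range(rows)) for j in range(cols)]
        # ... followed by thresholding of the result.
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(cols)]
    return x


# Toy data: with A equal to the identity, the minimiser is soft_threshold(y, lam).
estimate = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.2], lam=1.0)
```

In the toy run the weak second component is driven exactly to zero while the strong one is kept (shrunk by `lam`), which is the behaviour that makes the reconstruction sparse.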
A Nickel Coating Removal Process for ITER Superconducting Cables
Zhou, Nengtao; Wu, Jiefeng; Zhang, Ping; Wang, Xueqin; Lu, Yu
2014-05-01
A new method is developed for removing the nickel coating on ITER superconducting cables by mechanical polishing. The obvious advantage of the mechanical method, which uses a nylon brush, is that there is no chemical residual left in the cable, which would otherwise result in passive effects on the joint resistance. The coating resistance test results of this newly developed method are compared with those of the two other methods that can meet the requirements of ITER. An automatic polishing machine is designed and manufactured for the procedure to provide quality under precise control. This new technique can replace the conventional manual method due to its improved efficiency.
Experimental test campaign on an ITER divertor mock-up
International Nuclear Information System (INIS)
Dell'Orco, G.; Malavasi, A.; Merola, M.; Polazzi, G.; Simoncini, M.; Zito, D.
2002-01-01
In 1998, in the frame of the European R and D on ITER high heat flux components, the fabrication of a full scale ITER Divertor Outboard mock-up was launched. It comprised a Cassette Body (CB), designed with some mechanical and hydraulic simplifications with respect to the reference body and its actively cooled Dummy Armour Prototype (DAP). This DAP consists of a Vertical Target (VT), a Wing (WI) and a Dump Target (DT), manufactured by European industries, which are integrated to the Gas Box Liner (GBL) supplied by the Russian Federation ITER Home Team. In 1999, in parallel with the manufacturing activity, the ITER European Home Team decided to assign to ENEA a Task for checking the component integration and performing the thermal-hydraulic and thermal mechanical testing of the DAP and CB. In 1999-2000, ENEA performed the experimental campaign at Brasimone Labs. The present work presents the experimental results of the component integration and the thermal-hydraulic and thermo-mechanical fatigue tests
Parallel Programming with Intel Parallel Studio XE
Blair-Chappell, Stephen
2012-01-01
Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the
International Nuclear Information System (INIS)
Aymar, R.
1999-01-01
This article summarises progress made in the ITER Design Activities between October 1998 and February 1999. The three main focusses of the activity were on design work, on R and D work and on the physics basis. The consequences of diminishing financial funds and personnel are discussed and the state of the individual R and D projects is given briefly
Informal meeting on ITER developments
International Nuclear Information System (INIS)
Canobbio, E.
2000-01-01
The International Fusion Research Council (IFRC), advisory body of the IAEA, organized an informal meeting on the general status and outlook for ITER, held October 9 at Sorrento, Italy, in conjunction with the 18th IAEA Fusion Energy Conference. This article describes the main events at the meeting
Iterative Specialisation of Horn Clauses
DEFF Research Database (Denmark)
Nielsen, Christoffer Rosenkilde; Nielson, Flemming; Nielson, Hanne Riis
2008-01-01
We present a generic algorithm for solving Horn clauses through iterative specialisation. The algorithm is generic in the sense that it can be instantiated with any decidable fragment of Horn clauses, resulting in a solution scheme for general Horn clauses that guarantees soundness and termination...