Parallel time domain solvers for electrically large transient scattering problems
Liu, Yang
2014-09-26
Marching-on-in-time (MOT)-based integral equation solvers represent an increasingly appealing avenue for analyzing transient electromagnetic interactions with large and complex structures. MOT integral equation solvers for analyzing electromagnetic scattering from perfect electrically conducting objects are obtained by enforcing electric field boundary conditions and implicitly time-advancing electric surface current densities by iteratively solving sparse systems of equations at each time step. Unlike their finite difference and finite element competitors, these solvers apply to nonlinear and multi-scale structures comprising geometrically intricate and deep sub-wavelength features residing atop electrically large platforms. Moreover, they are high-order accurate, stable in the low- and high-frequency limits, and applicable to conducting and penetrable structures represented by highly irregular meshes. This presentation reviews some recent advances in the parallel implementation of time domain integral equation solvers, specifically those that leverage the multilevel plane-wave time-domain (PWTD) algorithm on modern manycore computer architectures, including graphics processing units (GPUs) and distributed memory supercomputers. The GPU-based implementation achieves speedups of at least one order of magnitude over serial implementations, while the distributed parallel implementation scales to thousands of compute nodes. A distributed parallel PWTD kernel has been adopted to solve time domain surface/volume integral equations (TDSIE/TDVIE) for analyzing transient scattering from large and complex-shaped perfectly electrically conducting (PEC)/dielectric objects involving ten million/tens of millions of spatial unknowns.
Solving Large Quadratic Assignment Problems in Parallel
DEFF Research Database (Denmark)
Clausen, Jens; Perregaard, Michael
1997-01-01
...... processors, and have hence not been ideally suited for computations essentially involving non-vectorizable computations on integers. In this paper we investigate the combination of one of the best bound functions for a Branch-and-Bound algorithm (the Gilmore-Lawler bound) and various testing, variable binding and recalculation of bounds between branchings when used in a parallel Branch-and-Bound algorithm. The algorithm has been implemented on a 16-processor MEIKO Computing Surface with Intel i860 processors. Computational results from the solution of a number of large QAPs, including the classical Nugent 20......
Large-Scale Parallel Finite Element Analysis of the Stress Singular Problems
International Nuclear Information System (INIS)
Noriyuki Kushida; Hiroshi Okuda; Genki Yagawa
2002-01-01
In this paper, the convergence behavior of the large-scale parallel finite element method for stress singular problems was investigated. The convergence behavior of iterative solvers depends on the efficiency of the preconditioners. However, the efficiency of preconditioners may be influenced by the domain decomposition that is necessary for parallel FEM. In this study the following results were obtained: the conjugate gradient method without preconditioning and the diagonal-scaling preconditioned conjugate gradient method were not influenced by the domain decomposition, as expected; the symmetric successive over-relaxation preconditioned conjugate gradient method converged up to 6% faster when the stress singular area was contained in one sub-domain. (authors)
Energy Technology Data Exchange (ETDEWEB)
Carey, G.F.; Young, D.M.
1993-12-31
The program outlined here is directed to research on methods, algorithms, and software for distributed parallel supercomputers. Of particular interest are finite element methods and finite difference methods together with sparse iterative solution schemes for scientific and engineering computations of very large-scale systems. Both linear and nonlinear problems will be investigated. In the nonlinear case, applications with bifurcation to multiple solutions will be considered using continuation strategies. The parallelizable numerical methods of particular interest are a family of partitioning schemes embracing domain decomposition, element-by-element strategies, and multi-level techniques. The methods will be further developed by incorporating parallel iterative solution algorithms with associated preconditioners in parallel computer software. The schemes will be implemented on distributed memory parallel architectures such as the CRAY MPP, Intel Paragon, the NCUBE3, and the Connection Machine. We will also consider other new architectures such as the Kendall-Square (KSQ) and proposed machines such as the TERA. The applications will focus on large-scale three-dimensional nonlinear flow and reservoir problems with strong convective transport contributions. These are legitimate grand challenge class computational fluid dynamics (CFD) problems of significant practical interest to DOE. The methods and algorithms developed will, however, be of wider interest.
Parallel Optimization of Polynomials for Large-scale Problems in Stability and Control
Kamyar, Reza
In this thesis, we focus on some of the NP-hard problems in control theory. Thanks to the converse Lyapunov theory, these problems can often be modeled as optimization over polynomials. To avoid the problem of intractability, we establish a trade-off between accuracy and complexity. In particular, we develop a sequence of tractable optimization problems --- in the form of Linear Programs (LPs) and/or Semi-Definite Programs (SDPs) --- whose solutions converge to the exact solution of the NP-hard problem. However, the computational and memory complexity of these LPs and SDPs grow exponentially with the progress of the sequence, meaning that improving the accuracy of the solutions requires solving SDPs with tens of thousands of decision variables and constraints. Setting up and solving such problems is a significant challenge. The existing optimization algorithms and software are only designed for desktop computers or small cluster computers --- machines which do not have sufficient memory for solving such large SDPs. Moreover, the speed-up of these algorithms does not scale beyond dozens of processors. This is in fact the reason we seek parallel algorithms for setting up and solving large SDPs on large clusters and/or supercomputers. We propose parallel algorithms for stability analysis of two classes of systems: 1) linear systems with a large number of uncertain parameters; 2) nonlinear systems defined by polynomial vector fields. First, we develop a distributed parallel algorithm which applies Polya's and/or Handelman's theorems to some variants of parameter-dependent Lyapunov inequalities with parameters defined over the standard simplex. The result is a sequence of SDPs which possess a block-diagonal structure. We then develop a parallel SDP solver which exploits this structure in order to map the computation, memory and communication to a distributed parallel environment. Numerical tests on a supercomputer demonstrate the ability of the algorithm to
Decomposition and parallelization strategies for solving large-scale MDO problems
Energy Technology Data Exchange (ETDEWEB)
Grauer, M.; Eschenauer, H.A. [Research Center for Multidisciplinary Analyses and Applied Structural Optimization, FOMAAS, Univ. of Siegen (Germany)
2007-07-01
During previous years, structural optimization has been recognized as a useful tool within the disciplines of engineering and economics. However, the optimization of large-scale systems or structures is impeded by an immense solution effort. This was the reason to start a joint research and development (R and D) project between the Institute of Mechanics and Control Engineering and the Information and Decision Sciences Institute within the Research Center for Multidisciplinary Analyses and Applied Structural Optimization (FOMAAS) on cluster computing for the parallel and distributed solution of multidisciplinary optimization (MDO) problems based on the OpTiX-Workbench. Here the focus of attention is put on coarse-grained parallelization and its implementation on clusters of workstations. A further point of emphasis was the development of a parallel decomposition strategy called PARDEC, for the solution of very complex optimization problems which cannot be solved efficiently by sequential integrated optimization. The use of the OpTiX-Workbench together with the FEM ground water simulation system FEFLOW is shown for a special water management problem. (orig.)
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems
Frohne, Jörg; Heister, Timo; Bangerth, Wolfgang
2015-08-06
© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
Liu, Yang
2013-07-01
The computational complexity and memory requirements of multilevel plane wave time domain (PWTD)-accelerated marching-on-in-time (MOT)-based surface integral equation (SIE) solvers scale as O(N_t N_s log² N_s) and O(N_s^1.5); here N_t and N_s denote the numbers of temporal and spatial basis functions discretizing the current [Shanker et al., IEEE Trans. Antennas Propag., 51, 628-641, 2003]. In the past, serial versions of these solvers have been successfully applied to the analysis of scattering from perfect electrically conducting as well as homogeneous penetrable targets involving up to N_s ≈ 0.5 × 10⁶ and N_t ≈ 10³. To solve larger problems, parallel PWTD-enhanced MOT solvers are called for. Even though a simple parallelization strategy was demonstrated in the context of electromagnetic compatibility analysis [M. Lu et al., in Proc. IEEE Int. Symp. AP-S, 4, 4212-4215, 2004], by and large, progress in this area has been slow. The lack of progress can be attributed wholesale to difficulties associated with the construction of a scalable PWTD kernel. © 2013 IEEE.
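To make the quoted asymptotic scalings concrete, the short sketch below compares operation-count estimates for a classical MOT solver, which scales as O(N_t N_s²), with the PWTD-accelerated scaling quoted above. The function names and constant-free estimates are illustrative assumptions, not part of the referenced solvers.

```python
import math

def mot_ops_classical(nt: int, ns: int) -> float:
    """Rough operation count for a classical MOT solver: O(Nt * Ns^2)."""
    return nt * ns ** 2

def mot_ops_pwtd(nt: int, ns: int) -> float:
    """Rough operation count for a PWTD-accelerated MOT solver: O(Nt * Ns * log^2 Ns)."""
    return nt * ns * math.log2(ns) ** 2

# Problem size quoted in the abstract: Ns ~ 0.5e6 spatial and Nt ~ 1e3 temporal basis functions.
nt, ns = 1_000, 500_000
print(f"classical MOT : {mot_ops_classical(nt, ns):.2e}")
print(f"PWTD MOT      : {mot_ops_pwtd(nt, ns):.2e}")
print(f"speedup factor: {mot_ops_classical(nt, ns) / mot_ops_pwtd(nt, ns):.0f}x")
```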
DEFF Research Database (Denmark)
Bendtsen, Claus; Nielsen, Ole Holm; Hansen, Lars Bruno
2001-01-01
The quantum mechanical ground state of electrons is described by Density Functional Theory, which leads to large minimization problems. An efficient minimization method uses a self-consistent field (SCF) solution of large eigenvalue problems. The iterative Davidson algorithm is often used, and we...
Energy Technology Data Exchange (ETDEWEB)
Moryakov, A. V., E-mail: sailor@orc.ru [National Research Centre Kurchatov Institute (Russian Federation)
2016-12-15
An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computations on supercomputers employing the MPI (Message Passing Interface) standard for data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is characterized by simplicity and by the possibility of solving nonlinear problems with a correction of the operator in accordance with the solution obtained in the previous iterative process.
Speedup predictions on large scientific parallel programs
International Nuclear Information System (INIS)
Williams, E.; Bobrowicz, F.
1985-01-01
How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N ≥ 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
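As a rough illustration of how such speedup figures relate to the serial fraction of a code, the sketch below evaluates Amdahl's law. The function name and the sample serial fractions are assumptions for illustration, and Amdahl's law is only an idealized model of the measurements described above.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Predicted speedup under Amdahl's law for a given serial fraction of the workload."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

# A serial fraction of roughly 0.7% reproduces a speedup close to the reported 14.4 on 16 processors.
for f in (0.001, 0.007, 0.05):
    print(f"serial fraction {f:.1%}: speedup on 16 processors = {amdahl_speedup(f, 16):.1f}")
```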
Vacuum Large Current Parallel Transfer Numerical Analysis
Directory of Open Access Journals (Sweden)
Enyuan Dong
2014-01-01
The stable operation and reliable breaking of large generator currents are a difficult problem in power systems. It can be solved successfully by parallel interrupters and a proper timing sequence with phase-control technology, in which the breaker control strategy is decided by the opening times of both the first-opening phase and the second-opening phase. A precise transfer current model can provide the proper timing sequence for breaking the generator circuit breaker. By analyzing transfer current experiments and data, the real vacuum arc resistance and a precise corrected model of the large transfer current process are obtained in this paper. The transfer time calculated by the corrected transfer current model is very close to the actual transfer time. This can provide guidance for planning a proper timing sequence and breaking the vacuum generator circuit breaker with parallel interrupters.
PALNS - A software framework for parallel large neighborhood search
DEFF Research Database (Denmark)
Røpke, Stefan
2009-01-01
This paper proposes a simple, parallel, portable software framework for the metaheuristic named large neighborhood search (LNS). The aim is to provide a framework where the user has to set up a few data structures and implement a few functions and then the framework provides a metaheuristic where ...... parallelization "comes for free". We apply the parallel LNS heuristic to two different problems: the traveling salesman problem with pickup and delivery (TSPPD) and the capacitated vehicle routing problem (CVRP)....
Parallel algorithms for network routing problems and recurrences
International Nuclear Information System (INIS)
Wisniewski, J.A.; Sameh, A.H.
1982-01-01
In this paper, we consider the parallel solution of recurrences and linear systems in the regular algebra of Carré. These problems are equivalent to solving the shortest path problem in graph theory, and they also arise in the analysis of Fortran programs. Our methods for solving linear systems in the regular algebra are analogues of well-known methods for solving systems of linear algebraic equations. A parallel version of Dijkstra's method, which has no linear algebraic analogue, is presented. Considerations for choosing an algorithm when the problem is large and sparse are also discussed
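Since the entry builds on Dijkstra's method, a minimal sequential Dijkstra sketch is given below for reference. The graph representation, function name, and toy example are illustrative assumptions; the parallel variant discussed in the paper is not reproduced here.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths on a non-negatively weighted graph.

    graph: dict mapping node -> list of (neighbor, weight) pairs.
    Returns a dict of shortest distances from the source.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

example = {"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
print(dijkstra(example, "a"))  # {'a': 0.0, 'b': 2.0, 'c': 3.0}
```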
Parallel algorithms for boundary value problems
Lin, Avi
1991-01-01
A general approach to solving boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: the local step, where all the P available processors work in parallel, and the global step, where one processor solves a tridiagonal linear system of order P. The main advantages of this approach are twofold. First, the approach is very flexible, especially in the local step, and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Second, the communication complexity is very small and thus the algorithm can be used just as easily with shared memory machines. Several examples of using this strategy are discussed.
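The global step above reduces to a single tridiagonal solve of order P; the sketch below shows the Thomas algorithm that such a step could use. The function name and the small test system (a discrete Poisson problem) are illustrative assumptions, not taken from the paper.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system.

    lower[i] multiplies x[i-1], diag[i] multiplies x[i], upper[i] multiplies x[i+1].
    All inputs are lists of length n (lower[0] and upper[n-1] are ignored).
    """
    n = len(diag)
    c = [0.0] * n   # modified upper diagonal
    d = [0.0] * n   # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# -x[i-1] + 2 x[i] - x[i+1] = 1: a small discrete Poisson problem with 4 interior points
print(solve_tridiagonal([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [1, 1, 1, 1]))
# -> [2.0, 3.0, 3.0, 2.0]
```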
Experiments with parallel algorithms for combinatorial problems
G.A.P. Kindervater (Gerard); H.W.J.M. Trienekens
1985-01-01
In the last decade many models for parallel computation have been proposed and many parallel algorithms have been developed. However, few of these models have been realized and most of these algorithms are supposed to run on idealized, unrealistic parallel machines. The parallel machines
A Parallel Computational Model for Multichannel Phase Unwrapping Problem
Imperatore, Pasquale; Pepe, Antonio; Lanari, Riccardo
2015-05-01
In this paper, a parallel model for the solution of the computationally intensive multichannel phase unwrapping (MCh-PhU) problem is proposed. Firstly, the Extended Minimum Cost Flow (EMCF) algorithm for solving the MCh-PhU problem is revised within the rigorous mathematical framework of discrete calculus, thus permitting to capture its topological structure in terms of meaningful discrete differential operators. Secondly, emphasis is placed on those methodological and practical aspects which lead to a parallel reformulation of the EMCF algorithm. Thus, a novel dual-level parallel computational model, in which the parallelism is hierarchically implemented at two different (i.e., process and thread) levels, is presented. The validity of our approach has been demonstrated through a series of experiments that have revealed a significant speedup. Therefore, the attained high-performance prototype is suitable for the solution of large-scale phase unwrapping problems in reasonable time frames, with a significant impact on the systematic exploitation of the existing, and rapidly growing, large archives of SAR data.
The parallel volume at large distances
DEFF Research Database (Denmark)
Kampf, Jürgen
In this paper we examine the asymptotic behavior of the parallel volume of planar non-convex bodies as the distance tends to infinity. We show that the difference between the parallel volume of the convex hull of a body and the parallel volume of the body itself tends to 0. This yields a new proof...... for the fact that a planar body can only have polynomial parallel volume if it is convex. Extensions to Minkowski spaces and random sets are also discussed....
Parallel Auxiliary Space AMG Solver for $H(div)$ Problems
Energy Technology Data Exchange (ETDEWEB)
Kolev, Tzanio V. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2012-12-18
We present a family of scalable preconditioners for matrices arising in the discretization of $H(div)$ problems using the lowest order Raviart-Thomas finite elements. Our approach belongs to the class of "auxiliary space"-based methods and requires only the finite element stiffness matrix plus some minimal additional discretization information about the topology and orientation of mesh entities. Also, we provide a detailed algebraic description of the theory, parallel implementation, and different variants of this parallel auxiliary space divergence solver (ADS) and discuss its relations to the Hiptmair-Xu (HX) auxiliary space decomposition of $H(div)$ [SIAM J. Numer. Anal., 45 (2007), pp. 2483-2509] and to the auxiliary space Maxwell solver AMS [J. Comput. Math., 27 (2009), pp. 604-623]. Finally, an extensive set of numerical experiments demonstrates the robustness and scalability of our implementation on large-scale $H(div)$ problems with large jumps in the material coefficients.
Parallel clustering algorithm for large-scale biological data sets.
Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang
2014-01-01
Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for the cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, the time and space complexity become a great bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose constructing procedure takes a long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes. A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.
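Affinity propagation commonly uses the negative squared Euclidean distance between data pairs as the similarity, and building that matrix parallelizes naturally over blocks of rows. The sketch below is a minimal illustration under my own assumptions (function names, worker count, process-level parallelism standing in for the paper's shared-memory architecture); it is not the authors' implementation.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def _similarity_rows(args):
    """Negative squared Euclidean distances from a block of rows to all points."""
    block, data = args
    diff = data[block, None, :] - data[None, :, :]
    return block, -(diff ** 2).sum(axis=2)

def parallel_similarity_matrix(data, n_workers=4):
    """Build the similarity matrix used by affinity propagation, block-parallel over rows."""
    n = data.shape[0]
    blocks = np.array_split(np.arange(n), n_workers)
    sim = np.empty((n, n))
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        for block, rows in pool.map(_similarity_rows, [(b, data) for b in blocks]):
            sim[block] = rows
    return sim

if __name__ == "__main__":
    points = np.random.default_rng(0).normal(size=(200, 16))
    S = parallel_similarity_matrix(points)
    print(S.shape)  # (200, 200); the diagonal is usually reset to a "preference" value before clustering
```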
A parallel approach to the stable marriage problem
DEFF Research Database (Denmark)
Larsen, Jesper
1997-01-01
This paper describes two parallel algorithms for the stable marriage problem implemented on a MIMD parallel computer. The algorithms are tested against sequential algorithms on randomly generated and worst-case instances. The results clearly show that the combination of a very simple problem...... and a commercial MIMD system results in parallel algorithms which are not competitive with sequential algorithms with respect to practical performance. 1 Introduction In 1962 the Stable Marriage Problem was....
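For reference, the sequential baseline that such parallel algorithms compete with is the Gale-Shapley algorithm. Below is a minimal sketch; the data layout and the tiny example instance are illustrative assumptions, not taken from the paper.

```python
def gale_shapley(men_prefs, women_prefs):
    """Classic sequential Gale-Shapley algorithm for the stable marriage problem.

    men_prefs / women_prefs: dict mapping each person to a preference-ordered list of partners.
    Returns a stable matching as a dict {woman: man}.
    """
    free_men = list(men_prefs)
    next_choice = {m: 0 for m in men_prefs}   # index of the next woman each man proposes to
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    engaged = {}                              # woman -> man
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:  # w prefers m to her current partner
            free_men.append(engaged[w])
            engaged[w] = m
        else:
            free_men.append(m)
    return engaged

men = {"a": ["x", "y"], "b": ["y", "x"]}
women = {"x": ["b", "a"], "y": ["a", "b"]}
print(gale_shapley(men, women))  # {'y': 'b', 'x': 'a'}, a stable matching
```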
A novel two-level dynamic parallel data scheme for large 3-D SN calculations
International Nuclear Information System (INIS)
Sjoden, G.E.; Shedlock, D.; Haghighat, A.; Yi, C.
2005-01-01
We introduce a new dynamic parallel memory optimization scheme for executing large scale 3-D discrete ordinates (Sn) simulations on distributed memory parallel computers. In order for parallel transport codes to be truly scalable, they must use parallel data storage, where only the variables that are locally computed are locally stored. Even with parallel data storage for the angular variables, cumulative storage requirements for large discrete ordinates calculations can be prohibitive. To address this problem, Memory Tuning has been implemented in the PENTRAN 3-D parallel discrete ordinates code as an optimized, two-level ('large' array, 'small' array) parallel data storage scheme. Memory Tuning can be described as the process of parallel data memory optimization. Memory Tuning dynamically minimizes the amount of required parallel data in allocated memory on each processor using a statistical sampling algorithm. This algorithm is based on the integral average and standard deviation of the number of fine meshes contained in each coarse mesh in the global problem. Because PENTRAN only stores the locally computed problem phase space, optimal two-level memory assignments can be unique on each node, depending upon the parallel decomposition used (hybrid combinations of angular, energy, or spatial). As demonstrated in the two large discrete ordinates models presented (a storage cask and an OECD MOX Benchmark), Memory Tuning can save a substantial amount of memory per parallel processor, allowing one to accomplish very large scale Sn computations. (authors)
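The sampling statistic described above can be illustrated with a toy sketch: draw a sample of fine-mesh counts per coarse mesh, size the 'small' arrays from the sample mean and standard deviation, and let the 'large' arrays cover the sampled maximum. The function names, margin parameter, and synthetic counts are assumptions for illustration; this is not the PENTRAN implementation.

```python
import random
import statistics

def two_level_sizes(fine_mesh_counts, sample_size=64, margin=1.0):
    """Pick 'small' and 'large' array sizes from a sample of fine-mesh counts per coarse mesh.

    Coarse meshes whose fine-mesh count falls within (mean + margin * stdev) fit in the compact
    'small' arrays; the remaining coarse meshes fall back to 'large' arrays sized for the
    sampled maximum.
    """
    sample = random.sample(fine_mesh_counts, min(sample_size, len(fine_mesh_counts)))
    mean = statistics.mean(sample)
    stdev = statistics.pstdev(sample)
    small_size = int(mean + margin * stdev)
    large_size = max(sample)
    return small_size, large_size

# Hypothetical distribution of fine meshes per coarse mesh on one processor's subdomain.
counts = [random.randint(50, 400) for _ in range(2000)] + [5000] * 10  # a few very dense coarse meshes
small, large = two_level_sizes(counts)
print(f"small-array size: {small}, large-array size: {large}")
```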
A parallel 2-opt algorithm for the traveling salesman problem
Verhoeven, M.G.A.; Aarts, E.H.L.; Swinkels, P.C.J.
1995-01-01
We present a scalable parallel local search algorithm based on data parallelism. The concept of distributed neighborhood structures is introduced, and applied to the Traveling Salesman Problem (TSP). Our parallel local search algorithm finds the same quality solutions as the classical 2-opt
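For context, classical 2-opt repeatedly reverses a tour segment whenever that shortens the tour; a data-parallel version such as the one described above evaluates disjoint subsets of candidate moves on different processors. The sketch below is the plain sequential variant with illustrative names and a toy distance matrix, not the authors' parallel algorithm.

```python
def tour_length(tour, dist):
    """Total length of a closed tour given a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, dist):
    """Sequential 2-opt local search: reverse a segment whenever that shortens the tour."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(candidate, dist) < tour_length(tour, dist):
                    tour, improved = candidate, True
    return tour

# Four cities on a unit square; the optimal tour visits them in boundary order.
dist = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
print(two_opt([0, 2, 1, 3], dist))  # [0, 1, 2, 3]
```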
Performance modeling of parallel algorithms for solving neutron diffusion problems
International Nuclear Information System (INIS)
Azmy, Y.Y.; Kirk, B.L.
1995-01-01
Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message passing and shared memory multiprocessors represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated through several test problems, and these models are used to estimate the performance of each of the two considered architectures in situations typical of practical applications, such as fine meshes and a large number of participating processors. While message passing computers are capable of producing speedup, the parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably for massively parallel computers so that only small- to medium-sized message passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared memory architecture predicts very high efficiency over a wide range of number of processors reasonable for this architecture. Furthermore, the model efficiency of the Sequent remains superior to that of the hypercube if its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared memory computers are better suited for this parallel algorithm than message passing computers
Large amplitude parallel propagating electromagnetic oscillitons
International Nuclear Information System (INIS)
Cattaert, Tom; Verheest, Frank
2005-01-01
Earlier systematic nonlinear treatments of parallel propagating electromagnetic waves have been given within a fluid dynamic approach, in a frame where the nonlinear structures are stationary and various constraining first integrals can be obtained. This has led to the concept of oscillitons that has found application in various space plasmas. The present paper differs in three main aspects from the previous studies: first, the invariants are derived in the plasma frame, as customary in the Sagdeev method, thus retaining in Maxwell's equations all possible effects. Second, a single differential equation is obtained for the parallel fluid velocity, in a form reminiscent of the Sagdeev integrals, hence allowing a fully nonlinear discussion of the oscilliton properties, at such amplitudes as the underlying Mach number restrictions allow. Third, the transition to weakly nonlinear whistler oscillitons is done in an analytical rather than a numerical fashion
Approximation algorithms for the parallel flow shop problem
X. Zhang (Xiandong); S.L. van de Velde (Steef)
2012-01-01
We consider the NP-hard problem of scheduling n jobs in m two-stage parallel flow shops so as to minimize the makespan. This problem decomposes into two subproblems: assigning the jobs to parallel flow shops; and scheduling the jobs assigned to the same flow shop by use of Johnson's
Reliability allocation problem in a series-parallel system
International Nuclear Information System (INIS)
Yalaoui, Alice; Chu, Chengbin; Chatelet, Eric
2005-01-01
In order to improve system reliability, designers may introduce different technologies in parallel in a system. When each technology is composed of components in series, the configuration belongs to the series-parallel systems. This type of system has not been studied as much as the parallel-series architecture. There exist no methods dedicated to reliability allocation in series-parallel systems with different technologies. We propose in this paper theoretical and practical results for the allocation problem in a series-parallel system. Two resolution approaches are developed. Firstly, a one-stage problem is studied and the results are exploited for the multi-stage problem. A theoretical condition for obtaining the optimal allocation is developed. Since this condition is too restrictive, we secondly propose an alternative approach based on an approximated function and the results of the one-stage study. This second approach is applied to numerical examples
Linux software for large topology optimization problems
DEFF Research Database (Denmark)
evolving product, which allows a parallel solution of the PDE, it lacks the important feature that the matrix-generation part of the computations is localized to each processor. This is well known to be critical for obtaining a useful speedup on a Linux cluster and it motivates the search for a COMSOL......-like package for large topology optimization problems. One candidate for such software, developed for Linux by Sandia Nat'l Lab in the USA, is the Sundance system. Sundance also uses a symbolic representation of the PDE and a scalable numerical solution is achieved by employing the underlying Trilinos...
Topology Optimization of Large Scale Stokes Flow Problems
DEFF Research Database (Denmark)
Aage, Niels; Poulsen, Thomas Harpsøe; Gersborg-Hansen, Allan
2008-01-01
This note considers topology optimization of large scale 2D and 3D Stokes flow problems using parallel computations. We solve problems with up to 1,125,000 elements in 2D and 128,000 elements in 3D on a shared memory computer consisting of Sun UltraSparc IV CPUs.
Solving the Stokes problem on a massively parallel computer
DEFF Research Database (Denmark)
Axelsson, Owe; Barker, Vincent A.; Neytcheva, Maya
2001-01-01
boundary value problem for each velocity component, are solved by the conjugate gradient method with a preconditioning based on the algebraic multi‐level iteration (AMLI) technique. The velocity is found from the computed pressure. The method is optimal in the sense that the computational work...... is proportional to the number of unknowns. Further, it is designed to exploit a massively parallel computer with distributed memory architecture. Numerical experiments on a Cray T3E computer illustrate the parallel performance of the method....
Parallel preconditioned conjugate gradient algorithm applied to neutron diffusion problem
International Nuclear Information System (INIS)
Majumdar, A.; Martin, W.R.
1992-01-01
Numerical solution of the neutron diffusion problem requires solving a linear system of equations such as Ax = b, where A is an n x n symmetric positive definite (SPD) matrix; x and b are vectors with n components. The preconditioned conjugate gradient (PCG) algorithm is an efficient iterative method for solving such a linear system of equations. In this paper, the authors describe the implementation of a parallel PCG algorithm on a shared memory machine (BBN TC2000) and on a distributed workstation (IBM RS6000) environment created by the Parallel Virtual Machine (PVM) parallelization software
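A minimal serial sketch of the PCG iteration with a diagonal (Jacobi) preconditioner is shown below; the matrix-vector products and inner products are the operations a parallel implementation distributes across processors. The preconditioner choice, function name, and test system are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pcg(A, b, max_iter=200, tol=1e-10):
    """Preconditioned conjugate gradient with a diagonal (Jacobi) preconditioner.

    A must be symmetric positive definite.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    m_inv = 1.0 / np.diag(A)          # Jacobi preconditioner M^-1
    z = m_inv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Small SPD test system; the PCG result should match the direct solve.
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
print(pcg(A, b), np.linalg.solve(A, b))
```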
Parallel multiple instance learning for extremely large histopathology image analysis.
Xu, Yan; Li, Yeshu; Shen, Zhengyang; Wu, Ziwei; Gao, Teng; Fan, Yubo; Lai, Maode; Chang, Eric I-Chao
2017-08-03
Histopathology images are critical for medical diagnosis, e.g., of cancer and its treatment. A standard histopathology slice can be easily scanned at a high resolution of, say, 200,000×200,000 pixels. These high resolution images can make most existing imaging processing tools infeasible or less effective when operated on a single machine with limited memory, disk space and computing power. In this paper, we propose an algorithm tackling this new emerging "big data" problem utilizing parallel computing on High-Performance-Computing (HPC) clusters. Experimental results on a large-scale data set (1318 images at a scale of 10 billion pixels each) demonstrate the efficiency and effectiveness of the proposed algorithm for low-latency real-time applications. The proposed framework is an effective and efficient system for extremely large histopathology image analysis. It is based on the multiple instance learning formulation for weakly-supervised learning for image classification, segmentation and clustering. When a max-margin concept is adopted for different clusters, we obtain further improvement in clustering performance.
Shutdown problems in large tokamaks
International Nuclear Information System (INIS)
Weldon, D.M.
1978-01-01
Some of the problems connected with a normal shutdown at the end of the burn phase (soft shutdown) and with a shutdown caused by disruptive instability (hard shutdown) have been considered. For a soft shutdown a cursory literature search was undertaken and methods for controlling the thermal wall loading were listed. Because shutdown computer codes are not widespread, some of the differences between start-up codes and shutdown codes were discussed along with program changes needed to change a start-up code to a shutdown code. For a hard shutdown, the major problems are large induced voltages in the ohmic-heating and equilibrium-field coils and high first wall erosion. A literature search of plasma-wall interactions was carried out. Phenomena that occur at the plasma-wall interface can be quite complicated. For example, material evaporated from the wall can form a virtual limiter or shield protecting the wall from major damage. Thermal gradients that occur during the interaction can produce currents whose associated magnetic field also helps shield the wall
A PARALLEL NONOVERLAPPING DOMAIN DECOMPOSITION METHOD FOR STOKES PROBLEMS
Institute of Scientific and Technical Information of China (English)
Mei-qun Jiang; Pei-liang Dai
2006-01-01
A nonoverlapping domain decomposition iterative procedure is developed and analyzed for generalized Stokes problems and their finite element approximate problems in R^N (N = 2, 3). The method is based on a mixed-type consistency condition with two parameters as a transmission condition together with a derivative-free transmission data updating technique on the artificial interfaces. The method can be applied to a general multi-subdomain decomposition and implemented on parallel machines with local simple communications naturally.
A parallel algorithm for the non-symmetric eigenvalue problem
International Nuclear Information System (INIS)
Sidani, M.M.
1991-01-01
An algorithm is presented for the solution of the non-symmetric eigenvalue problem. The algorithm is based on a divide-and-conquer procedure that provides initial approximations to the eigenpairs, which are then refined using Newton iterations. Since the smaller subproblems can be solved independently, and since Newton iterations with different initial guesses can be started simultaneously, the algorithm - unlike the standard QR method - is ideal for parallel computers. The author also reports on his investigation of deflation methods designed to obtain further eigenpairs if needed. Numerical results from implementations on a host of parallel machines (distributed and shared-memory) are presented
On the adequacy of message-passing parallel supercomputers for solving neutron transport problems
International Nuclear Information System (INIS)
Azmy, Y.Y.
1990-01-01
A coarse-grained, static-scheduling parallelization of the standard iterative scheme used for solving the discrete-ordinates approximation of the neutron transport equation is described. The parallel algorithm is based on a decomposition of the angular domain along the discrete ordinates, thus naturally producing a set of completely uncoupled systems of equations in each iteration. Implementation of the parallel code on Intel's iPSC/2 hypercube, and solutions to test problems are presented as evidence of the high speedup and efficiency of the parallel code. The performance of the parallel code on the iPSC/2 is analyzed, and a model for the CPU time as a function of the problem size (order of angular quadrature) and the number of participating processors is developed and validated against measured CPU times. The performance model is used to speculate on the potential of massively parallel computers for significantly speeding up real-life transport calculations at acceptable efficiencies. We conclude that parallel computers with a few hundred processors are capable of producing large speedups at very high efficiencies in very large three-dimensional problems. 10 refs., 8 figs
Vectorization and parallelization of the finite strip method for dynamic Mindlin plate problems
Chen, Hsin-Chu; He, Ai-Fang
1993-01-01
The finite strip method is a semi-analytical finite element process which allows for a discrete analysis of certain types of physical problems by discretizing the domain of the problem into finite strips. This method decomposes a single large problem into m smaller independent subproblems when m harmonic functions are employed, thus yielding natural parallelism at a very high level. In this paper we address vectorization and parallelization strategies for the dynamic analysis of simply-supported Mindlin plate bending problems and show how to prevent potential conflicts in memory access during the assemblage process. The vector and parallel implementations of this method and the performance results of a test problem under scalar, vector, and vector-concurrent execution modes on the Alliant FX/80 are also presented.
Parallel particle swarm optimization algorithm in nuclear problems
International Nuclear Information System (INIS)
Waintraub, Marcel; Pereira, Claudio M.N.A.; Schirru, Roberto
2009-01-01
Particle Swarm Optimization (PSO) is a population-based metaheuristic (PBM), in which solution candidates evolve through simulation of a simplified social adaptation model. Combining robustness, efficiency and simplicity, PSO has gained great popularity. Many successful applications of PSO are reported, in which PSO demonstrated advantages over other well-established PBMs. However, computational cost is still a great constraint for PSO, as well as for all other PBMs, especially in optimization problems with time-consuming objective functions. To overcome such difficulty, parallel computation has been used. The default advantage of parallel PSO (PPSO) is the reduction of computational time, and master-slave approaches exploring this characteristic are the most investigated. However, much more should be expected. It is known that PSO may be improved by more elaborate neighborhood topologies. Hence, in this work, we develop several different PPSO algorithms exploring the advantages of enhanced neighborhood topologies implemented by communication strategies in multiprocessor architectures. The proposed PPSOs have been applied to two complex and time-consuming nuclear engineering problems: reactor core design and fuel reload optimization. After exhaustive experiments, it has been concluded that: PPSO still improves solutions after many thousands of iterations, making the efficient use of serial (non-parallel) PSO prohibitive in such kinds of real-world problems; and PPSO with more elaborate communication strategies demonstrated to be more efficient and robust than the master-slave model. Advantages and peculiarities of each model are carefully discussed in this work. (author)
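One simple enhanced neighborhood topology is a ring, in which each particle is guided by the best position found among itself and its two neighbors rather than by a single global best; in a parallel PSO this neighborhood exchange becomes the communication step between processors. The sketch below is a serial toy version under my own assumptions (parameter values, function names, sphere test function); it is not one of the PPSO variants developed in the paper.

```python
import random

def pso_ring(f, dim, n_particles=20, iters=200, bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5):
    """Particle swarm optimization with a ring neighborhood topology (local best of 3)."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    for _ in range(iters):
        for i in range(n_particles):
            # Best position among the particle itself and its two ring neighbors
            neigh = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            lbest = pbest[min(neigh, key=lambda j: pbest_val[j])]
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (lbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
    best = min(range(n_particles), key=lambda j: pbest_val[j])
    return pbest[best], pbest_val[best]

sphere = lambda x: sum(v * v for v in x)
print(pso_ring(sphere, dim=3))  # best position and value, close to the origin for the sphere function
```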
Solving the Selective Multi-Category Parallel-Servicing Problem
DEFF Research Database (Denmark)
Range, Troels Martin; Lusby, Richard Martin; Larsen, Jesper
In this paper we present a new scheduling problem and describe a shortest path based heuristic as well as a dynamic programming based exact optimization algorithm to solve it. The Selective Multi-Category Parallel-Servicing Problem (SMCPSP) arises when a set of jobs has to be scheduled on a server...... (machine) with limited capacity. Each job requests service in a prespecified time window and belongs to a certain category. Jobs may be serviced partially, incurring a penalty; however, only jobs of the same category can be processed simultaneously. One must identify the best subset of jobs to process...
Solving the selective multi-category parallel-servicing problem
DEFF Research Database (Denmark)
Range, Troels Martin; Lusby, Richard Martin; Larsen, Jesper
2015-01-01
In this paper, we present a new scheduling problem and describe a shortest path-based heuristic as well as a dynamic programming-based exact optimization algorithm to solve it. The selective multi-category parallel-servicing problem arises when a set of jobs has to be scheduled on a server (machine......) with limited capacity. Each job requests service in a prespecified time window and belongs to a certain category. Jobs may be serviced partially, incurring a penalty; however, only jobs of the same category can be processed simultaneously. One must identify the best subset of jobs to process in each time...
A parallel graded-mesh FDTD algorithm for human-antenna interaction problems.
Catarinucci, Luca; Tarricone, Luciano
2009-01-01
The finite difference time domain (FDTD) method is frequently used for the numerical solution of a wide variety of electromagnetic (EM) problems and, among them, those concerning human exposure to EM fields. In many practical cases related to the assessment of occupational EM exposure, large simulation domains are modeled and high spatial resolution is adopted, so that strong memory and central processing unit power requirements have to be satisfied. To better cope with the computational effort, the use of parallel computing is a winning approach; alternatively, subgridding techniques are often implemented. However, the simultaneous use of subgridding schemes and parallel algorithms is very new. In this paper, an easy-to-implement and highly efficient parallel graded-mesh (GM) FDTD scheme is proposed and applied to human-antenna interaction problems, demonstrating its appropriateness in dealing with complex occupational tasks and showing its capability to guarantee the advantages of a traditional subgridding technique without affecting the parallel FDTD performance.
Brueggemann, T.; Hurink, Johann L.; Vredeveld, T.; Woeginger, Gerhard
2006-01-01
We study the problem of minimizing the makespan on m parallel machines. We introduce a very large-scale neighborhood of exponential size (in the number of machines) that is based on a matching in a complete graph. The idea is to partition the jobs assigned to the same machine into two sets. This
Simulation of neutron transport equation using parallel Monte Carlo for deep penetration problems
International Nuclear Information System (INIS)
Bekar, K. K.; Tombakoglu, M.; Soekmen, C. N.
2001-01-01
The neutron transport equation is simulated using a parallel Monte Carlo method for a deep penetration neutron transport problem. The Monte Carlo simulation is parallelized by using three different techniques: direct parallelization, domain decomposition, and domain decomposition with load balancing, which are used with the PVM (Parallel Virtual Machine) software on a LAN (Local Area Network). The results of the parallel simulation are given for various model problems. The performances of the parallelization techniques are compared with each other. Moreover, the effects of variance reduction techniques on parallelization are discussed
Parallel and Cooperative Particle Swarm Optimizer for Multimodal Problems
Directory of Open Access Journals (Sweden)
Geng Zhang
2015-01-01
Although the original particle swarm optimizer (PSO) method and its related variant methods show some effectiveness for solving optimization problems, they may easily get trapped in a local optimum, especially when solving complex multimodal problems. Aiming to solve this issue, this paper puts forward a novel method called parallel and cooperative particle swarm optimizer (PCPSO). In the case that the elements of the D-dimensional function vector X=[x1,x2,…,xd,…,xD] do not interact, the cooperative particle swarm optimizer (CPSO) is used. Based on this, the PCPSO is presented to solve real problems. Since in real problems the dimension cannot be split into several lower-dimensional search spaces because of interacting elements, PCPSO exploits the cooperation of two parallel CPSO algorithms by orthogonal experimental design (OED) learning. Firstly, the CPSO algorithm is used to generate two locally optimal vectors separately; then the OED is used to learn the merits of these two vectors and create a better combination of them to guide further search. Experimental studies on a set of test functions show that PCPSO exhibits better robustness and converges much closer to the global optimum than several other peer algorithms.
[Parallel virtual reality visualization of extreme large medical datasets].
Tang, Min
2010-04-01
On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in connection with the Intranet and common-configuration computers of hospitals. Several kernel techniques are introduced in this paper, including the hardware structure, software framework, load balancing and virtual reality visualization. The Maximum Intensity Projection algorithm is realized in parallel using a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through the control panel built on the Virtual Reality Modeling Language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can play the role of a good assistant in making clinical diagnoses.
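Maximum Intensity Projection (MIP) reduces a 3-D volume to a 2-D image by taking the maximum intensity along each viewing ray, which parallelizes naturally over slabs of slices. The sketch below is a minimal illustration under my own assumptions (process-pool parallelism, function names, random stand-in data); it is not the cluster implementation described above.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def _slab_mip(slab):
    """Maximum intensity projection of one slab along the viewing (first) axis."""
    return slab.max(axis=0)

def parallel_mip(volume, n_workers=4):
    """MIP of a 3-D volume, parallelized over slabs; partial projections are combined with a final max."""
    slabs = np.array_split(volume, n_workers, axis=0)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(_slab_mip, slabs))
    return np.maximum.reduce(partial)

if __name__ == "__main__":
    vol = np.random.default_rng(1).random((128, 256, 256))   # a stand-in for CT/MR data
    mip = parallel_mip(vol)
    print(mip.shape, np.allclose(mip, vol.max(axis=0)))       # (256, 256) True
```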
Efficient parallel algorithms for string editing and related problems
Apostolico, Alberto; Atallah, Mikhail J.; Larmore, Lawrence; Mcfaddin, H. S.
1988-01-01
The string editing problem for input strings x and y consists of transforming x into y by performing a series of weighted edit operations on x of overall minimum cost. An edit operation on x can be the deletion of a symbol from x, the insertion of a symbol in x, or the substitution of a symbol of x with another symbol. This problem has a well-known O(|x||y|) time sequential solution [25]. Efficient parallel random access machine (PRAM) algorithms for the string editing problem are given. If m = min(|x|, |y|) and n = max(|x|, |y|), then the CREW bound is O(log m log n) time with O(mn/log m) processors. In all algorithms, space is O(mn).
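The sequential dynamic program being parallelized is the standard edit distance recurrence; cells on the same anti-diagonal of the table are independent and can be computed simultaneously, which is the parallelism PRAM algorithms exploit. The sketch below is the plain sequential version, with unit costs as illustrative defaults.

```python
def edit_distance(x, y, del_cost=1, ins_cost=1, sub_cost=1):
    """Weighted string edit distance by dynamic programming, O(|x|*|y|) time."""
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * del_cost
    for j in range(1, n + 1):
        dp[0][j] = j * ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = 0 if x[i - 1] == y[j - 1] else sub_cost
            dp[i][j] = min(dp[i - 1][j] + del_cost,      # delete x[i-1]
                           dp[i][j - 1] + ins_cost,      # insert y[j-1]
                           dp[i - 1][j - 1] + same)      # substitute (or match)
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```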
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems
International Nuclear Information System (INIS)
BAER, THOMAS A.; SACKINGER, PHILIP A.; SUBIA, SAMUEL R.
1999-01-01
Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a 'pseudo-solid' mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work among the processors of an SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel computing environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance
Highly uniform parallel microfabrication using a large numerical aperture system
Energy Technology Data Exchange (ETDEWEB)
Zhang, Zi-Yu; Su, Ya-Hui, E-mail: ustcsyh@ahu.edu.cn, E-mail: dongwu@ustc.edu.cn [School of Electrical Engineering and Automation, Anhui University, Hefei 230601 (China); Zhang, Chen-Chu; Hu, Yan-Lei; Wang, Chao-Wei; Li, Jia-Wen; Chu, Jia-Ru; Wu, Dong, E-mail: ustcsyh@ahu.edu.cn, E-mail: dongwu@ustc.edu.cn [CAS Key Laboratory of Mechanical Behavior and Design of Materials, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei 230026 (China)
2016-07-11
In this letter, we report an improved algorithm to produce accurate phase patterns for generating highly uniform diffraction-limited multifocal arrays in a large numerical aperture objective system. It is shown that, based on the original diffraction integral, the uniformity of the diffraction-limited focal arrays can be improved from ∼75% to >97%, owing to the critical consideration of the aperture function and apodization effect associated with a large numerical aperture objective. The experimental results, e.g., 3 × 3 arrays of squares and triangles and seven microlens arrays with high uniformity, further verify the advantage of the improved algorithm. This algorithm enables the laser parallel processing technology to realize uniform microstructures and functional devices in a microfabrication system with a large numerical aperture objective.
Parallel Algorithm for Incremental Betweenness Centrality on Large Graphs
Jamour, Fuad Tarek
2017-10-17
Betweenness centrality quantifies the importance of nodes in a graph in many applications, including network analysis, community detection and identification of influential users. Typically, graphs in such applications evolve over time. Thus, the computation of betweenness centrality should be performed incrementally. This is challenging because updating even a single edge may trigger the computation of all-pairs shortest paths in the entire graph. Existing approaches cannot scale to large graphs: they either require excessive memory (i.e., quadratic in the size of the input graph) or perform unnecessary computations rendering them prohibitively slow. We propose iCentral, a novel incremental algorithm for computing betweenness centrality in evolving graphs. We decompose the graph into biconnected components and prove that processing can be localized within the affected components. iCentral is the first algorithm to support incremental betweenness centrality computation within a graph component. This is done efficiently, in linear space; consequently, iCentral scales to large graphs. We demonstrate with real datasets that the serial implementation of iCentral is up to 3.7 times faster than existing serial methods. Our parallel implementation that scales to large graphs is an order of magnitude faster than the state-of-the-art parallel algorithm, while using an order of magnitude less computational resources.
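The from-scratch computation that incremental schemes like iCentral avoid repeating is Brandes' algorithm; a minimal sketch for unweighted, undirected graphs is shown below. The adjacency-dict representation and the tiny path-graph example are illustrative assumptions, and the biconnected-component localization is not reproduced here.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for exact betweenness centrality on an unweighted, undirected graph.

    adj: dict mapping node -> iterable of neighbors.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    for v in bc:        # each undirected pair was counted from both endpoints
        bc[v] /= 2.0
    return bc

path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(betweenness_centrality(path))  # {1: 0.0, 2: 2.0, 3: 2.0, 4: 0.0}
```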
Final Report: Migration Mechanisms for Large-scale Parallel Applications
Energy Technology Data Exchange (ETDEWEB)
Jason Nieh
2009-10-30
Process migration is the ability to transfer a process from one machine to another. It is a useful facility in distributed computing environments, especially as computing devices become more pervasive and Internet access becomes more ubiquitous. The potential benefits of process migration, among others, are fault resilience by migrating processes off of faulty hosts, data access locality by migrating processes closer to the data, better system response time by migrating processes closer to users, dynamic load balancing by migrating processes to less loaded hosts, and improved service availability and administration by migrating processes before host maintenance so that applications can continue to run with minimal downtime. Although process migration provides substantial potential benefits and many approaches have been considered, achieving transparent process migration functionality has been difficult in practice. To address this problem, our work has designed, implemented, and evaluated new and powerful transparent process checkpoint-restart and migration mechanisms for desktop, server, and parallel applications that operate across heterogeneous cluster and mobile computing environments. A key aspect of this work has been to introduce lightweight operating system virtualization to provide processes with private, virtual namespaces that decouple and isolate processes from dependencies on the host operating system instance. This decoupling enables processes to be transparently checkpointed and migrated without modifying, recompiling, or relinking applications or the operating system. Building on this lightweight operating system virtualization approach, we have developed novel technologies that enable (1) coordinated, consistent checkpoint-restart and migration of multiple processes, (2) fast checkpointing of process and file system state to enable restart of multiple parallel execution environments and time travel, (3) process migration across heterogeneous
Parallel Object-Oriented Computation Applied to a Finite Element Problem
Directory of Open Access Journals (Sweden)
Jon B. Weissman
1993-01-01
The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.
Parallel Quasi Newton Algorithms for Large Scale Non Linear Unconstrained Optimization
International Nuclear Information System (INIS)
Rahman, M. A.; Basarudin, T.
1997-01-01
This paper discusses Quasi-Newton (QN) methods for solving non-linear unconstrained minimization problems. An important aspect of QN methods is the choice of the matrix Hk, which must be positive definite and satisfy the QN condition. Our interest here is in parallel QN methods suited to the solution of large-scale optimization problems. QN methods have become less attractive for large-scale problems because of their storage and computational requirements; however, it is often the case that the Hessian is a sparse matrix. In this paper we include a mechanism for reducing the cost of the Hessian update while preserving the Hessian's properties. One motivation for our research is that a QN method may perform well on certain types of minimization problems but degrade in efficiency when applied to other categories of problems. For this reason, we use an algorithm containing several direction strategies that are processed in parallel. We parallelize the algorithm by exploring different search directions generated by various QN updates during the minimization process, and different line search strategies are employed simultaneously in locating the minimum along each direction. The algorithm is coded in the Occam 2 language and runs on a transputer machine.
Parallel Framework for Dimensionality Reduction of Large-Scale Datasets
Directory of Open Access Journals (Sweden)
Sai Kiranmayee Samudrala
2015-01-01
Full Text Available Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.
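To make the numerical core concrete, one spectral reduction step (PCA via an eigendecomposition of the covariance matrix) can be written in a few lines. This is a hedged single-node NumPy sketch, not part of the proposed framework, and the data below are synthetic.

import numpy as np

def pca_embed(X, k):
    """Project rows of X (n points x d dimensions) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                   # centre the data
    C = (Xc.T @ Xc) / (len(X) - 1)            # d x d covariance matrix
    vals, vecs = np.linalg.eigh(C)            # symmetric eigendecomposition
    top = np.argsort(vals)[::-1][:k]          # indices of the k largest eigenvalues
    return Xc @ vecs[:, top]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50))
    Y = pca_embed(X, 2)
    print(Y.shape)   # (500, 2)

The parallel framework in the paper distributes kernels of this kind (pairwise computations and eigensolves) across thousands of cores; none of that machinery appears in the sketch.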
Optical technologies for data communication in large parallel systems
International Nuclear Information System (INIS)
Ritter, M B; Vlasov, Y; Kash, J A; Benner, A
2011-01-01
Large, parallel systems have greatly aided scientific computation and data collection, but performance scaling now relies on chip and system-level parallelism. This has happened because power density limits have caused processor frequency growth to stagnate, driving the new multi-core architecture paradigm, which would seem to provide generations of performance increases as transistors scale. However, this paradigm will be constrained by electrical I/O bandwidth limits; first off the processor card, then off the processor module itself. We will present best-estimates of these limits, then show how optical technologies can help provide more bandwidth to allow continued system scaling. We will describe the current status of optical transceiver technology which is already being used to exceed off-board electrical bandwidth limits, then present work on silicon nanophotonic transceivers and 3D integration technologies which, taken together, promise to allow further increases in off-module and off-card bandwidth. Finally, we will show estimated limits of nanophotonic links and discuss breakthroughs that are needed for further progress, and will speculate on whether we will reach Exascale-class machine performance at affordable powers.
Large-scale parallel genome assembler over cloud computing environment.
Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong
2017-06-01
The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over a traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of a traditional HPC cluster.
Optical technologies for data communication in large parallel systems
Energy Technology Data Exchange (ETDEWEB)
Ritter, M B; Vlasov, Y; Kash, J A [IBM T.J. Watson Research Center, Yorktown Heights, NY (United States); Benner, A, E-mail: mritter@us.ibm.com [IBM Poughkeepsie, Poughkeepsie, NY (United States)
2011-01-15
Large, parallel systems have greatly aided scientific computation and data collection, but performance scaling now relies on chip and system-level parallelism. This has happened because power density limits have caused processor frequency growth to stagnate, driving the new multi-core architecture paradigm, which would seem to provide generations of performance increases as transistors scale. However, this paradigm will be constrained by electrical I/O bandwidth limits; first off the processor card, then off the processor module itself. We will present best-estimates of these limits, then show how optical technologies can help provide more bandwidth to allow continued system scaling. We will describe the current status of optical transceiver technology which is already being used to exceed off-board electrical bandwidth limits, then present work on silicon nanophotonic transceivers and 3D integration technologies which, taken together, promise to allow further increases in off-module and off-card bandwidth. Finally, we will show estimated limits of nanophotonic links and discuss breakthroughs that are needed for further progress, and will speculate on whether we will reach Exascale-class machine performance at affordable powers.
A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks.
Directory of Open Access Journals (Sweden)
Xiangyun Xiao
Full Text Available The reconstruction of gene regulatory networks (GRNs) from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE)-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM), experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.
A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks.
Xiao, Xiangyun; Zhang, Wei; Zou, Xiufen
2015-01-01
The reconstruction of gene regulatory networks (GRNs) from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE)-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM), experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.
Iterative algorithms for large sparse linear systems on parallel computers
Adams, L. M.
1982-01-01
Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering, are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
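As a point of reference for the preconditioned conjugate gradient algorithms discussed above, a serial Jacobi-preconditioned CG on a 1-D model problem looks as follows. This Python/NumPy sketch assumes a symmetric positive definite system and omits the parallel assembly and array-architecture mapping that the work actually studies.

import numpy as np
from scipy.sparse import diags

def pcg(A, b, tol=1e-10, maxit=1000):
    """Conjugate gradients with Jacobi (diagonal) preconditioning."""
    x = np.zeros_like(b)
    Minv = 1.0 / A.diagonal()                 # Jacobi preconditioner
    r = b - A @ x
    z = Minv * r
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

if __name__ == "__main__":
    n = 100   # 1-D Poisson model problem
    A = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)
    x = pcg(A, b)
    print(np.linalg.norm(A @ x - b))   # residual norm after convergence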
Directory of Open Access Journals (Sweden)
José Miguel Vargas-Félix
2012-11-01
Full Text Available The Finite Element Method (FEM) is used to solve problems like solid deformation and heat diffusion in domains with complex geometries. This kind of geometry requires discretization with millions of elements; this is equivalent to solving systems of equations with sparse matrices and tens or hundreds of millions of variables. The aim is to use computer clusters to solve these systems. The solution method used is Schur substructuring. Using it, it is possible to divide a large system of equations into many small ones and solve them more efficiently. This method allows parallelization. MPI (Message Passing Interface) is used to distribute the systems of equations and solve each one on a computer of a cluster. Each system of equations is solved using a solver implemented with OpenMP as the local parallelization method.
Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets
Litvinenko, Alexander
2017-11-01
The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M\times 2M$ can be computed on a modern multi-core desktop in a few minutes. Further, HLIBCov is used for estimating unknown parameters such as the covariance length, variance and smoothness parameter of a Matérn covariance function by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is the expensive linear algebra arithmetic due to large and dense covariance matrices. Therefore, covariance matrices are approximated in the hierarchical ($\mathcal{H}$-) matrix format with computational cost $\mathcal{O}(k^2 n \log^2 n/p)$ and storage $\mathcal{O}(kn \log n)$, where the rank $k$ is a small integer (typically $k<25$), $p$ is the number of cores and $n$ is the number of locations on a fairly general mesh. We demonstrate a synthetic example where the true parameter values are known. For reproducibility we provide the C++ code, the documentation, and the synthetic data.
Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets
Litvinenko, Alexander; Sun, Ying; Genton, Marc G.; Keyes, David E.
2017-01-01
The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M\times 2M$ can be computed on a modern multi-core desktop in a few minutes. Further, HLIBCov is used for estimating unknown parameters such as the covariance length, variance and smoothness parameter of a Matérn covariance function by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is the expensive linear algebra arithmetic due to large and dense covariance matrices. Therefore, covariance matrices are approximated in the hierarchical ($\mathcal{H}$-) matrix format with computational cost $\mathcal{O}(k^2 n \log^2 n/p)$ and storage $\mathcal{O}(kn \log n)$, where the rank $k$ is a small integer (typically $k<25$), $p$ is the number of cores and $n$ is the number of locations on a fairly general mesh. We demonstrate a synthetic example where the true parameter values are known. For reproducibility we provide the C++ code, the documentation, and the synthetic data.
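For orientation, the Matérn covariance and the joint Gaussian log-likelihood being maximized can be evaluated directly with dense linear algebra. The small-n Python sketch below uses the dense O(n^3) Cholesky that HLIBCov is designed to replace with an H-matrix approximation, and all parameter values are arbitrary.

import numpy as np
from scipy.special import gamma, kv
from scipy.spatial.distance import cdist

def matern_cov(locs, length, variance, nu):
    """Dense Matérn covariance matrix for a set of spatial locations."""
    d = cdist(locs, locs)
    scaled = np.sqrt(2.0 * nu) * d / length
    with np.errstate(invalid="ignore"):
        C = variance * (2.0 ** (1.0 - nu) / gamma(nu)) * scaled ** nu * kv(nu, scaled)
    C[d == 0.0] = variance          # kv is singular at zero distance; the limit is the variance
    return C

def gaussian_loglik(z, C):
    """Joint Gaussian log-likelihood of observations z with covariance C."""
    L = np.linalg.cholesky(C)                       # dense Cholesky, O(n^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z))
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    n = len(z)
    return -0.5 * (z @ alpha + logdet + n * np.log(2.0 * np.pi))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    locs = rng.uniform(size=(300, 2))
    C = matern_cov(locs, length=0.3, variance=1.0, nu=0.5) + 1e-10 * np.eye(300)
    z = np.linalg.cholesky(C) @ rng.normal(size=300)   # a synthetic realization
    print(gaussian_loglik(z, C))

Maximizing this log-likelihood over (length, variance, nu) is the estimation task; the H-matrix machinery in HLIBCov makes the factorization and log-determinant affordable when n is in the millions.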
Quan, Zhe; Wu, Lei
2017-09-01
This article investigates the use of parallel computing for solving the disjunctively constrained knapsack problem. The proposed parallel computing model can be viewed as a cooperative algorithm based on a multi-neighbourhood search. The cooperation system is composed of a team manager and a crowd of team members. The team members aim at applying their own search strategies to explore the solution space. The team manager collects the solutions from the members and shares the best one with them. The performance of the proposed method is evaluated on a group of benchmark data sets. The results obtained are compared to those reached by the best methods from the literature. The results show that the proposed method is able to provide the best solutions in most cases. In order to highlight the robustness of the proposed parallel computing model, a new set of large-scale instances is introduced. Encouraging results have been obtained.
Fast parallel DNA-based algorithms for molecular computation: the set-partition problem.
Chang, Weng-Long
2007-12-01
This paper demonstrates that basic biological operations can be used to solve the set-partition problem. In order to achieve this, we propose three DNA-based algorithms, a signed parallel adder, a signed parallel subtractor and a signed parallel comparator, that formally verify our designed molecular solutions for solving the set-partition problem.
Sensitivity analysis for large-scale problems
Noor, Ahmed K.; Whitworth, Sandra L.
1987-01-01
The development of efficient techniques for calculating sensitivity derivatives is studied. The objective is to present a computational procedure for calculating sensitivity derivatives as part of performing structural reanalysis for large-scale problems. The scope is limited to framed type structures. Both linear static analysis and free-vibration eigenvalue problems are considered.
2009-01-01
At the 19th Annual Conference on Parallel Computational Fluid Dynamics held in Antalya, Turkey, in May 2007, the most recent developments and implementations of large-scale and grid computing were presented. This book, comprised of the invited and selected papers of this conference, details those advances, which are of particular interest to CFD and CFD-related communities. It also offers the results related to applications of various scientific and engineering problems involving flows and flow-related topics. Intended for CFD researchers and graduate students, this book is a state-of-the-art presentation of the relevant methodology and implementation techniques of large-scale computing.
Application of Raptor-M3G to reactor dosimetry problems on massively parallel architectures - 026
International Nuclear Information System (INIS)
Longoni, G.
2010-01-01
The solution of complex 3-D radiation transport problems requires significant resources both in terms of computation time and memory availability. Therefore, parallel algorithms and multi-processor architectures are required to solve large 3-D radiation transport problems efficiently. This paper presents the application of RAPTOR-M3G (Rapid Parallel Transport Of Radiation - Multiple 3D Geometries) to reactor dosimetry problems. RAPTOR-M3G is a newly developed parallel computer code designed to solve the discrete ordinates (SN) equations on multi-processor computer architectures. This paper presents the results for a reactor dosimetry problem using a 3-D model of a commercial 2-loop pressurized water reactor (PWR). The accuracy and performance of RAPTOR-M3G will be analyzed and the numerical results obtained from the calculation will be compared directly to measurements of the neutron field in the reactor cavity air gap. The parallel performance of RAPTOR-M3G on massively parallel architectures, where the number of computing nodes is on the order of hundreds, will be analyzed up to four hundred processors. The performance results will be presented based on two supercomputing architectures: the POPLE supercomputer operated by the Pittsburgh Supercomputing Center and the Westinghouse computer cluster. The Westinghouse computer cluster is equipped with a standard Ethernet network connection and an InfiniBand interconnect capable of a bandwidth in excess of 20 Gbit/s. Therefore, the impact of the network architecture on RAPTOR-M3G performance will be analyzed as well. (authors)
Robust large-scale parallel nonlinear solvers for simulations.
Energy Technology Data Exchange (ETDEWEB)
Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson (Sandia National Laboratories, Livermore, CA)
2005-11-01
This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any
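The rank-one secant update at the heart of Broyden's method is compact enough to show directly. The dense Python sketch below solves a toy two-equation system, and storing the full approximate Jacobian B as done here is exactly what the report's limited-memory variants avoid.

import numpy as np

def broyden(F, x0, tol=1e-10, maxit=100):
    """Broyden's 'good' method for F(x) = 0 with a dense Jacobian approximation."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                 # initial Jacobian approximation
    f = F(x)
    for _ in range(maxit):
        s = np.linalg.solve(B, -f)     # quasi-Newton step
        x_new = x + s
        f_new = F(x_new)
        if np.linalg.norm(f_new) < tol:
            return x_new
        y = f_new - f
        B += np.outer(y - B @ s, s) / (s @ s)   # Broyden rank-one update
        x, f = x_new, f_new
    return x

if __name__ == "__main__":
    # Solve x0^2 + x1^2 = 1 and x0 - x1 = 0; the root is (1/sqrt(2), 1/sqrt(2)).
    F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])
    print(broyden(F, [1.0, 0.5]))

No Jacobian evaluation appears anywhere in the iteration, which is the property that lets Broyden-type solvers handle codes whose Jacobians are unavailable or inaccurate.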
Parallel processing of Monte Carlo code MCNP for particle transport problem
Energy Technology Data Exchange (ETDEWEB)
Higuchi, Kenji; Kawasaki, Takuji
1996-06-01
It is possible to vectorize or parallelize Monte Carlo codes (MC codes) for photon and neutron transport problems, making use of the independence of the calculation for each particle. The applicability of existing MC codes to parallel processing is discussed. As for parallel computers, we have used both a vector-parallel processor and a scalar-parallel processor in the performance evaluation. We have carried out (i) vector-parallel processing of the MCNP code on the Monte Carlo machine Monte-4 with four vector processors, and (ii) parallel processing on a Paragon XP/S with 256 processors. In this report we describe the methodology and results for parallel processing on these two types of parallel or distributed memory computers. In addition, we discuss the evaluation of parallel programming environments for the parallel computers used in the present work, as a part of the work developing the STA (Seamless Thinking Aid) Basic Software. (author)
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs
Directory of Open Access Journals (Sweden)
Vaughn Matthew
2010-11-01
Full Text Available Abstract Background Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). Results In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(nlog(n/B)Blog(M/B)) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster - both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. Conclusions The bi-directed de Bruijn graph is a fundamental data structure for
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.
Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal
2010-11-15
Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(nlog(n/B)Blog(M/B)) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster--both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for
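To make the underlying data structure concrete, a uni-directed de Bruijn graph over (k-1)-mers can be built from a handful of reads in a few lines. This serial Python sketch ignores the bi-directed edges, the sorting-based communication scheme, and the out-of-core machinery that the paper actually contributes, and the reads below are made up.

from collections import defaultdict

def de_bruijn(reads, k):
    """Return adjacency between (k-1)-mers induced by all k-mers in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])     # edge: prefix (k-1)-mer -> suffix (k-1)-mer
    return graph

if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT"]
    for src, dsts in sorted(de_bruijn(reads, 4).items()):
        print(src, "->", sorted(dsts))

In the parallel setting, each processor owns a slice of the reads, emits its k-mer edges, and a global sort groups edges by node; bounding the communication by the cost of that sort is the key contribution summarized above.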
A parallel solution to the cutting stock problem for a cluster of workstations
Energy Technology Data Exchange (ETDEWEB)
Nicklas, L.D.; Atkins, R.W.; Setia, S.V.; Wang, P.Y. [George Mason Univ., Fairfax, VA (United States)
1996-12-31
This paper describes the design and implementation of a solution to the constrained 2-D cutting stock problem on a cluster of workstations. The constrained 2-D cutting stock problem is an irregular problem with a dynamically modified global data set and irregular amounts and patterns of communication. A replicated data structure is used for the parallel solution since the ratio of reads to writes is known to be large. Mutual exclusion and consistency are maintained using a token-based lazy consistency mechanism, and a randomized protocol for dynamically balancing the distributed work queue is employed. Speedups are reported for three benchmark problems executed on a cluster of workstations interconnected by a 10 Mbps Ethernet.
Implementation of highly parallel and large scale GW calculations within the OpenAtom software
Ismail-Beigi, Sohrab
The need to describe electronic excitations with better accuracy than provided by band structures produced by Density Functional Theory (DFT) has been a long-term enterprise for the computational condensed matter and materials theory communities. In some cases, appropriate theoretical frameworks have existed for some time but have been difficult to apply widely due to computational cost. For example, the GW approximation incorporates a great deal of important non-local and dynamical electronic interaction effects but has been too computationally expensive for routine use in large materials simulations. OpenAtom is an open source massively parallel ab initio density functional software package based on plane waves and pseudopotentials (http://charm.cs.uiuc.edu/OpenAtom/) that takes advantage of the Charm++ parallel framework. At present, it is developed via a three-way collaboration, funded by an NSF SI2-SSI grant (ACI-1339804), between Yale (Ismail-Beigi), IBM T. J. Watson (Glenn Martyna) and the University of Illinois at Urbana-Champaign (Laxmikant Kale). We will describe the project and our current approach towards implementing large scale GW calculations with OpenAtom. Potential applications of large scale parallel GW software for problems involving electronic excitations in semiconductor and/or metal oxide systems will also be pointed out.
Parallel Tensor Compression for Large-Scale Scientific Data.
Energy Technology Data Exchange (ETDEWEB)
Kolda, Tamara G. [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Ballard, Grey [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Austin, Woody Nathan [Univ. of Texas, Austin, TX (United States)
2015-10-01
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation of the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
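A single-node HOSVD-style Tucker compression conveys the multilinear structure being exploited. The NumPy sketch below works on a small random tensor with arbitrary truncation ranks and omits the distributed data layout and communication analysis of the paper.

import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: the chosen mode becomes the rows, the remaining modes are flattened."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_hosvd(T, ranks):
    """Truncated HOSVD: factor matrices from mode-wise SVDs, then a projected core."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])                       # leading left singular vectors
    core = T
    for mode, U in enumerate(factors):                 # project the tensor onto each factor
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = rng.normal(size=(20, 30, 40))
    core, factors = tucker_hosvd(T, (5, 5, 5))
    print(core.shape, [U.shape for U in factors])      # compressed core and factor shapes

Storing only the small core and the factor matrices, rather than the full tensor, is what yields the large compression ratios quoted in the abstract.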
Application of parallel computing techniques to a large-scale reservoir simulation
International Nuclear Information System (INIS)
Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten
2001-01-01
Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from intensive computational requirement for detailed modeling investigations of real-world reservoirs. This paper presents the application of a massive parallel-computing version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance
Parallel Computation of RCS of Electrically Large Platform with Coatings Modeled with NURBS Surfaces
Directory of Open Access Journals (Sweden)
Ying Yan
2012-01-01
Full Text Available The significance of Radar Cross Section (RCS) in military applications makes its prediction an important problem. This paper uses large-scale parallel Physical Optics (PO) to realize fast computation of the RCS of electrically large targets, which are modeled by Non-Uniform Rational B-Spline (NURBS) surfaces and coated with dielectric materials. Some numerical examples are presented to validate this paper's method. In addition, 1024 CPUs are used in the Shanghai Supercomputer Center (SSC) to perform the simulation of a model with a maximum electrical size of 1966.7 λ for the first time in China. From this, it can be found that this paper's method can greatly speed up the calculation and is capable of solving the real-life problem of RCS prediction.
A Parallel Solver for Large-Scale Markov Chains
Czech Academy of Sciences Publication Activity Database
Benzi, M.; Tůma, Miroslav
2002-01-01
Roč. 41, - (2002), s. 135-153 ISSN 0168-9274 R&D Projects: GA AV ČR IAA2030801; GA ČR GA101/00/1035 Keywords : parallel preconditioning * iterative methods * discrete Markov chains * generalized inverses * singular matrices * graph partitioning * AINV * Bi-CGSTAB Subject RIV: BA - General Mathematics Impact factor: 0.504, year: 2002
Modern algorithms for large sparse eigenvalue problems
International Nuclear Information System (INIS)
Meyer, A.
1987-01-01
The volume is written for mathematicians interested in (numerical) linear algebra and in the solution of large sparse eigenvalue problems, as well as for specialists in engineering who use the considered algorithms in the investigation of eigenoscillations of structures, in reactor physics, etc. Some variants of the algorithms based on the idea of a gradient-type direction of movement are presented and their convergence properties are discussed. From this, a general strategy for the direct use of preconditionings for the eigenvalue problem is derived. In this new approach the necessity of solving large linear systems is entirely avoided. Hence, these methods represent a new alternative to some other modern eigenvalue algorithms, as they show slightly slower convergence on the one hand but pose essentially fewer numerical and data-processing problems on the other. A brief description and comparison of some well-known methods (i.e. simultaneous iteration, Lanczos algorithm) completes this volume. (author)
Variable Neighborhood Search for Parallel Machines Scheduling Problem with Step Deteriorating Jobs
Directory of Open Access Journals (Sweden)
Wenming Cheng
2012-01-01
Full Text Available In many real scheduling environments, a job processed later needs a longer time than the same job when it starts earlier. This phenomenon, known as scheduling with deteriorating jobs, arises in many industrial applications. In this paper, we study a scheduling problem of minimizing the total completion time on identical parallel machines where the processing time of a job is a step function of its starting time and a deteriorating date that is individual to each job. First, a mixed integer programming model is presented for the problem. Then, a modified weight-combination search algorithm and a variable neighborhood search are employed to yield optimal or near-optimal schedules. To evaluate the performance of the proposed algorithms, computational experiments are performed on randomly generated test instances. Finally, the computational results show that the proposed approaches obtain near-optimal solutions in a reasonable computational time even for large-sized problems.
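The step-deteriorating processing time and a naive assignment rule can be sketched as follows. The Python illustration assumes the usual definition in which a job started after its deteriorating date incurs a fixed penalty, and the shortest-processing-time rule is only a baseline, not the paper's mixed integer model or its search algorithms; all job data are invented.

import heapq

def processing_time(p, penalty, deadline, start):
    """Step function: the job takes p if it starts by its deteriorating date, else p + penalty."""
    return p if start <= deadline else p + penalty

def spt_schedule(jobs, machines):
    """jobs: list of (p, penalty, deteriorating_date). Returns the total completion time."""
    ready = [(0.0, m) for m in range(machines)]        # (next free time, machine id)
    heapq.heapify(ready)
    total = 0.0
    for p, penalty, deadline in sorted(jobs):          # shortest basic processing time first
        start, m = heapq.heappop(ready)                # earliest available machine
        finish = start + processing_time(p, penalty, deadline, start)
        total += finish
        heapq.heappush(ready, (finish, m))
    return total

if __name__ == "__main__":
    jobs = [(3, 2, 4), (5, 4, 2), (2, 1, 6), (4, 3, 3)]
    print(spt_schedule(jobs, machines=2))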
Apparatus to examine pulsed parallel field losses in large conductors
International Nuclear Information System (INIS)
Miller, J.R.; Shen, S.S.
1977-01-01
Conductors in tokamak toroidal field coils will be exposed to pulsed fields both parallel and perpendicular to the current direction. These conductors will likely be quite high capacity (10 to 20 kA) and therefore probably will be built up out of smaller units. We have previously published measurements of losses in conductors exposed to a pulsed parallel field, but those experiments necessarily used monolithic conductors of relatively small cross section because the pulse coil, a torus that surrounded the test conductor, was itself small. Here we describe an apparatus that is conceptually similar but has been scaled up to accept conductors of much larger cross section and current capacity. The apparatus consists basically of a superconducting torus that contains a movable spool to allow test samples to be wound inside without unwinding the torus. Details of apparatus design and capabilities are described and preliminary results from tests of the apparatus and from loss measurements using it are reported
Parallelization of mathematical library for generalized eigenvalue problem for real band matrices
International Nuclear Information System (INIS)
Tanaka, Yasuhisa.
1997-05-01
This research has focused on a parallelization of the mathematical library for a generalized eigenvalue problem for real band matrices on IBM SP and Hitachi SR2201. The origin of the library is LASO (Lanczos Algorithm with Selective Orthogonalization), which was developed on the basis of Block Lanczos method for standard eigenvalue problem for real band matrices at Texas University. We adopted D.O.F. (Degree Of Freedom) decomposition method for a parallelization of this library, and evaluated its parallel performance. (author)
International Nuclear Information System (INIS)
Hwang, F-N; Wei, Z-H; Huang, T-M; Wang Weichung
2010-01-01
We develop a parallel Jacobi-Davidson approach for finding a partial set of eigenpairs of large sparse polynomial eigenvalue problems with application in quantum dot simulation. A Jacobi-Davidson eigenvalue solver is implemented based on the Portable, Extensible Toolkit for Scientific Computation (PETSc). The eigensolver thus inherits PETSc's efficient and various parallel operations, linear solvers, preconditioning schemes, and easy usages. The parallel eigenvalue solver is then used to solve higher degree polynomial eigenvalue problems arising in numerical simulations of three dimensional quantum dots governed by Schroedinger's equations. We find that the parallel restricted additive Schwarz preconditioner in conjunction with a parallel Krylov subspace method (e.g. GMRES) can solve the correction equations, the most costly step in the Jacobi-Davidson algorithm, very efficiently in parallel. Besides, the overall performance is quite satisfactory. We have observed near perfect superlinear speedup by using up to 320 processors. The parallel eigensolver can find all target interior eigenpairs of a quintic polynomial eigenvalue problem with more than 32 million variables within 12 minutes by using 272 Intel 3.0 GHz processors.
Energy Technology Data Exchange (ETDEWEB)
Sreepathi, Sarat [ORNL; Kumar, Jitendra [ORNL; Mills, Richard T. [Argonne National Laboratory; Hoffman, Forrest M. [ORNL; Sripathi, Vamsi [Intel Corporation; Hargrove, William Walter [United States Department of Agriculture (USDA), United States Forest Service (USFS)
2017-09-01
A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne, etc.), observational facilities (meteorological, eddy covariance, etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
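The per-iteration structure that such a clustering code distributes and accelerates is the standard k-means loop. The serial NumPy sketch below shows only the assignment and centroid-update steps on synthetic data, without the MPI decomposition or GPU offload described above.

import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd-style k-means on the rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre (the data-parallel step).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centres (a reduction over the points of each cluster).
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(5, 1, (200, 3))])
    labels, centers = kmeans(X, k=2)
    print(centers)

In a distributed setting each rank holds a block of points, computes local assignments and partial sums, and a global reduction produces the new centroids; that is the pattern the hybrid MPI/CUDA/OpenACC implementation maps onto Titan.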
Stabilization Algorithms for Large-Scale Problems
DEFF Research Database (Denmark)
Jensen, Toke Koldborg
2006-01-01
The focus of the project is on stabilization of large-scale inverse problems where structured models and iterative algorithms are necessary for computing approximate solutions. For this purpose, we study various iterative Krylov methods and their abilities to produce regularized solutions. Some......-curve. This heuristic is implemented as a part of a larger algorithm which is developed in collaboration with G. Rodriguez and P. C. Hansen. Last, but not least, a large part of the project has, in different ways, revolved around the object-oriented Matlab toolbox MOORe Tools developed by PhD Michael Jacobsen. New...
Parallel time domain solvers for electrically large transient scattering problems
Liu, Yang; Yucel, Abdulkadir; Bagcý , Hakan; Michielssen, Eric
2014-01-01
scattering from perfect electrically conducting objects are obtained by enforcing electric field boundary conditions and implicitly time advance electric surface current densities by iteratively solving sparse systems of equations at all time steps. Contrary
Directory of Open Access Journals (Sweden)
Ricardo Soto
2016-01-01
Full Text Available The Machine-Part Cell Formation Problem (MPCFP) is an NP-hard optimization problem that consists in grouping machines and parts in a set of cells, so that each cell can operate independently and the intercell movements are minimized. This problem has largely been tackled in the literature by using different techniques ranging from classic methods such as linear programming to more modern nature-inspired metaheuristics. In this paper, we present an efficient parallel version of the Migrating Birds Optimization metaheuristic for solving the MPCFP. Migrating Birds Optimization is a population metaheuristic based on the V-flight formation of migrating birds, which is proven to be an effective formation in energy saving. This approach is enhanced by the smart incorporation of parallel procedures that notably improve the performance of the several sorting processes performed by the metaheuristic. We perform computational experiments on 1080 benchmarks resulting from the combination of 90 well-known MPCFP instances with 12 sorting configurations with and without threads. We illustrate promising results where the proposal is able to reach the global optimum in all instances, while the solving time with respect to a nonparallel approach is notably reduced.
Parallel Solution of Robust Nonlinear Model Predictive Control Problems in Batch Crystallization
Directory of Open Access Journals (Sweden)
Yankai Cao
2016-06-01
Full Text Available Representing the uncertainties with a set of scenarios, the optimization problem resulting from a robust nonlinear model predictive control (NMPC) strategy at each sampling instance can be viewed as a large-scale stochastic program. This paper solves these optimization problems using the parallel Schur complement method developed to solve stochastic programs on distributed and shared memory machines. The control strategy is illustrated with a case study of a multidimensional unseeded batch crystallization process. For this application, a robust NMPC based on min–max optimization guarantees satisfaction of all state and input constraints for a set of uncertainty realizations, and also provides better robust performance compared with open-loop optimal control, nominal NMPC, and robust NMPC minimizing the expected performance at each sampling instance. The performance of robust NMPC can be improved by generating optimization scenarios using Bayesian inference. With the efficient parallel solver, the solution time of one optimization problem is reduced from 6.7 min to 0.5 min, allowing for real-time application.
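In generic notation (not taken from the paper), the scenario-based min-max problem solved at each sampling instance has the form

\[
\min_{u}\;\max_{s \in \mathcal{S}}\; J\bigl(x^{s}, u\bigr)
\quad \text{s.t.} \quad
x^{s}_{k+1} = f\bigl(x^{s}_{k}, u_{k}, \theta^{s}\bigr), \qquad
g\bigl(x^{s}_{k}, u_{k}\bigr) \le 0, \qquad
\forall\, s \in \mathcal{S},\; k = 0, \dots, N-1,
\]

where u collects the inputs over the prediction horizon, x^s is the state trajectory under uncertainty scenario s with parameters θ^s, and J is the control objective. The block structure across scenarios is what makes the resulting stochastic program amenable to the parallel Schur complement decomposition.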
Enabling parallel simulation of large-scale HPC network systems
International Nuclear Information System (INIS)
Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; Carns, Philip
2016-01-01
Here, with the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems—in particular, networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations
Bayesian nonlinear regression for large p small n problems
Chakraborty, Sounak; Ghosh, Malay; Mallick, Bani K.
2012-01-01
Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as the large p small n problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a full Bayesian support vector regression model with Vapnik's ε-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS) under the multivariate correlated response setup. This provides a full probabilistic description of support vector machine (SVM) rather than an algorithm for fitting purposes. We have also introduced a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM relying on the use of type II maximum likelihood estimates of the hyper-parameters, we put a prior on the hyper-parameters and use Markov chain Monte Carlo technique for computation. We have also proposed an empirical Bayes method for our RVM and SVM. Our methods are illustrated with a prediction problem in the near-infrared (NIR) spectroscopy. A simulation study is also undertaken to check the prediction accuracy of our models. © 2012 Elsevier Inc.
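For reference, Vapnik's ε-insensitive loss that the support vector regression model builds on can be written (in generic notation, not the paper's) as

\[
L_{\varepsilon}\bigl(y, f(x)\bigr) \;=\; \max\bigl(0,\; \lvert y - f(x)\rvert - \varepsilon \bigr),
\]

so residuals within ε of the prediction incur no penalty, while larger deviations are penalized linearly; the Bayesian treatment summarized above places a likelihood and priors around this loss instead of solving the usual fitting problem directly.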
Bayesian nonlinear regression for large p small n problems
Chakraborty, Sounak
2012-07-01
Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as the large p small n problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a full Bayesian support vector regression model with Vapnik's ε-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS) under the multivariate correlated response setup. This provides a full probabilistic description of support vector machine (SVM) rather than an algorithm for fitting purposes. We have also introduced a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM relying on the use of type II maximum likelihood estimates of the hyper-parameters, we put a prior on the hyper-parameters and use Markov chain Monte Carlo technique for computation. We have also proposed an empirical Bayes method for our RVM and SVM. Our methods are illustrated with a prediction problem in the near-infrared (NIR) spectroscopy. A simulation study is also undertaken to check the prediction accuracy of our models. © 2012 Elsevier Inc.
Parallel RFSAI-BFGS Preconditioners for Large Symmetric Eigenproblems
Directory of Open Access Journals (Sweden)
L. Bergamaschi
2013-01-01
the linearized Newton system to solve Au = q(u)u, q(u) being the Rayleigh quotient. In a previous work (Bergamaschi and Martínez, 2013) the sequence of preconditioned Jacobians is proven to remain close to the identity matrix if the initial preconditioned Jacobian is so. Numerical results on matrices arising from various realistic problems with size up to 1.5 million unknowns account for the efficiency and the scalability of the proposed low-rank update of the RFSAI preconditioner. The overall RFSAI-BFGS preconditioned Newton algorithm has shown comparable efficiency with a well-established eigenvalue solver on all the test problems.
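Written out in generic notation (not the paper's), the nonlinear eigenproblem and Rayleigh quotient referred to above are

\[
A u = q(u)\, u, \qquad q(u) = \frac{u^{T} A u}{u^{T} u},
\]

and each Newton-type step solves a correction equation of the simplified form

\[
\bigl(A - q(u_k) I\bigr)\, s = -\bigl(A u_k - q(u_k)\, u_k\bigr), \qquad s \perp u_k,
\]

whose preconditioning (here by RFSAI with BFGS-like low-rank updates) governs the overall efficiency; the projection operators used in practice are omitted from this sketch.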
Computational approach to large quantum dynamical problems
International Nuclear Information System (INIS)
Friesner, R.A.; Brunet, J.P.; Wyatt, R.E.; Leforestier, C.; Binkley, S.
1987-01-01
The organizational structure is described for a new program that permits computations on a variety of quantum mechanical problems in chemical dynamics and spectroscopy. Particular attention is devoted to developing and using algorithms that exploit the capabilities of current vector supercomputers. A key component in this procedure is the recursive transformation of the large sparse Hamiltonian matrix into a much smaller tridiagonal matrix. An application to time-dependent laser molecule energy transfer is presented. Rate of energy deposition in the multimode molecule for systematic variations in the molecular intermode coupling parameters is emphasized
Large-Scale Parallel Viscous Flow Computations using an Unstructured Multigrid Algorithm
Mavriplis, Dimitri J.
1999-01-01
The development and testing of a parallel unstructured agglomeration multigrid algorithm for steady-state aerodynamic flows is discussed. The agglomeration multigrid strategy uses a graph algorithm to construct the coarse multigrid levels from the given fine grid, similar to an algebraic multigrid approach, but operates directly on the non-linear system using the FAS (Full Approximation Scheme) approach. The scalability and convergence rate of the multigrid algorithm are examined on the SGI Origin 2000 and the Cray T3E. An argument is given which indicates that the asymptotic scalability of the multigrid algorithm should be similar to that of its underlying single grid smoothing scheme. For medium size problems involving several million grid points, near perfect scalability is obtained for the single grid algorithm, while only a slight drop-off in parallel efficiency is observed for the multigrid V- and W-cycles, using up to 128 processors on the SGI Origin 2000, and up to 512 processors on the Cray T3E. For a large problem using 25 million grid points, good scalability is observed for the multigrid algorithm using up to 1450 processors on a Cray T3E, even when the coarsest grid level contains fewer points than the total number of processors.
Control problems in very large accelerators
International Nuclear Information System (INIS)
Crowley-Milling, M.C.
1985-06-01
There is no fundamental difference of kind in the control requirements between a small and a large accelerator since they are built of the same types of components, which individually have similar control inputs and outputs. The main difference is one of scale; the large machine has many more components of each type, and the distances involved are much greater. Both of these factors must be taken into account in determining the optimum way of carrying out the control functions. Small machines should use standard equipment and software for control as much as possible, as special developments for small quantities cannot normally be justified if all costs are taken into account. On the other hand, the very great number of devices needed for a large machine means that, if special developments can result in simplification, they may make possible an appreciable reduction in the control equipment costs. It is the purpose of this report to look at the special control problems of large accelerators, which the author shall arbitrarily define as those with a length of circumference in excess of 10 km, and point out where special developments, or the adoption of developments from outside the accelerator control field, can be of assistance in minimizing the cost of the control system. Most of the first part of this report was presented as a paper to the 1985 Particle Accelerator Conference. It has now been extended to include a discussion on the special case of the controls for the SSC
Parallel algorithms for 2-D cylindrical transport equations of Eigenvalue problem
International Nuclear Information System (INIS)
Wei, J.; Yang, S.
2013-01-01
In this paper, aimed at the neutron transport equations of the eigenvalue problem in 2-D cylindrical geometry on an unstructured grid, a discrete scheme based on the Sn discrete ordinates method and a discontinuous finite element method is built, and the parallel computation for the scheme is realized on MPI systems. Numerical experiments indicate that the designed parallel algorithm can reach perfect speedup and has good practicality and scalability. (authors)
Solving the Flood Propagation Problem with Newton Algorithm on Parallel Systems
Directory of Open Access Journals (Sweden)
Chefi Triki
2012-04-01
Full Text Available In this paper we propose a parallel implementation for the flood propagation method Flo2DH. The model is built on a finite element spatial approximation combined with a Newton algorithm that uses a direct LU linear solver. The parallel implementation has been developed by using the standard MPI protocol and has been tested on a set of real world problems.
Concurrent Programming Using Actors: Exploiting Large-Scale Parallelism,
1985-10-07
Penas, David R; González, Patricia; Egea, Jose A; Doallo, Ramón; Banga, Julio R
2017-01-21
The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problem, but the associated computational cost is very large. Moreover, many of these methods need the tuning of a number of adjustable search parameters, requiring a number of initial exploratory runs and therefore further increasing the computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse and fine-grained parallelism, and (iii) self-tuning strategies. The performance and robustness of saCeSS is illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing very significant reduction of computation times with respect to several previous state of the art methods (from days to minutes, in several cases) even when only a small number of processors is used. The new parallel cooperative method presented here allows the solution of medium and large scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.
Ordering schemes for parallel processing of certain mesh problems
International Nuclear Information System (INIS)
O'Leary, D.
1984-01-01
In this work, some ordering schemes for mesh points are presented which enable algorithms such as the Gauss-Seidel or SOR iteration to be performed efficiently for the nine-point operator finite difference method on computers consisting of a two-dimensional grid of processors. Convergence results are presented for the discretization of u_xx + u_yy on a uniform mesh over a square, showing that the spectral radius of the iteration for these orderings is no worse than that for the standard row by row ordering of mesh points. Further applications of these mesh point orderings to network problems, more general finite difference operators, and picture processing problems are noted.
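As a hedged illustration of why such orderings matter, the sketch below runs a red-black (checkerboard) ordered Gauss-Seidel sweep for the simpler 5-point discretization of u_xx + u_yy = f; all points of one colour can be updated simultaneously, which is the property that makes the iteration map well onto a grid of processors. The nine-point orderings analysed in the paper are not reproduced here, and the grid size and right-hand side are illustrative.

```python
# Hedged sketch: red-black Gauss-Seidel for the 5-point Laplacian on a square.
# Each colour sweep updates a set of mutually independent points.
import numpy as np

def red_black_gauss_seidel(u, f, h, sweeps=100):
    for _ in range(sweeps):
        for colour in (0, 1):                         # red points, then black points
            for i in range(1, u.shape[0] - 1):
                j0 = 1 + ((i + colour) % 2)           # start column for this colour
                u[i, j0:-1:2] = 0.25 * (u[i - 1, j0:-1:2] + u[i + 1, j0:-1:2]
                                        + u[i, j0 - 1:-2:2] + u[i, j0 + 1::2]
                                        - h * h * f[i, j0:-1:2])
    return u

n = 65
h = 1.0 / (n - 1)
f = np.ones((n, n))
u = red_black_gauss_seidel(np.zeros((n, n)), f, h)
print(u[n // 2, n // 2])                              # centre value of the iterate
```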
Parallel Index and Query for Large Scale Data Analysis
Energy Technology Data Exchange (ETDEWEB)
Chou, Jerry; Wu, Kesheng; Ruebel, Oliver; Howison, Mark; Qiang, Ji; Prabhat,; Austin, Brian; Bethel, E. Wes; Ryne, Rob D.; Shoshani, Arie
2011-07-18
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to the processing of a massive 50 TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
Improved Parallel Three-List Algorithm for the Knapsack Problem without Memory Conflicts
Institute of Scientific and Technical Information of China (English)
Pan Jun; Li Kenli; Li Qinghua
2006-01-01
Based on the two-list algorithm and the parallel three-list algorithm, an improved parallel three-list algorithm for the knapsack problem is proposed, in which the method of divide and conquer and parallel merging without memory conflicts are adopted. To find a solution for the n-element knapsack problem, the proposed algorithm needs O(2^(3n/8)) time when O(2^(3n/8)) shared memory units and O(2^(n/4)) processors are available. The comparisons between the proposed algorithm and 10 existing algorithms show that the improved parallel three-list algorithm is the first exclusive-read exclusive-write (EREW) parallel algorithm that can solve the knapsack instances in less than O(2^(n/2)) time when the available hardware resource is smaller than O(2^(n/2)), and hence is an improved result over past research.
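The two-list idea that this family of algorithms builds on is sketched below in its serial form for the subset-sum variant of the 0-1 knapsack: enumerate subset sums of each half of the items, sort one list, and binary-search the best complement. This is a hedged illustration of the underlying technique, not the three-list EREW algorithm of the paper; item weights and capacity are illustrative.

```python
# Hedged sketch: the serial two-list (Horowitz-Sahni style) approach to subset sum.
from bisect import bisect_right
from itertools import combinations

def subset_sums(items):
    """All subset sums of a small item list (2^len(items) values)."""
    return [sum(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def two_list_best_fit(weights, capacity):
    half = len(weights) // 2
    left = subset_sums(weights[:half])
    right = sorted(subset_sums(weights[half:]))
    best = 0
    for a in left:
        if a > capacity:
            continue
        # largest right-half sum that still fits alongside a
        idx = bisect_right(right, capacity - a) - 1
        best = max(best, a + right[idx])
    return best

weights = [27, 14, 9, 38, 21, 5, 17, 30]
print(two_list_best_fit(weights, capacity=60))   # best achievable load <= 60
```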
Mizan: Optimizing Graph Mining in Large Parallel Systems
Kalnis, Panos
2012-03-01
Extracting information from graphs, from finding shortest paths to complex graph mining, is essential for many applications. Due to the sheer size of modern graphs (e.g., social networks), processing must be done on large parallel computing infrastructures (e.g., the cloud). Earlier approaches relied on the MapReduce framework, which was proved inadequate for graph algorithms. More recently, the message passing model (e.g., Pregel) has emerged. Although the Pregel model has many advantages, it is agnostic to the graph properties and the architecture of the underlying computing infrastructure, leading to suboptimal performance. In this paper, we propose Mizan, a layer between the users' code and the computing infrastructure. Mizan considers the structure of the input graph and the architecture of the infrastructure in order to: (i) decide whether it is beneficial to generate a near-optimal partitioning of the graph in a preprocessing step, and (ii) choose between typical point-to-point message passing and a novel approach that puts computing nodes in a virtual overlay ring. We deployed Mizan on a small local Linux cluster, on the cloud (256 virtual machines in Amazon EC2), and on an IBM Blue Gene/P supercomputer (1024 CPUs). We show that Mizan executes common algorithms on very large graphs 1-2 orders of magnitude faster than MapReduce-based implementations and up to one order of magnitude faster than implementations relying on Pregel-like hash-based graph partitioning.
Robust Parallel Machine Scheduling Problem with Uncertainties and Sequence-Dependent Setup Time
Directory of Open Access Journals (Sweden)
Hongtao Hu
2016-01-01
Full Text Available A parallel machine scheduling problem in plastic production is studied in this paper. In this problem, the processing time and arrival time are uncertain but lie in their respective intervals. In addition, each job must be processed together with a mold, while jobs which belong to one family can share the same mold. Therefore, mold-changing time is required between two consecutive jobs that belong to different families, which is known as sequence-dependent setup time. This paper aims to identify a robust schedule under the min–max regret criterion. It is proved that the scenario incurring maximal regret for each feasible solution lies in a finite set of extreme scenarios. A mixed integer linear programming formulation and an exact algorithm are proposed to solve the problem. Moreover, a modified artificial bee colony algorithm is developed to solve large-scale problems. The performance of the presented algorithm is evaluated through extensive computational experiments and the results show that the proposed algorithm surpasses the exact method in terms of objective value and computational time.
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms
Hasanov, Khalid
2014-03-04
© 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy and hence more parallelism in the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly improves the communication cost and the overall performance on large-scale platforms.
Massively parallel Monte Carlo. Experiences running nuclear simulations on a large condor cluster
International Nuclear Information System (INIS)
Tickner, James; O'Dwyer, Joel; Roach, Greg; Uher, Josef; Hitchen, Greg
2010-01-01
The trivially-parallel nature of Monte Carlo (MC) simulations makes them ideally suited for running on a distributed, heterogeneous computing environment. We report on the setup and operation of a large, cycle-harvesting Condor computer cluster, used to run MC simulations of nuclear instruments ('jobs') on approximately 4,500 desktop PCs. Successful operation must balance the competing goals of maximizing the availability of machines for running jobs whilst minimizing the impact on users' PC performance. This requires classification of jobs according to anticipated run-time and priority and careful optimization of the parameters used to control job allocation to host machines. To maximize use of a large Condor cluster, we have created a powerful suite of tools to handle job submission and analysis, as the manual creation, submission and evaluation of large numbers (hundreds to thousands) of jobs would be too arduous. We describe some of the key aspects of this suite, which has been interfaced to the well-known MCNP and EGSnrc nuclear codes and our in-house PHOTON optical MC code. We report on our practical experiences of operating our Condor cluster and present examples of several large-scale instrument design problems that have been solved using this tool. (author)
An Optimal Parallel Algorithm for the Knapsack Problem Based on EREW
Institute of Scientific and Technical Information of China (English)
李肯立; 蒋盛益; 王卉; 李庆华
2003-01-01
A new parallel algorithm is proposed for the knapsack problem where the method of divide and conquer is adopted. Based on an EREW-SIMD machine with shared memory, the proposed algorithm utilizes O((2^(n/4))^(1-ε)) processors, 0 ≤ ε ≤ 1, and O(2^(n/2)) memory to find a solution for the n-element knapsack problem in time O(2^(n/4)(2^(n/4))^ε). The cost of the proposed parallel algorithm is O(2^(n/2)), which is an optimal method for solving the knapsack problem without memory conflicts and an improved result over the past researches.
Implicit solvers for large-scale nonlinear problems
International Nuclear Information System (INIS)
Keyes, David E; Reynolds, Daniel R; Woodward, Carol S
2006-01-01
Computational scientists are grappling with increasingly complex, multi-rate applications that couple such physical phenomena as fluid dynamics, electromagnetics, radiation transport, chemical and nuclear reactions, and wave and material propagation in inhomogeneous media. Parallel computers with large storage capacities are paving the way for high-resolution simulations of coupled problems; however, hardware improvements alone will not prove enough to enable simulations based on brute-force algorithmic approaches. To accurately capture nonlinear couplings between dynamically relevant phenomena, often while stepping over rapid adjustments to quasi-equilibria, simulation scientists are increasingly turning to implicit formulations that require a discrete nonlinear system to be solved for each time step or steady state solution. Recent advances in iterative methods have made fully implicit formulations a viable option for solution of these large-scale problems. In this paper, we overview one of the most effective iterative methods, Newton-Krylov, for nonlinear systems and point to software packages with its implementation. We illustrate the method with an example from magnetically confined plasma fusion and briefly survey other areas in which implicit methods have bestowed important advantages, such as allowing high-order temporal integration and providing a pathway to sensitivity analyses and optimization. Lastly, we overview algorithm extensions under development motivated by current SciDAC applications.
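A hedged, small-scale illustration of the Newton-Krylov idea is below, using SciPy's matrix-free newton_krylov on a toy nonlinear boundary-value problem; this is not the plasma-fusion application mentioned in the abstract, and the problem size and discretization are illustrative.

```python
# Hedged sketch: matrix-free Newton-Krylov solve of u'' = exp(u) on (0, 1)
# with u(0) = u(1) = 0, discretized with second-order finite differences.
import numpy as np
from scipy.optimize import newton_krylov

n = 200
h = 1.0 / (n + 1)

def residual(u):
    # Discrete residual F(u) = D2 u - exp(u); Dirichlet boundary values are zero.
    d2 = np.empty_like(u)
    d2[1:-1] = (u[:-2] - 2.0 * u[1:-1] + u[2:]) / h**2
    d2[0] = (-2.0 * u[0] + u[1]) / h**2
    d2[-1] = (u[-2] - 2.0 * u[-1]) / h**2
    return d2 - np.exp(u)

u0 = np.zeros(n)
u = newton_krylov(residual, u0, method='lgmres')
print(u.min())   # the solution dips below zero in the interior
```

The Jacobian is never formed explicitly; the Krylov solver only needs Jacobian-vector products, which newton_krylov approximates by finite differences of the residual, which is what makes the approach attractive for very large coupled systems.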
Directory of Open Access Journals (Sweden)
Mehiddin Al-Baali
2015-12-01
Full Text Available We deal with the design of parallel algorithms by using variable partitioning techniques to solve nonlinear optimization problems. We propose an iterative solution method that is very efficient for separable functions, our scope being to discuss its performance for general functions. Experimental results on an illustrative example have suggested some useful modifications that, even though they improve the efficiency of our parallel method, leave some questions open for further investigation.
On a Recursive-Parallel Algorithm for Solving the Knapsack Problem
Directory of Open Access Journals (Sweden)
Vladimir V. Vasilchikov
2018-01-01
Full Text Available In this paper, we offer an efficient parallel algorithm for solving the NP-complete Knapsack Problem in its basic, so-called 0-1 variant. To find its exact solution, algorithms belonging to the category of "branch and bound methods" have long been used. Various options for parallelizing the computations are also used to speed up the solution, with varying degrees of efficiency. We propose here an algorithm for solving the problem, based on the paradigm of recursive-parallel computations. We consider it well suited for problems of this kind, where it is difficult to immediately break up the computations into a sufficient number of subtasks of comparable complexity, since they appear dynamically at run time. We used the RPM ParLib library, developed by the author, as the main tool to program the algorithm. This library allows us to develop effective applications for parallel computing on a local network in the .NET Framework. Such applications have the ability to generate parallel branches of computation directly during program execution and dynamically redistribute work between computing modules. Any language with support for the .NET Framework can be used as a programming language in conjunction with this library. For our experiments, we developed some C# applications using this library. The main purpose of these experiments was to study the acceleration achieved by recursive-parallel computing. A detailed description of the algorithm and its testing, as well as the results obtained, are also given in the paper.
DEFF Research Database (Denmark)
Cao, Bin; Zhao, Jianwei; Yang, Po
2018-01-01
Using immune algorithms is generally a time-intensive process, especially for problems with a large number of variables. In this paper, we propose a distributed parallel cooperative coevolutionary multi-objective large-scale immune algorithm that is implemented using the message passing interface (MPI). The proposed algorithm is composed of three layers: objective, group and individual layers. First, for each objective in the multi-objective problem to be addressed, a subpopulation is used for optimization, and an archive population is used to optimize all the objectives. Second, the large ... Compared with the multi-objective evolutionary algorithms the Cooperative Coevolutionary Generalized Differential Evolution 3, the Cooperative Multi-objective Differential Evolution and the Nondominated Sorting Genetic Algorithm III, the proposed algorithm addresses the deployment optimization problem efficiently and effectively.
Optimal parallel algorithms for problems modeled by a family of intervals
Olariu, Stephan; Schwing, James L.; Zhang, Jingyuan
1992-01-01
A family of intervals on the real line provides a natural model for a vast number of scheduling and VLSI problems. Recently, a number of parallel algorithms to solve a variety of practical problems on such a family of intervals have been proposed in the literature. Computational tools are developed, and it is shown how they can be used for the purpose of devising cost-optimal parallel algorithms for a number of interval-related problems including finding a largest subset of pairwise nonoverlapping intervals, a minimum dominating subset of intervals, along with algorithms to compute the shortest path between a pair of intervals and, based on the shortest path, a parallel algorithm to find the center of the family of intervals. More precisely, with an arbitrary family of n intervals as input, all algorithms run in O(log n) time using O(n) processors in the EREW-PRAM model of computation.
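One of the problems listed above, finding a largest subset of pairwise nonoverlapping intervals, has a classical serial greedy solution, sketched below as a hedged reference point. The paper's contribution is the cost-optimal O(log n)-time, O(n)-processor EREW-PRAM algorithms, which this serial sketch does not reproduce; the interval data are illustrative.

```python
# Hedged sketch: serial greedy for a largest set of non-overlapping intervals
# (sort by right endpoint, keep every interval that starts after the last kept one).
def max_nonoverlapping(intervals):
    chosen = []
    last_end = float('-inf')
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:            # does not overlap the last chosen interval
            chosen.append((start, end))
            last_end = end
    return chosen

intervals = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(max_nonoverlapping(intervals))     # e.g. [(1, 4), (5, 7), (8, 11)]
```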
Directory of Open Access Journals (Sweden)
Lorenzo L. Pesce
2013-01-01
Full Text Available Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.
Amallynda, I.; Santosa, B.
2017-11-01
This paper proposes a new generalization of the distributed parallel machine and assembly scheduling problem (DPMASP) with eligibility constraints, referred to as the modified distributed parallel machine and assembly scheduling problem (MDPMASP) with eligibility constraints. Within this generalization, we assume that there is a set of non-identical factories or production lines, each one with a set of unrelated parallel machines with different processing speeds, feeding a single assembly machine in series. A set of different products is manufactured through an assembly program of a set of components (jobs) according to the requested demand. Each product requires several kinds of jobs with different sizes. Besides that, we also consider the multi-objective problem (MOP) of minimizing mean flow time and the number of tardy products simultaneously. The problem is known to be NP-hard and is important in practice, as the two criteria reflect the customer's demand and the manufacturer's perspective. Since this is a realistic and complex problem with a wide range of possible solutions, we propose four simple heuristics and two metaheuristics to solve it. Various parameters of the proposed metaheuristic algorithms are discussed and calibrated by means of the Taguchi technique. All proposed algorithms are tested in Matlab. Our computational experiments indicate that the proposed problem and the four proposed algorithms can be implemented and used to solve moderately-sized instances, giving efficient solutions that are close to optimum in most cases.
Literature Review on the Hybrid Flow Shop Scheduling Problem with Unrelated Parallel Machines
Directory of Open Access Journals (Sweden)
Eliana Marcela Peña Tibaduiza
2017-01-01
Full Text Available Context: The hybrid flow shop problem with unrelated parallel machines has been less studied in academia than the hybrid flow shop with identical processors. For this reason, there are few reports on applications of this problem in industry. Method: A literature review of the state of the art on the flow shop scheduling problem was conducted by collecting and analyzing academic papers from several scientific databases. For this aim, a search query was constructed using keywords defining the problem and checking the inclusion of unrelated parallel machines in such definition; as a result, 50 papers were finally selected for this study. Results: A classification of the problem according to the characteristics of the production system was performed; commonly used solution methods, constraints, and objective functions are also presented. Conclusions: An increasing trend is observed in studies of flow shops with multiple stages, but few are based on industrial case studies.
Random number generators for large-scale parallel Monte Carlo simulations on FPGA
Lin, Y.; Wang, F.; Liu, B.
2018-05-01
Through parallelization, field programmable gate arrays (FPGAs) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGAs present both new constraints and new opportunities for the implementations of random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application based tests, this study evaluates all of the four RNGs used in previous FPGA based MC studies and newly proposed FPGA implementations for two well-known high-quality RNGs that are suitable for LPMC studies on FPGA. One of the newly proposed FPGA implementations, a parallel version of the additive lagged Fibonacci generator (Parallel ALFG), is found to be the best among the evaluated RNGs in fulfilling the needs of LPMC simulations on FPGA.
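For reference, a plain (software, serial) additive lagged Fibonacci generator is sketched below; the paper's contribution is the parallel FPGA implementation, which is not reproduced here. Running several independently seeded copies of such a generator is one common way to obtain parallel streams, and the lags (55, 24), the modulus, and the seeding scheme below are illustrative choices.

```python
# Hedged sketch: additive lagged Fibonacci generator,
# x[n] = (x[n-24] + x[n-55]) mod 2^32.
class ALFG:
    def __init__(self, seed=12345, short_lag=24, long_lag=55, m=32):
        self.j, self.k, self.mask = short_lag, long_lag, (1 << m) - 1
        # fill the lag table with an auxiliary generator (a simple LCG)
        state, table = seed, []
        for _ in range(long_lag):
            state = (1664525 * state + 1013904223) & 0xFFFFFFFF
            table.append(state & self.mask)
        table[0] |= 1                      # ensure at least one odd seed entry
        self.table, self.pos = table, 0

    def next_uint(self):
        k = self.k
        # table[pos] holds x[n-k]; the other lag sits (k - j) slots ahead
        new = (self.table[self.pos] + self.table[(self.pos + k - self.j) % k]) & self.mask
        self.table[self.pos] = new
        self.pos = (self.pos + 1) % k
        return new

    def random(self):                      # uniform float in [0, 1)
        return self.next_uint() / (self.mask + 1)

rng = ALFG(seed=2018)
print([round(rng.random(), 4) for _ in range(5)])
```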
Parallel Motion Simulation of Large-Scale Real-Time Crowd in a Hierarchical Environmental Model
Directory of Open Access Journals (Sweden)
Xin Wang
2012-01-01
Full Text Available This paper presents a parallel real-time crowd simulation method based on a hierarchical environmental model. A dynamical model of the complex environment should be constructed to simulate the state transition and propagation of individual motions. By modeling the virtual environment where virtual crowds reside, we employ different parallel methods on a topological layer, a path layer and a perceptual layer. We propose a parallel motion path matching method based on the path layer and a parallel crowd simulation method based on the perceptual layer. Large-scale real-time crowd simulation becomes possible with these methods. Numerical experiments are carried out to demonstrate the methods and results.
Parallel Implementation of the Multi-Dimensional Spectral Code SPECT3D on large 3D grids.
Golovkin, Igor E.; Macfarlane, Joseph J.; Woodruff, Pamela R.; Pereyra, Nicolas A.
2006-10-01
The multi-dimensional collisional-radiative, spectral analysis code SPECT3D can be used to study radiation from complex plasmas. SPECT3D can generate instantaneous and time-gated images and spectra, space-resolved and streaked spectra, which makes it a valuable tool for post-processing hydrodynamics calculations and direct comparison between simulations and experimental data. On large three dimensional grids, transporting radiation along lines of sight (LOS) requires substantial memory and CPU resources. Currently, the parallel option in SPECT3D is based on parallelization over photon frequencies and allows for a nearly linear speed-up for a variety of problems. In addition, we are introducing a new parallel mechanism that will greatly reduce memory requirements. In the new implementation, spatial domain decomposition will be utilized allowing transport along a LOS to be performed only on the mesh cells the LOS crosses. The ability to operate on a fraction of the grid is crucial for post-processing the results of large-scale three-dimensional hydrodynamics simulations. We will present a parallel implementation of the code and provide a scalability study performed on a Linux cluster.
On the Optimization and Parallelizing Little Algorithm for Solving the Traveling Salesman Problem
Directory of Open Access Journals (Sweden)
V. V. Vasilchikov
2016-01-01
Full Text Available The paper describes some ways to accelerate solving the NP-complete Traveling Salesman Problem. The classic Little algorithm, belonging to the category of "branch and bound methods", can solve it both for directed and undirected graphs. However, for undirected graphs its operation can be accelerated by eliminating the consideration of branches examined earlier. The paper proposes changes to be made in the key operations of the algorithm to speed up its execution. It also describes the results of an experiment that demonstrated a significant acceleration of solving the problem by using an advanced algorithm. Another way to speed up the work is to parallelize the algorithm. For problems of this kind it is difficult to break the task into a sufficient number of subtasks having comparable complexity. Their parallelism arises dynamically during the execution. For such problems, it seems reasonable to use parallel-recursive algorithms. In our case the use of the library RPM ParLib developed by the author was a good choice. It allows us to develop effective applications for parallel computing on a local network using any .NET-compatible programming language. We used C# to develop the programs. Parallel applications were developed for both the basic and the modified algorithms, and their speeds were compared. Experiments were performed on graphs with up to 45 vertices and with up to 16 network computers. We also investigated the acceleration that can be achieved by parallelizing the basic Little algorithm for directed graphs. The results of these experiments are also presented in the paper.
Czech Academy of Sciences Publication Activity Database
Cullum, J. K.; Johnson, K.; Tůma, Miroslav
2003-01-01
Roč. 10, - (2003), s. 445-465 ISSN 1070-5325 R&D Projects: GA ČR GA201/02/0595; GA AV ČR IAA1030103 Institutional research plan: CEZ:AV0Z1030915 Keywords : parallel algorithms * graph partitioning * problem decomposition * rate of convergence Subject RIV: BA - General Mathematics Impact factor: 1.042, year: 2003
Parallel patterns determination in solving cyclic flow shop problem with setups
Directory of Open Access Journals (Sweden)
Bożejko Wojciech
2017-06-01
Full Text Available The subject of this work is a new idea of blocks for the cyclic flow shop problem with setup times, using multiple patterns of different sizes determined for each machine, each constituting an optimal schedule of cities for the traveling salesman problem (TSP). We propose to take advantage of the Intel Xeon Phi parallel computing environment during the so-called 'block' determination based on patterns, in effect significantly improving the quality of the obtained results.
Mathematical Problems in Creating Large Astronomical Catalogs
Directory of Open Access Journals (Sweden)
Prokhorov M. E.
2016-12-01
Full Text Available The next stage after performing observations and their primary reduction is to transform the set of observations into a catalog. To this end, objects that are irrelevant to the catalog should be excluded from observations and gross errors should be discarded. To transform such a prepared data set into a high-precision catalog, we need to identify and correct systematic errors. Therefore, each object of the survey should be observed several, preferably many, times. The problem formally reduces to solving an overdetermined set of equations. However, in the case of catalogs this system of equations has a very specific form: it is extremely sparse, and its sparseness increases rapidly with the number of objects in the catalog. Such equation systems require special methods for storing data on disks and in RAM, and for the choice of the techniques for their solving. Another specific feature of such systems is their high "stiffness", which also increases with the volume of a catalog. Special stable mathematical methods should be used in order not to lose precision when solving such systems of equations. We illustrate the problem by the example of photometric star catalogs, although similar problems arise in the case of positional, radial-velocity, and parallax catalogs.
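A hedged, toy-scale illustration of the kind of sparse overdetermined least-squares system described above is given below, solved with LSQR, one standard numerically stable iterative method for very sparse systems. The "per-plate zero point plus per-star magnitude" model, the dimensions, and the noise levels are illustrative and are not the cited catalogs' actual reduction model.

```python
# Hedged sketch: catalogue adjustment as a sparse overdetermined least-squares
# problem, solved with scipy's LSQR. Synthetic data, tiny dimensions.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(1)
n_stars, n_plates, n_obs = 500, 40, 5000

true_mag = rng.normal(12.0, 1.0, n_stars)        # per-star magnitudes
true_zp = rng.normal(0.0, 0.05, n_plates)        # per-plate zero points

star = rng.integers(0, n_stars, n_obs)
plate = rng.integers(0, n_plates, n_obs)
obs = true_mag[star] + true_zp[plate] + rng.normal(0.0, 0.02, n_obs)

# Each observation row has exactly two non-zeros: its star and plate columns.
rows = np.repeat(np.arange(n_obs), 2)
cols = np.empty(2 * n_obs, dtype=np.int64)
cols[0::2], cols[1::2] = star, n_stars + plate
A = sp.csr_matrix((np.ones(2 * n_obs), (rows, cols)),
                  shape=(n_obs, n_stars + n_plates))

x = lsqr(A, obs, atol=1e-10, btol=1e-10)[0]
offset = np.mean(x[:n_stars] - true_mag)          # solution defined up to a common offset
print(np.max(np.abs(x[:n_stars] - true_mag - offset)))   # small residual
```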
Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing
Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E. J.
2013-12-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the 'A-Train' platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (MERRA), stratify the comparisons using a classification of the 'cloud scenes' from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these are data-intensive computing problems, so the data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically 'sharded' by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will
Parallel genetic algorithms with migration for the hybrid flow shop scheduling problem
Directory of Open Access Journals (Sweden)
K. Belkadi
2006-01-01
Full Text Available This paper addresses scheduling problems in hybrid flow shop-like systems with a migration parallel genetic algorithm (PGA_MIG). This parallel genetic algorithm model allows genetic diversity through the application of selection and reproduction mechanisms closer to nature. The spatial structure of the population is modified by dividing it into disjoint subpopulations. From time to time, individuals are exchanged between the different subpopulations (migration). The influence of parameters and dedicated strategies is studied. These parameters are the number of independent subpopulations, the interconnection topology between subpopulations, the choice/replacement strategy of the migrant individuals, and the migration frequency. A comparison between the sequential and parallel versions of the genetic algorithm (GA) is provided. This comparison relates to the quality of the solution and the execution time of the two versions. The efficiency of the parallel model highly depends on the parameters and especially on the migration frequency. Likewise, this parallel model gives a significant improvement in computational time if it is implemented on a parallel architecture that offers a sufficient number of processors (as many processors as subpopulations).
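The island (migration) model itself is sketched below on a toy continuous optimization problem: subpopulations evolve independently and periodically exchange their best individuals over a ring. This is a hedged illustration only; the paper schedules a hybrid flow shop and studies the migration parameters, whereas here the islands run in a single process, the Rastrigin test function stands in for the scheduling objective, and all GA parameters are illustrative.

```python
# Hedged sketch: island-model GA with ring migration on the Rastrigin function.
import numpy as np

rng = np.random.default_rng(0)
DIM, POP, ISLANDS, GENS, MIG_EVERY = 10, 40, 4, 200, 10

def fitness(x):                              # Rastrigin: global minimum 0 at the origin
    return 10 * DIM + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=-1)

def evolve(pop):
    """One generation: tournament selection, blend crossover, Gaussian mutation."""
    fit = fitness(pop)
    idx = np.minimum(rng.integers(POP, size=POP), rng.integers(POP, size=POP))
    parents = pop[np.argsort(fit)][idx]      # lower rank index = fitter parent
    mates = parents[rng.permutation(POP)]
    alpha = rng.random((POP, 1))
    children = alpha * parents + (1 - alpha) * mates
    children += rng.normal(0, 0.1, children.shape) * (rng.random(children.shape) < 0.2)
    children[0] = pop[np.argmin(fit)]        # elitism
    return children

islands = [rng.uniform(-5.12, 5.12, (POP, DIM)) for _ in range(ISLANDS)]
for gen in range(1, GENS + 1):
    islands = [evolve(p) for p in islands]
    if gen % MIG_EVERY == 0:                 # ring migration of each island's best
        best = [p[np.argmin(fitness(p))].copy() for p in islands]
        for i, p in enumerate(islands):
            p[np.argmax(fitness(p))] = best[(i - 1) % ISLANDS]

print(min(fitness(p).min() for p in islands))
```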
Par@Graph - a parallel toolbox for the construction and analysis of large complex climate networks
Tantet, A.J.J.
2015-01-01
In this paper, we present Par@Graph, a software toolbox to reconstruct and analyze complex climate networks having a large number of nodes (up to at least 10^6) and edges (up to at least 10^12). The key innovation is an efficient set of parallel software tools designed to leverage the inherited hybrid
Streaming Parallel GPU Acceleration of Large-Scale filter-based Spiking Neural Networks
L.P. Slazynski (Leszek); S.M. Bohte (Sander)
2012-01-01
The arrival of graphics processing (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of
Ergul, Ozgur
2014-01-01
The Multilevel Fast Multipole Algorithm (MLFMA) for Solving Large-Scale Computational Electromagnetic Problems provides a detailed and instructional overview of implementing MLFMA. The book: presents a comprehensive treatment of the MLFMA algorithm, including basic linear algebra concepts, recent developments on the parallel computation, and a number of application examples; covers solutions of electromagnetic problems involving dielectric objects and perfectly-conducting objects; discusses applications including scattering from airborne targets, scattering from red
International Nuclear Information System (INIS)
Pereira, Claudio M.N.A.; Lapa, Celso M.F.
2003-01-01
This work extends the research related to genetic algorithms (GA) in core design optimization problems, whose basic investigations were presented in previous work. Here we explore the use of the Island Genetic Algorithm (IGA), a coarse-grained parallel GA model, comparing its performance to that obtained by the application of a traditional non-parallel GA. The optimization problem consists of adjusting several reactor cell parameters, such as dimensions, enrichment and materials, in order to minimize the average peak-factor in a 3-enrichment zone reactor, considering restrictions on the average thermal flux, criticality and sub-moderation. Our IGA implementation runs as a distributed application on a conventional local area network (LAN), avoiding the use of expensive parallel computers or architectures. After exhaustive experiments, taking more than 1500 h on 550 MHz personal computers, we have observed that the IGA provided gains not only in terms of computational time, but also in the optimization outcome. Besides, we have also realized that, for this kind of problem, whose fitness evaluation is itself time consuming, the time overhead in the IGA, due to the communication in LANs, is practically imperceptible, leading to the conclusion that the use of expensive parallel computers or architectures can be avoided.
International Nuclear Information System (INIS)
Kim, Heungseob; Kim, Pansoo
2017-01-01
To maximize the reliability of a system, the traditional reliability–redundancy allocation problem (RRAP) determines the component reliability and level of redundancy for each subsystem. This paper proposes an advanced RRAP that also considers the optimal redundancy strategy, either active or cold standby. In addition, new examples are presented for it. Furthermore, the exact reliability function for a cold standby redundant subsystem with an imperfect detector/switch is suggested, and is expected to replace the previous approximating model that has been used in most related studies. A parallel genetic algorithm for solving the RRAP as a mixed-integer nonlinear programming model is presented, and its performance is compared with those of previous studies by using numerical examples on three benchmark problems. - Highlights: • Optimal strategy is proposed to solve reliability redundancy allocation problem. • The redundancy strategy uses parallel genetic algorithm. • Improved reliability function for a cold standby subsystem is suggested. • Proposed redundancy strategy enhances the system reliability.
Wang, Zhaocai; Pu, Jun; Cao, Liling; Tan, Jian
2015-10-23
The unbalanced assignment problem (UAP) is to optimally resolve the problem of assigning n jobs to m individuals (m < n); it is an important problem in applied mathematics, having numerous real-life applications. In this paper, we present a new parallel DNA algorithm for solving the unbalanced assignment problem using DNA molecular operations. We reasonably design flexible-length DNA strands representing different jobs and individuals, take appropriate steps, and get the solutions of the UAP in the proper length range and O(mn) time. We extend the application of DNA molecular operations and use their simultaneity to simplify the complexity of the computation.
International Nuclear Information System (INIS)
Mark Dennis Usang; Mohd Hairie Rabir; Mohd Amin Sharifuldin Salleh; Mohamad Puad Abu
2012-01-01
MPI parallelism is implemented on a SUN workstation for running MCNPX and on the High Performance Computing Facility (HPC) for running MCNP5. Twenty-three inputs obtained from the MCNP Criticality Validation Suite are utilized for the purpose of evaluating the speedup achievable by using the parallel capabilities of MPI. More importantly, we study the economics of using more processors and the types of problems where the performance gains are obvious. This is important to enable better practices of resource sharing, especially of the HPC facility's processing time. Future endeavours in this direction might even reveal clues for best MCNP5/MCNPX coding practices for optimum performance of MPI parallelism. (author)
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning
Directory of Open Access Journals (Sweden)
Yang Liu
2015-01-01
Full Text Available Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation especially when the size of data is large. Nowadays, big data has received a momentum from both industry and academia. To fulfill the potentials of ANNs for big data applications, the computation process must be speeded up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation.
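The map/reduce pattern behind this kind of data-parallel training is sketched below in a hedged form: "map" tasks compute per-shard gradients of a single-layer (logistic-regression style) network and the "reduce" step sums them. The paper runs on a Hadoop MapReduce cluster; here a Python multiprocessing pool stands in for the cluster and the synthetic data, model, and hyperparameters are illustrative.

```python
# Hedged sketch: map (per-shard gradients) and reduce (sum of gradients)
# for data-parallel training of a single-layer network.
import numpy as np
from functools import reduce
from multiprocessing import Pool

def shard_gradient(args):
    """Map task: cross-entropy gradient on one data shard."""
    w, X, y = args
    p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid activations
    return X.T @ (p - y) / len(y)

def train(shards, dim, epochs=50, lr=0.5):
    w = np.zeros(dim)
    with Pool(processes=len(shards)) as pool:
        for _ in range(epochs):
            grads = pool.map(shard_gradient, [(w, X, y) for X, y in shards])
            w -= lr * reduce(np.add, grads) / len(grads)   # reduce step
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=5)
    def make_shard(n):
        X = rng.normal(size=(n, 5))
        return X, (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)
    shards = [make_shard(2000) for _ in range(4)]
    print(np.round(train(shards, dim=5), 2))
```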
International Nuclear Information System (INIS)
Cao, Dingzhou; Murat, Alper; Chinnam, Ratna Babu
2013-01-01
This paper proposes a decomposition-based approach to exactly solve the multi-objective Redundancy Allocation Problem for series-parallel systems. The redundancy allocation problem is a form of reliability optimization and has been the subject of many prior studies. The majority of these earlier studies treat the redundancy allocation problem as a single-objective problem maximizing the system reliability or minimizing the cost given certain constraints. The few studies that treated the redundancy allocation problem as a multi-objective optimization problem relied on meta-heuristic solution approaches. However, meta-heuristic approaches have significant limitations: they do not guarantee that Pareto points are optimal and, more importantly, they may not identify all the Pareto-optimal points. In this paper, we treat the redundancy allocation problem as a multi-objective problem, as is typical in practice. We decompose the original problem into several multi-objective sub-problems, efficiently and exactly solve the sub-problems, and then systematically combine the solutions. The decomposition-based approach can efficiently generate all the Pareto-optimal solutions for redundancy allocation problems. Experimental results demonstrate the effectiveness and efficiency of the proposed method over meta-heuristic methods on a numerical example taken from the literature.
Directory of Open Access Journals (Sweden)
Jingli Du
2013-01-01
Full Text Available Cable-driven parallel manipulators are one of the best solutions to achieving a large workspace since flexible cables can be easily stored on reels. However, due to the negligible flexural stiffness of cables, long cables will unavoidably vibrate during operation for large workspace applications. In this paper a finite element model for cable-driven parallel manipulators is proposed to mimic small amplitude vibration of cables around their desired position. Output feedback of the cable tension variation at the end of the end-effector is utilized to design the vibration attenuation controller, which aims at attenuating the vibration of cables by slightly varying the cable length, thus decreasing its effect on the end-effector. When cable vibration is attenuated, a motion controller can be designed for implementing precise large motions to track given trajectories. A numerical example is presented to demonstrate the dynamic model and the control algorithm.
Scalable Parallel Distributed Coprocessor System for Graph Searching Problems with Massive Data
Directory of Open Access Journals (Sweden)
Wanrong Huang
2017-01-01
Full Text Available Internet applications, such as network searching, electronic commerce, and modern medical applications, produce and process massive amounts of data. Considerable data parallelism exists in the computation processes of data-intensive applications. A traversal algorithm, breadth-first search (BFS), is fundamental in many graph processing applications and metrics when a graph grows in scale. A variety of scientific programming methods have been proposed for accelerating and parallelizing BFS because of the poor temporal and spatial locality caused by inherent irregular memory access patterns. However, new parallel hardware could provide better improvement for scientific methods. To address small-world graph problems, we propose a scalable and novel field-programmable gate array-based heterogeneous multicore system for scientific programming. The core is multithreaded for streaming processing, and the InfiniBand communication network is adopted for scalability. We design a binary-search-based address mapping algorithm to unify all processor addresses. Within the limits permitted by the Graph500 test bench after 1D parallel hybrid BFS algorithm testing, our 8-core and 8-thread-per-core system achieved superior performance and efficiency compared with the prior work under the same degree of parallelism. Our system is efficient not as a special acceleration unit but as a processor platform that deals with graph searching applications.
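The level-synchronous BFS that Graph500-style systems (including the 1D-partitioned hybrid BFS mentioned above) parallelize is sketched below in a hedged, serial form: all vertices of the current frontier can be expanded independently, which is where the data parallelism comes from. The random small-world-like graph and its parameters are illustrative.

```python
# Hedged sketch: level-synchronous BFS over a CSR adjacency structure.
import numpy as np
import scipy.sparse as sp

def level_synchronous_bfs(adj_csr, source):
    n = adj_csr.shape[0]
    level = np.full(n, -1, dtype=np.int64)
    level[source] = 0
    frontier = np.array([source])
    depth = 0
    while frontier.size:
        depth += 1
        # gather all neighbours of the frontier (parallelisable per frontier vertex)
        neigh = np.concatenate([adj_csr.indices[adj_csr.indptr[v]:adj_csr.indptr[v + 1]]
                                for v in frontier])
        neigh = np.unique(neigh)
        frontier = neigh[level[neigh] == -1]      # keep only unvisited vertices
        level[frontier] = depth
    return level

n = 10000
g = sp.random(n, n, density=16 / n, format='csr', random_state=3)
g.data[:] = 1                                     # unweighted edges
g = (g + g.T).tocsr()                             # make the graph undirected
levels = level_synchronous_bfs(g, source=0)
print(np.bincount(levels[levels >= 0]))           # vertices reached per BFS level
```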
Parallel computing in cluster of GPU applied to a problem of nuclear engineering
International Nuclear Information System (INIS)
Moraes, Sergio Ricardo S.; Heimlich, Adino; Resende, Pedro
2013-01-01
Cluster computing has been widely used as a low-cost alternative for parallel processing in scientific applications. With the use of the Message-Passing Interface (MPI) protocol, development became even more accessible and widespread in the scientific community. A more recent trend is the use of the Graphics Processing Unit (GPU), a powerful co-processor able to perform hundreds of instructions in parallel, reaching a processing capacity hundreds of times that of a CPU. However, a standard PC does not allow, in general, more than two GPUs. Hence, this work proposes the development and evaluation of a hybrid, low-cost parallel approach to a typical nuclear engineering problem. The idea is to use cluster parallelism technology (MPI) together with GPU programming techniques (CUDA, the Compute Unified Device Architecture) to simulate neutron transport through a slab using the Monte Carlo method. Using a cluster comprised of four quad-core computers with two GPUs each, programs were developed using MPI and CUDA technologies. Experiments applying different configurations, from 1 to 8 GPUs, have been performed and the results were compared with the sequential (non-parallel) version. A speedup of about 2,000 times has been observed when comparing the 8-GPU configuration with the sequential version. The results presented here are discussed and analyzed with the objective of outlining gains and possible limitations of the proposed approach. (author)
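A hedged sketch of the cluster-level (MPI) half of such a hybrid scheme is given below: each rank runs an independent batch of neutron histories through a 1-D slab and the tallies are combined with a reduction. The CUDA kernels that the paper pairs with this layer are omitted, and the slab thickness, cross sections, history count, and the script name in the run command are illustrative assumptions.

```python
# Hedged sketch: MPI-distributed Monte Carlo transport through a 1-D slab.
# Run with e.g.:  mpirun -n 8 python mc_slab.py   (script name is illustrative)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

SIGMA_T, SIGMA_S, THICKNESS = 1.0, 0.7, 5.0     # total/scatter cross sections, slab width
HISTORIES = 1_000_000 // size                   # histories per rank

rng = np.random.default_rng(seed=1234 + rank)   # independent stream per rank
transmitted = 0
for _ in range(HISTORIES):
    x, mu = 0.0, 1.0                            # start at the left face, moving right
    while True:
        x += mu * (-np.log(rng.random()) / SIGMA_T)   # sample flight distance
        if x >= THICKNESS:
            transmitted += 1
            break
        if x < 0.0:
            break                               # leaked back out of the slab
        if rng.random() < SIGMA_S / SIGMA_T:    # scattered: new isotropic direction
            mu = 2.0 * rng.random() - 1.0
        else:
            break                               # absorbed

total = comm.reduce(transmitted, op=MPI.SUM, root=0)
if rank == 0:
    print(f"transmission probability ~ {total / (HISTORIES * size):.4f}")
```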
Visual analysis of inter-process communication for large-scale parallel computing.
Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu
2009-01-01
In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.
Parallel analysis tools and new visualization techniques for ultra-large climate data set
Energy Technology Data Exchange (ETDEWEB)
Middleton, Don [National Center for Atmospheric Research, Boulder, CO (United States); Haley, Mary [National Center for Atmospheric Research, Boulder, CO (United States)
2014-12-10
ParVis was a project funded under LAB 10-05: “Earth System Modeling: Advanced Scientific Visualization of Ultra-Large Climate Data Sets”. Argonne was the lead lab with partners at PNNL, SNL, NCAR and UC-Davis. This report covers progress from January 1st, 2013 through Dec 1st, 2014. Two previous reports covered the period from Summer, 2010, through September 2011 and October 2011 through December 2012, respectively. While the project was originally planned to end on April 30, 2013, personnel and priority changes allowed many of the institutions to continue work through FY14 using existing funds. A primary focus of ParVis was introducing parallelism to climate model analysis to greatly reduce the time-to-visualization for ultra-large climate data sets. Work in the first two years was conducted on two tracks with different time horizons: one track to provide immediate help to climate scientists already struggling to apply their analysis to existing large data sets and another focused on building a new data-parallel library and tool for climate analysis and visualization that will give the field a platform for performing analysis and visualization on ultra-large datasets for the foreseeable future. In the final 2 years of the project, we focused mostly on the new data-parallel library and associated tools for climate analysis and visualization.
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
Energy Technology Data Exchange (ETDEWEB)
Zang, Tianwu; Yu, Linglin; Zhang, Chong [Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005 (United States); Ma, Jianpeng, E-mail: jpma@bcm.tmc.edu [Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005 (United States); Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030 (United States)
2014-07-28
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in the two-dimensional Ising model, Lennard-Jones liquid and all-atom folding simulation of a small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent.
Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing
Directory of Open Access Journals (Sweden)
Qiang Liu
2018-05-01
Full Text Available Computing speed is a significant issue of large-scale flood simulations for real-time response to disaster prevention and mitigation. Even today, most of the large-scale flood simulations are generally run on supercomputers due to the massive amounts of data and computations necessary. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To realize a fast simulation of large-scale floods on a personal computer, a Graphics Processing Unit (GPU)-based, high-performance computing method using the OpenACC application was adopted to parallelize the shallow water model. An unstructured data management method was presented to control the data transportation between the GPU and CPU (Central Processing Unit) with minimum overhead, and then both computation and data were offloaded from the CPU to the GPU, which exploited the computational capability of the GPU as much as possible. The parallel model was validated using various benchmarks and real-world case studies. The results demonstrate that speed-ups of up to one order of magnitude can be achieved in comparison with the serial model. The proposed parallel model provides a fast and reliable tool with which to quickly assess flood hazards in large-scale areas and, thus, has a bright application prospect for dynamic inundation risk identification and disaster assessment.
International Nuclear Information System (INIS)
Spencer, VN
2001-01-01
An investigation has been conducted regarding the ability of clustered personal computers to improve the performance of executing software simulations for solving engineering problems. The power and utility of personal computers continue to grow exponentially through advances in computing capabilities such as newer microprocessors, advances in microchip technologies, electronic packaging, and cost-effective gigabyte-size hard drive capacity. Many engineering problems require significant computing power. Therefore, the computation has to be done by high-performance computer systems that cost millions of dollars and need gigabytes of memory to complete the task. Alternatively, it is feasible to provide adequate computing in the form of clustered personal computers. This method cuts the cost and size by linking (clustering) personal computers together across a network. Clusters also have the advantage that they can be used as stand-alone computers when they are not operating as a parallel computer. Parallel computing software to exploit clusters is available for computer operating systems like Unix, Windows NT, or Linux. This project concentrates on the use of Windows NT and the Parallel Virtual Machine (PVM) system to solve an engineering dynamics problem in Fortran.
A parallel algorithm for solving linear equations arising from one-dimensional network problems
International Nuclear Information System (INIS)
Mesina, G.L.
1991-01-01
One-dimensional (1-D) network problems, such as those arising from 1-D fluid simulations and electrical circuitry, produce systems of sparse linear equations which are nearly tridiagonal and contain a few non-zero entries outside the tridiagonal. Most direct solution techniques for such problems either do not take advantage of the special structure of the matrix or do not fully utilize parallel computer architectures. We describe a new parallel direct linear equation solution algorithm, called TRBR, which is especially designed to take advantage of this structure on MIMD shared memory machines. The new method belongs to a family of methods which split the coefficient matrix into the sum of a tridiagonal matrix T and a matrix comprised of the remaining coefficients R. Efficient tridiagonal methods are used to algebraically simplify the linear system. A smaller auxiliary subsystem is created and solved and its solution is used to calculate the solution of the original system. The newly devised BR method solves the subsystem. The serial and parallel operation counts are given for the new method and related earlier methods. TRBR is shown to have the smallest operation count in this class of direct methods. Numerical results are given. Although the algorithm is designed for one-dimensional networks, it has been applied successfully to three-dimensional problems as well. 20 refs., 2 figs., 4 tabs
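A generic version of the splitting idea can be sketched as follows: write the coefficient matrix as a tridiagonal part plus a low-rank remainder, solve with the tridiagonal factorization, and correct through a small auxiliary subsystem (here via the Woodbury identity). This is an illustrative sketch under those assumptions, not the TRBR/BR algorithm itself, and the matrix is a made-up example.

```python
import numpy as np
from scipy.linalg import solve_banded

n = 8
main = 4.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
T = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

# a couple of extra couplings outside the tridiagonal band, written as U V^T
U = np.zeros((n, 2)); V = np.zeros((n, 2))
U[0, 0] = 0.5; V[n - 1, 0] = 1.0
U[n - 1, 1] = 0.5; V[0, 1] = 1.0
A = T + U @ V.T

b = np.arange(1.0, n + 1.0)
ab = np.vstack([np.r_[0.0, off], main, np.r_[off, 0.0]])  # banded storage of T

y = solve_banded((1, 1), ab, b)            # T y = b  (efficient tridiagonal solve)
Z = solve_banded((1, 1), ab, U)            # T Z = U
small = np.eye(2) + V.T @ Z                # small auxiliary subsystem
x = y - Z @ np.linalg.solve(small, V.T @ y)

assert np.allclose(A @ x, b)               # solution of the original system
```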
Parallel simulation of tsunami inundation on a large-scale supercomputer
Oishi, Y.; Imamura, F.; Sugawara, D.
2013-12-01
An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation, and the computational power of recent massively parallel supercomputers is needed to enable faster-than-real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), so very fast parallel computers are expected to become more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on the TUNAMI-N2 model of Tohoku University, which uses a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computational load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using the CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
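The 1-D domain decomposition within a nested layer can be illustrated with a minimal halo-exchange loop. The sketch below uses mpi4py with made-up array sizes; it is an illustration of the communication pattern, not code taken from TUNAMI-N2 or the K computer implementation.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nx, local_ny = 512, 64
eta = np.zeros((local_ny + 2, nx))      # water level slab with 2 halo rows
eta[1:-1, :] = rank                     # dummy interior data

up   = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(10):
    # exchange halo rows with the adjacent neighbours of the 1-D decomposition
    comm.Sendrecv(eta[1, :],  dest=up,   recvbuf=eta[-1, :], source=down)
    comm.Sendrecv(eta[-2, :], dest=down, recvbuf=eta[0, :],  source=up)
    # ... leap-frog update of the interior rows would go here ...
```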
Liu, Yang
2015-12-17
A scalable parallel plane-wave time-domain (PWTD) algorithm for efficient and accurate analysis of transient scattering from electrically large objects is presented. The algorithm produces scalable communication patterns on very large numbers of processors by leveraging two mechanisms: (i) a hierarchical parallelization strategy to evenly distribute the computation and memory loads at all levels of the PWTD tree among processors, and (ii) a novel asynchronous communication scheme to reduce the cost and memory requirement of the communications between the processors. The efficiency and accuracy of the algorithm are demonstrated through its applications to the analysis of transient scattering from a perfect electrically conducting (PEC) sphere with a diameter of 70 wavelengths and a PEC square plate with a dimension of 160 wavelengths. Furthermore, the proposed algorithm is used to analyze transient fields scattered from realistic airplane and helicopter models under high frequency excitation.
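The asynchronous-communication idea described above (posting non-blocking messages and overlapping them with local work before completion) can be sketched generically as follows. The sketch uses mpi4py with hypothetical buffer names; the actual PWTD data structures and scheduling are far richer than this.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
peer = (rank + 1) % size

outgoing = np.random.rand(1 << 16)       # stand-in for outgoing plane-wave data
incoming = np.empty_like(outgoing)

req_r = comm.Irecv(incoming, source=peer, tag=7)   # post receive early
req_s = comm.Isend(outgoing, dest=peer, tag=7)     # post send, do not wait

local_work = np.fft.rfft(outgoing)       # local computation overlapped with comm

MPI.Request.Waitall([req_r, req_s])      # complete the exchange before using it
```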
Parallelization of a beam dynamics code and first large scale radio frequency quadrupole simulations
Directory of Open Access Journals (Sweden)
J. Xu
2007-01-01
Full Text Available The design and operation support of hadron (proton and heavy-ion) linear accelerators require substantial use of beam dynamics simulation tools. The beam dynamics code TRACK has been originally developed at Argonne National Laboratory (ANL) to fulfill the special requirements of the rare isotope accelerator (RIA) accelerator systems. From the beginning, the code has been developed to make it useful in the three stages of a linear accelerator project, namely, the design, commissioning, and operation of the machine. To realize this concept, the code has unique features such as end-to-end simulations from the ion source to the final beam destination and automatic procedures for tuning of a multiple charge state heavy-ion beam. The TRACK code has become a general beam dynamics code for hadron linacs and has found wide applications worldwide. Until recently, the code has remained serial except for a simple parallelization used for the simulation of multiple seeds to study the machine errors. To speed up computation, the TRACK Poisson solver has been parallelized. This paper discusses different parallel models for solving the Poisson equation with the primary goal to extend the scalability of the code onto 1024 and more processors of the new generation of supercomputers known as BlueGene (BG/L). Domain decomposition techniques have been adapted and incorporated into the parallel version of the TRACK code. To demonstrate the new capabilities of the parallelized TRACK code, the dynamics of a 45 mA proton beam represented by 10^{8} particles has been simulated through the 325 MHz radio frequency quadrupole and initial accelerator section of the proposed FNAL proton driver. The results show the benefits and advantages of large-scale parallel computing in beam dynamics simulations.
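The kernel at the heart of a domain-decomposed Poisson solver is a local relaxation sweep. The sketch below shows a plain Jacobi iteration for a 2-D Poisson problem in NumPy, purely as an illustration of what each subdomain computes locally; the solver, grids and boundary treatment in TRACK are of course different, and all numbers here are invented.

```python
import numpy as np

n, h = 128, 1.0 / 127
phi = np.zeros((n, n))              # potential, Dirichlet boundary = 0
rho = np.zeros((n, n)); rho[n // 2, n // 2] = 1.0   # point-like source term

for it in range(500):
    # Jacobi sweep: phi_ij <- (sum of neighbours + h^2 * rho_ij) / 4
    phi[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                              phi[1:-1, :-2] + phi[1:-1, 2:] +
                              h * h * rho[1:-1, 1:-1])
```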
A review of parallel computing for large-scale remote sensing image mosaicking
Chen, Lajiao; Ma, Yan; Liu, Peng; Wei, Jingbo; Jie, Wei; He, Jijun
2015-01-01
Interest in image mosaicking has been spurred by a wide variety of research and management needs. However, for large-scale applications, remote sensing image mosaicking usually requires significant computational capabilities. Several studies have attempted to apply parallel computing to improve image mosaicking algorithms and to speed up calculation process. The state of the art of this field has not yet been summarized, which is, however, essential for a better understanding and for further ...
Directory of Open Access Journals (Sweden)
Zhaocai Wang
2015-10-01
Full Text Available The unbalanced assignment problem (UAP) is to optimally resolve the problem of assigning n jobs to m individuals (m < n), such that the minimum cost or maximum profit is obtained. It is a vitally important Non-deterministic Polynomial (NP) complete problem in operations management and applied mathematics, having numerous real-life applications. In this paper, we present a new parallel DNA algorithm for solving the unbalanced assignment problem using DNA molecular operations. We reasonably design flexible-length DNA strands representing different jobs and individuals, take appropriate steps, and obtain the solutions of the UAP in the proper length range and in O(mn) time. We extend the application of DNA molecular operations and exploit their inherent parallelism to reduce the complexity of the computation.
DGDFT: A massively parallel method for large scale density functional theory calculations.
Hu, Wei; Lin, Lin; Yang, Chao
2015-09-28
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^-4 Hartree/atom in terms of the error of energy and 6.2 × 10^-4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
DGDFT: A massively parallel method for large scale density functional theory calculations
Energy Technology Data Exchange (ETDEWEB)
Hu, Wei, E-mail: whu@lbl.gov; Yang, Chao, E-mail: cyang@lbl.gov [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Lin, Lin, E-mail: linlin@math.berkeley.edu [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Department of Mathematics, University of California, Berkeley, California 94720 (United States)
2015-09-28
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^-4 Hartree/atom in terms of the error of energy and 6.2 × 10^-4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
DGDFT: A massively parallel method for large scale density functional theory calculations
International Nuclear Information System (INIS)
Hu, Wei; Yang, Chao; Lin, Lin
2015-01-01
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^-4 Hartree/atom in terms of the error of energy and 6.2 × 10^-4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
New computational methodology for large 3D neutron transport problems
International Nuclear Information System (INIS)
Dahmani, M.; Roy, R.; Koclas, J.
2004-01-01
We present a new computational methodology, based on the 3D characteristics method, dedicated to solving very large 3D problems without spatial homogenization. In order to eliminate the input/output problems occurring when solving these large problems, we set up a new computing scheme that requires more CPU resources than the usual approach, which is based on sweeps over large tracking files. The huge storage capacity needed in some problems and the related I/O queries needed by the characteristics solver are replaced by on-the-fly recalculation of tracks at each iteration step. Using this technique, large 3D problems are no longer I/O-bound, and distributed CPU resources can be used efficiently. (authors)
A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark
Directory of Open Access Journals (Sweden)
Fengcai Qiao
2018-02-01
Full Text Available Frequent subgraph mining (FSM) plays an important role in graph mining, attracting a great deal of attention in many areas, such as bioinformatics, web data mining and social networks. In this paper, we propose SSiGraM (Spark based Single Graph Mining), a Spark-based parallel frequent subgraph mining algorithm for a single large graph. To address the two computational challenges of FSM, we conduct the subgraph extension and support evaluation in parallel across all the distributed cluster worker nodes. In addition, we employ a heuristic search strategy and three novel optimizations: load balancing, pre-search pruning and top-down pruning in the support evaluation process, which significantly improve the performance. Extensive experiments with four different real-world datasets demonstrate that the proposed algorithm outperforms the existing GraMi (Graph Mining) algorithm by an order of magnitude for all datasets and can work with a lower support threshold.
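The distribution of the support-evaluation work can be pictured with a minimal PySpark sketch: candidate patterns are parallelized across workers, the single large graph is broadcast, and each worker counts embeddings locally. The graph, candidate list and count_embeddings() below are placeholders, not SSiGraM's data structures or its subgraph-isomorphism routine.

```python
from pyspark import SparkContext

def count_embeddings(graph, candidate):
    # placeholder for the (expensive) subgraph-isomorphism support count
    return 2  # dummy constant standing in for the real count

sc = SparkContext(appName="fsm-sketch")
graph = {"edges": [(0, 1), (1, 2), (2, 0)]}          # toy single large graph
candidates = [("A-B",), ("A-B-C",), ("B-C",)]        # toy candidate patterns

graph_bc = sc.broadcast(graph)                       # ship the graph once
support = (sc.parallelize(candidates, numSlices=3)   # spread candidates over workers
             .map(lambda c: (c, count_embeddings(graph_bc.value, c)))
             .filter(lambda kv: kv[1] >= 1)          # minimum support threshold
             .collect())
sc.stop()
```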
Enhanced 2D-DOA Estimation for Large Spacing Three-Parallel Uniform Linear Arrays
Directory of Open Access Journals (Sweden)
Dong Zhang
2018-01-01
Full Text Available An enhanced two-dimensional direction of arrival (2D-DOA) estimation algorithm for large spacing three-parallel uniform linear arrays (ULAs) is proposed in this paper. Firstly, we use the propagator method (PM) to obtain a highly accurate but ambiguous estimate of the directional cosine. Then, we use the relationship between the directional cosines to eliminate the ambiguity. This algorithm not only makes use of the elements of the three-parallel ULAs but also utilizes the connection between the directional cosines to improve the estimation accuracy. Moreover, it achieves satisfactory estimation performance when the elevation angle is between 70° and 90°, and it can automatically pair the estimated azimuth and elevation angles. Furthermore, it has low complexity, requiring no eigenvalue decomposition (EVD) or singular value decomposition (SVD) of the covariance matrix. Simulation results demonstrate the effectiveness of the proposed algorithm.
Stability of Large Parallel Tunnels Excavated in Weak Rocks: A Case Study
Ding, Xiuli; Weng, Yonghong; Zhang, Yuting; Xu, Tangjin; Wang, Tuanle; Rao, Zhiwen; Qi, Zufang
2017-09-01
Diversion tunnels are important structures for hydropower projects but are always placed in locations with less favorable geological conditions than those in which other structures are placed. Because diversion tunnels are usually large and closely spaced, the rock pillar between adjacent tunnels in weak rocks is affected on both sides, and conventional support measures may not be adequate to achieve the required stability. Thus, appropriate reinforcement support measures are needed, and the design philosophy regarding large parallel tunnels in weak rocks should be updated. This paper reports a recent case in which two large parallel diversion tunnels are excavated. The rock masses are thin- to ultra-thin-layered strata coated with phyllitic films, which significantly decrease the soundness and strength of the strata and weaken the rocks. The behaviors of the surrounding rock masses under original (and conventional) support measures are detailed in terms of rock mass deformation, anchor bolt stress, and the extent of the excavation disturbed zone (EDZ), as obtained from safety monitoring and field testing. In situ observed phenomena and their interpretation are also included. The sidewall deformations exhibit significant time-dependent characteristics, and large magnitudes are recorded. The stresses in the anchor bolts are small, but the extents of the EDZs are large. The stability condition under the original support measures is evaluated as poor. To enhance rock mass stability, attempts are made to reinforce support design and improve safety monitoring programs. The main feature of these attempts is the use of prestressed cables that run through the rock pillar between the parallel tunnels. The efficacy of reinforcement support measures is verified by further safety monitoring data and field test results. Numerical analysis is constantly performed during the construction process to provide a useful reference for decision making. The calculated deformations are in
Parallel Solver for H(div) Problems Using Hybridization and AMG
Energy Technology Data Exchange (ETDEWEB)
Lee, Chak S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-01-15
In this paper, a scalable parallel solver is proposed for H(div) problems discretized by arbitrary order finite elements on general unstructured meshes. The solver is based on hybridization and algebraic multigrid (AMG). Unlike some previously studied H(div) solvers, the hybridization solver does not require discrete curl and gradient operators as additional input from the user. Instead, only some element information is needed in the construction of the solver. The hybridization results in an H1-equivalent symmetric positive definite system, which is then rescaled and solved by AMG solvers designed for H1 problems. Weak and strong scaling of the method are examined through several numerical tests. Our numerical results show that the proposed solver provides a promising alternative to ADS, a state-of-the-art solver [12], for H(div) problems. In fact, it outperforms ADS for higher order elements.
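Only the last step is illustrated below: once hybridization has produced a symmetric positive definite system, it can be handed to a standard AMG solver. The sketch uses PyAMG on a Poisson matrix as a stand-in for that reduced system; it is an assumption-laden illustration, not the parallel solver of the paper.

```python
import numpy as np
import pyamg

A = pyamg.gallery.poisson((200, 200), format='csr')   # SPD stand-in for the hybridized system
b = np.random.rand(A.shape[0])

ml = pyamg.smoothed_aggregation_solver(A)              # build the AMG hierarchy
x = ml.solve(b, tol=1e-8, accel='cg')                   # AMG-preconditioned CG solve
print("residual:", np.linalg.norm(b - A @ x))
```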
Directory of Open Access Journals (Sweden)
Salvator Abreu
2009-10-01
Full Text Available We explore the use of the Cell Broadband Engine (Cell/BE for short) for combinatorial optimization applications: we present a parallel version of a constraint-based local search algorithm that has been implemented on a multiprocessor BladeCenter machine with twin Cell/BE processors (a total of 16 SPUs per blade). This algorithm was chosen because it fits the Cell/BE architecture very well and requires neither shared memory nor communication between processors, while retaining a compact memory footprint. We study the performance on several large optimization benchmarks and show that the approach achieves mostly linear speedups, sometimes even super-linear ones. This is possible because the parallel implementation may explore different parts of the search space simultaneously and therefore converge faster towards the best sub-space and thus towards a solution. Besides the speedups, the resulting times exhibit a much smaller variance, which benefits applications where a timely reply is critical.
Directory of Open Access Journals (Sweden)
Naumenko Mikhail
2018-01-01
Full Text Available Modern parallel computing algorithm has been applied to the solution of the few-body problem. The approach is based on Feynman's continual integrals method implemented in C++ programming language using NVIDIA CUDA technology. A wide range of 3-body and 4-body bound systems has been considered including nuclei described as consisting of protons and neutrons (e.g., 3,4He) and nuclei described as consisting of clusters and nucleons (e.g., 6He). The correctness of the results was checked by the comparison with the exactly solvable 4-body oscillatory system and experimental data.
Naumenko, Mikhail; Samarin, Viacheslav
2018-02-01
Modern parallel computing algorithm has been applied to the solution of the few-body problem. The approach is based on Feynman's continual integrals method implemented in C++ programming language using NVIDIA CUDA technology. A wide range of 3-body and 4-body bound systems has been considered including nuclei described as consisting of protons and neutrons (e.g., 3,4He) and nuclei described as consisting of clusters and nucleons (e.g., 6He). The correctness of the results was checked by the comparison with the exactly solvable 4-body oscillatory system and experimental data.
A heuristic for solving the redundancy allocation problem for multi-state series-parallel systems
International Nuclear Information System (INIS)
Ramirez-Marquez, Jose E.; Coit, David W.
2004-01-01
The redundancy allocation problem is formulated with the objective of minimizing design cost, when the system exhibits a multi-state reliability behavior, given system-level performance constraints. When the multi-state nature of the system is considered, traditional solution methodologies are no longer valid. This study considers a multi-state series-parallel system (MSPS) with capacitated binary components that can provide different multi-state system performance levels. The different demand levels, which must be supplied during the system-operating period, result in the multi-state nature of the system. The new solution methodology offers several distinct benefits compared to traditional formulations of the MSPS redundancy allocation problem. For some systems, recognizing that different component versions yield different system performance is critical so that the overall system reliability estimation and associated design models the true system reliability behavior more realistically. The MSPS design problem, solved in this study, has been previously analyzed using genetic algorithms (GAs) and the universal generating function. The specific problem being addressed is one where there are multiple component choices, but once a component selection is made, only the same component type can be used to provide redundancy. This is the first time that the MSPS design problem has been addressed without using GAs. The heuristic offers more efficient and straightforward analyses. Solutions to three different problem types are obtained illustrating the simplicity and ease of application of the heuristic without compromising the intended optimization needs
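The availability bookkeeping behind such formulations is the universal generating function mentioned above. A minimal illustration of how capacitated binary elements combine in parallel (capacities add) and in series (the minimum governs) is sketched below with made-up numbers; it shows the UGF mechanics only, independent of the heuristic itself.

```python
from collections import defaultdict

def combine(u1, u2, op):
    """Combine two u-functions {performance level: probability} with operator op."""
    out = defaultdict(float)
    for g1, p1 in u1.items():
        for g2, p2 in u2.items():
            out[op(g1, g2)] += p1 * p2
    return dict(out)

element = {100: 0.9, 0: 0.1}                 # one capacitated binary component
parallel_pair = combine(element, element, lambda a, b: a + b)   # redundancy: capacities add
subsystem = combine(parallel_pair, {120: 0.95, 0: 0.05}, min)   # series: minimum capacity

demand = 100
availability = sum(p for g, p in subsystem.items() if g >= demand)
print(subsystem, "availability at demand 100:", round(availability, 4))
```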
Large-area parallel near-field optical nanopatterning of functional materials using microsphere mask
Energy Technology Data Exchange (ETDEWEB)
Chen, G.X. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Hong, M.H. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Data Storage Institute, ASTAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore)], E-mail: Hong_Minghui@dsi.a-star.edu.sg; Lin, Y. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Wang, Z.B. [Data Storage Institute, ASTAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore); Ng, D.K.T. [Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Data Storage Institute, ASTAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore); Xie, Q. [Data Storage Institute, ASTAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore); Tan, L.S. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Chong, T.C. [Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Data Storage Institute, ASTAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore)
2008-01-31
Large-area parallel near-field optical nanopatterning on functional material surfaces was investigated with KrF excimer laser irradiation. A monolayer of silicon dioxide microspheres was self-assembled on the sample surfaces as the processing mask. Nanoholes and nanospots were obtained on silicon surfaces and thin silver films, respectively. The nanopatterning results were affected by the refractive indices of the surrounding media. Near-field optical enhancement beneath the microspheres is the physical origin of nanostructure formation. Theoretical calculation was performed to study the intensity of optical field distributions under the microspheres according to the light scattering model of a sphere on the substrate.
Imaging of a large collection of human embryo using a super-parallel MR microscope
International Nuclear Information System (INIS)
Matsuda, Yoshimasa; Ono, Shinya; Otake, Yosuke; Handa, Shinya; Kose, Katsumi; Haishi, Tomoyuki; Yamada, Shigeto; Uwabe, Chikako; Shiota, Kohei
2007-01-01
Using 4- and 8-channel super-parallel magnetic resonance (MR) microscopes with a horizontal-bore 2.34 T superconducting magnet developed for 3-dimensional MR microscopy of the large Kyoto Collection of Human Embryos, we acquired T1-weighted 3D images of 1204 embryos at a spatial resolution of (40 μm)^3 to (150 μm)^3 in about 2 years. The similarity of image contrast between the T1-weighted images and stained anatomical sections indicated that T1-weighted 3D images could be used for an anatomical 3D image database for human embryology. (author)
Structuring and assessing large and complex decision problems using MCDA
DEFF Research Database (Denmark)
Barfod, Michael Bruhn
This paper presents an approach for the structuring and assessing of large and complex decision problems using multi-criteria decision analysis (MCDA). The MCDA problem is structured in a decision tree and assessed using the REMBRANDT technique featuring a procedure for limiting the number of pair...
The Cauchy problem for the Pavlov equation with large data
Wu, Derchyi
2017-08-01
We prove local solvability of the Cauchy problem for the Pavlov equation with large initial data by the inverse scattering method. The Pavlov equation arises in the study of Einstein-Weyl geometries and dispersionless integrable models. Our theory yields local solvability of Cauchy problems for a quasi-linear wave equation with a characteristic initial hypersurface.
Development of parallel GPU based algorithms for problems in nuclear area
International Nuclear Information System (INIS)
Almeida, Adino Americo Heimlich
2009-01-01
Graphics Processing Units (GPUs) are high-performance co-processors originally intended to improve the use and quality of computer graphics applications. Once researchers and practitioners realized the potential of using GPUs for general-purpose computation, their application was extended to other fields beyond computer graphics. The main objective of this work is to evaluate the impact of using GPUs in two typical problems of the nuclear area: the simulation of neutron transport using the Monte Carlo method, and the solution of the heat equation in a two-dimensional domain by the finite difference method. To achieve this, we develop parallel algorithms for both GPU and CPU for the two problems described above. The comparison showed that the GPU-based approach is faster than the CPU on a computer with two quad-core processors, without loss of precision. (author)
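The second of the two problems, an explicit finite-difference solution of the heat equation, can be sketched in a few lines. The NumPy version below is a CPU reference illustrating the stencil that would be offloaded to the GPU; the grid size, boundary values and parameters are made up, not those of the thesis.

```python
import numpy as np

n, alpha, dx, dt = 256, 1.0, 1.0, 0.2          # dt <= dx^2 / (4 * alpha) for stability
T = np.zeros((n, n)); T[0, :] = 100.0          # hot top boundary, zero elsewhere

for step in range(1000):
    # explicit finite-difference update of the interior points
    T[1:-1, 1:-1] += alpha * dt / dx**2 * (
        T[:-2, 1:-1] + T[2:, 1:-1] + T[1:-1, :-2] + T[1:-1, 2:] - 4.0 * T[1:-1, 1:-1])
```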
Higgs, moduli problem, baryogenesis and large volume compactifications
International Nuclear Information System (INIS)
Higaki, Tetsutaro; Takahashi, Fuminobu
2012-07-01
We consider the cosmological moduli problem in the context of high-scale supersymmetry breaking suggested by the recent discovery of the standard-model like Higgs boson. In order to solve the notorious moduli-induced gravitino problem, we focus on the LARGE volume scenario, in which the modulus decay into gravitinos can be kinematically forbidden. We then consider the Affleck-Dine mechanism with or without an enhanced coupling with the inflaton, taking account of possible Q-ball formation. We show that the baryon asymmetry of the present Universe can be generated by the Affleck-Dine mechanism in LARGE volume scenario, solving the moduli and gravitino problems.
Higgs, moduli problem, baryogenesis and large volume compactifications
Energy Technology Data Exchange (ETDEWEB)
Higaki, Tetsutaro [RIKEN Nishina Center, Saitama (Japan). Mathematical Physics Lab.; Kamada, Kohei [Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany); Takahashi, Fuminobu [Tohoku Univ., Sendai (Japan). Dept. of Physics
2012-07-15
We consider the cosmological moduli problem in the context of high-scale supersymmetry breaking suggested by the recent discovery of the standard-model like Higgs boson. In order to solve the notorious moduli-induced gravitino problem, we focus on the LARGE volume scenario, in which the modulus decay into gravitinos can be kinematically forbidden. We then consider the Affleck-Dine mechanism with or without an enhanced coupling with the inflaton, taking account of possible Q-ball formation. We show that the baryon asymmetry of the present Universe can be generated by the Affleck-Dine mechanism in LARGE volume scenario, solving the moduli and gravitino problems.
Directory of Open Access Journals (Sweden)
A. L. Lapikov
2014-01-01
Full Text Available The article is aimed at creating techniques to study multi-sectional manipulators with a parallel structure. To this end, an analysis of the field was carried out to reveal both the advantages and drawbacks of such executive mechanisms and the main problems encountered in the course of research. The work shows that it is inefficient to create complete mathematical models of multi-sectional manipulators, which, in the context of solving the direct kinematic problem, would derive a functional dependence of the location and orientation of the end effector on all the generalized coordinates of the mechanism. The structure of multi-sectional manipulators is considered in which the sections are platform manipulators of parallel kinematics with six degrees of freedom. The paper offers an algorithm to determine the location and orientation of the end effector of the manipulator by means of an iterative solution of the analytical equation of the moving-platform plane for each section. The equation for the unknown plane is derived using three points, which are the attachment points of the moving platform joints. To determine the values of the joint coordinates, a system of nine non-linear equations is assembled. It should be noted that all nine equations used to assemble the system have the same type of non-linearity. The physical sense of all nine equations of the system is the Euclidean distance between points of the manipulator. The result of the algorithm is a matrix of homogeneous transformation for each section. The relations describing transformations between adjoining sections of the manipulator are given. An example of a mechanism consisting of three sections is examined. A comparison of theoretical calculations with results obtained on a 3D prototype is made. The next step of the work is to conduct research activities both in the field of dynamics of platform parallel kinematics manipulators with six degrees of freedom and in the
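One ingredient of the algorithm, recovering the moving-platform plane and a homogeneous transform of a section from the three joint attachment points, can be sketched as follows. The coordinates are invented, and the iterative solution of the nine distance equations is not reproduced here.

```python
import numpy as np

# made-up attachment points of the moving platform joints of one section
p1 = np.array([0.1, 0.0, 0.50])
p2 = np.array([0.0, 0.2, 0.52])
p3 = np.array([-0.1, -0.1, 0.48])

normal = np.cross(p2 - p1, p3 - p1)
normal /= np.linalg.norm(normal)
d = -normal @ p1                       # plane of the moving platform: normal . x + d = 0

x_axis = (p2 - p1) / np.linalg.norm(p2 - p1)
y_axis = np.cross(normal, x_axis)      # completes an orthonormal frame on the platform
origin = (p1 + p2 + p3) / 3.0          # platform centre used as the frame origin

T = np.eye(4)                          # homogeneous transformation of the section
T[:3, 0], T[:3, 1], T[:3, 2], T[:3, 3] = x_axis, y_axis, normal, origin
print("plane:", normal, d, "\ntransform:\n", T)
```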
Material-Point Analysis of Large-Strain Problems
DEFF Research Database (Denmark)
Andersen, Søren
The aim of this thesis is to apply and improve the material-point method for modelling of geotechnical problems. One of the geotechnical phenomena that is a subject of active research is the study of landslides. A large amount of research is focused on determining when slopes become unstable. Hence......, it is possible to predict if a certain slope is stable using commercial finite element or finite difference software such as PLAXIS, ABAQUS or FLAC. However, the dynamics during a landslide are less explored. The material-point method (MPM) is a novel numerical method aimed at analysing problems involving...... materials subjected to large strains in a dynamical time–space domain. This thesis explores the material-point method with the specific aim of improving the performance for geotechnical problems. Large-strain geotechnical problems such as landslides pose a major challenge to model numerically. Employing...
Leveraging human oversight and intervention in large-scale parallel processing of open-source data
Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.
2015-05-01
The popularity of cloud computing along with the increased availability of cheap storage have led to the necessity of elaboration and transformation of large volumes of open-source data, all in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is done in chain. Second, it is often needed to minimize the amount of reprocessing in order to optimize the usage of resources due to limited availability. In order to improve on these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.
Solving Large Clustering Problems with Meta-Heuristic Search
DEFF Research Database (Denmark)
Turkensteen, Marcel; Andersen, Kim Allan; Bang-Jensen, Jørgen
In Clustering Problems, groups of similar subjects are to be retrieved from data sets. In this paper, Clustering Problems with the frequently used Minimum Sum-of-Squares Criterion are solved using meta-heuristic search. Tabu search has proved to be a successful methodology for solving optimization...... problems, but applications to large clustering problems are rare. The simulated annealing heuristic has mainly been applied to relatively small instances. In this paper, we implement tabu search and simulated annealing approaches and compare them to the commonly used k-means approach. We find that the meta-heuristic...
Parallelizing Gene Expression Programming Algorithm in Enabling Large-Scale Classification
Directory of Open Access Journals (Sweden)
Lixiong Xu
2017-01-01
Full Text Available As one of the most effective function mining algorithms, the Gene Expression Programming (GEP) algorithm has been widely used in classification, pattern recognition, prediction, and other research fields. Based on self-evolution, GEP is able to mine an optimal function for dealing with complicated tasks. However, in big data research, GEP encounters a low-efficiency issue due to its long mining process. To improve the efficiency of GEP in big data research, especially for processing large-scale classification tasks, this paper presents a parallelized GEP algorithm using the MapReduce computing model. The experimental results show that the presented algorithm is scalable and efficient for processing large-scale classification tasks.
Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud
Wilson, B. D.; Manipon, G.; Hua, H.
2012-12-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within Sci
Tabu search for the redundancy allocation problem of homogenous series-parallel multi-state systems
International Nuclear Information System (INIS)
Ouzineb, Mohamed; Nourelfath, Mustapha; Gendreau, Michel
2008-01-01
This paper develops an efficient tabu search (TS) heuristic to solve the redundancy allocation problem for multi-state series-parallel systems. The system has a range of performance levels from perfect functioning to complete failure. Identical redundant elements are included in order to achieve a desirable level of availability. The elements of the system are characterized by their cost, performance and availability. These elements are chosen from a list of products available in the market. System availability is defined as the ability to satisfy consumer demand, which is represented as a piecewise cumulative load curve. A universal generating function technique is applied to evaluate system availability. The proposed TS heuristic determines the minimal cost system configuration under availability constraints. A distinctive feature of our approach is that it proceeds by dividing the search space into a set of disjoint subsets, and then applying TS to each subset. The design problem solved in this study has been previously analyzed using genetic algorithms (GAs). Numerical results for the test problems from previous research are reported, and larger test problems are randomly generated. Comparisons show that the proposed TS outperforms GA solutions in terms of both solution quality and execution time
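A generic skeleton of the tabu search loop described above (keep a short-term memory of recently visited solutions, move to the best admissible neighbour, remember the incumbent) is sketched below. The moves, the UGF-based availability evaluation and the search-space partitioning of the paper are abstracted behind placeholder functions, and the toy usage is purely illustrative.

```python
def tabu_search(initial, neighbours, cost, n_iter=200, tenure=7):
    current, best = initial, initial
    tabu = []                                   # recently visited solutions
    for _ in range(n_iter):
        candidates = [s for s in neighbours(current) if s not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)     # best admissible neighbour
        tabu.append(current)
        if len(tabu) > tenure:
            tabu.pop(0)                         # expire the oldest tabu entry
        if cost(current) < cost(best):
            best = current                      # update the incumbent
    return best

# toy usage: minimise |x - 42| over the integers, moves are +/-1
result = tabu_search(0, lambda x: [x - 1, x + 1], lambda x: abs(x - 42))
print(result)
```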
Directory of Open Access Journals (Sweden)
Masoud Rabbani
2016-12-01
Full Text Available This paper deals with the mixed model assembly line (MMAL) balancing problem of type-I. In MMALs, several products are made on an assembly line, and the similarity of these products is high. As a result, it is possible to assemble several types of products simultaneously without any additional setup times. The problem has some particular features such as parallel workstations and precedence constraints in dynamic periods, in which each period also affects the next period. The research intends to reduce the number of workstations and maximize the workload smoothness between workstations. Dynamic periods are used to determine all variables in different periods to achieve efficient solutions. A non-dominated sorting genetic algorithm (NSGA-II) and multi-objective particle swarm optimization (MOPSO) are used to solve the problem. The proposed model is validated with GAMS software for small-size problems, and the performance of the foregoing algorithms is compared with each other based on some comparison metrics. NSGA-II outperforms MOPSO with respect to some comparison metrics used in this paper, but in other metrics MOPSO is better than NSGA-II. Finally, conclusions and future research are provided.
Directory of Open Access Journals (Sweden)
Yaser Afshar
Full Text Available Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.
Afshar, Yaser; Sbalzarini, Ivo F
2016-01-01
Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.
Afshar, Yaser; Sbalzarini, Ivo F.
2016-01-01
Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments. PMID:27046144
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows
Bui, Trong T.
1999-01-01
A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
Bonus algorithm for large scale stochastic nonlinear programming problems
Diwekar, Urmila
2015-01-01
This book presents the details of the BONUS algorithm and its real-world applications in areas like sensor placement in large-scale drinking water networks, sensor placement in advanced power systems, water management in power systems, and capacity expansion of energy systems. A generalized method for stochastic nonlinear programming, based on a sampling-based approach for uncertainty analysis and statistical reweighting to obtain probability information, is demonstrated in this book. Stochastic optimization problems are difficult to solve since they involve dealing with optimization and uncertainty loops. There are two fundamental approaches used to solve such problems. The first is decomposition techniques; the second identifies problem-specific structures and transforms the problem into a deterministic nonlinear programming problem. These techniques have significant limitations on either the objective function type or the underlying distributions for the uncertain variables. Moreover, these ...
International Nuclear Information System (INIS)
LUCCIO, A.U.; DIMPERIO, N.L.; SAMULYAK, R.; BEEB-WANG, J.
2001-01-01
Simulation of high intensity accelerators leads to the solution of the Poisson Equation, to calculate space charge forces in the presence of acceleration chamber walls. We reduced the problem to "two-and-a-half" dimensions for long particle bunches, characteristic of large circular accelerators, and applied the results to the tracking code Orbit
International Nuclear Information System (INIS)
Fonseca, R A; Vieira, J; Silva, L O; Fiuza, F; Davidson, A; Tsung, F S; Mori, W B
2013-01-01
A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-Class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ~10^6 cores and sustained performance over ~2 PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios. (paper)
Solving Large Scale Crew Scheduling Problems in Practice
E.J.W. Abbink (Erwin); L. Albino; T.A.B. Dollevoet (Twan); D. Huisman (Dennis); J. Roussado; R.L. Saldanha
2010-01-01
textabstractThis paper deals with large-scale crew scheduling problems arising at the Dutch railway operator, Netherlands Railways (NS). NS operates about 30,000 trains a week. All these trains need a driver and a certain number of guards. Some labor rules restrict the duties of a certain crew base
International Nuclear Information System (INIS)
Zhang, B.; Li, G.; Wang, W.; Shangguan, D.; Deng, L.
2015-01-01
This paper introduces the strategy of multilevel hybrid parallelism of the JCOGIN infrastructure for Monte Carlo particle transport in large-scale full-core pin-by-pin simulations. Particle parallelism, domain decomposition parallelism and MPI/OpenMP parallelism are designed and implemented. In the tests, JMCT demonstrates the parallel scalability of JCOGIN, reaching a parallel efficiency of 80% on 120,000 cores for the pin-by-pin computation of the BEAVRS benchmark. (author)
International Nuclear Information System (INIS)
Khisamutdinov, A I; Velker, N N
2014-01-01
The talk examines a system of pairwise interacting particles, which models a rarefied gas in accordance with the nonlinear Boltzmann equation, the master equations of the Markov evolution of this system, and the corresponding numerical Monte Carlo methods. The selection of an optimal method for simulating rarefied gas dynamics depends on the spatial size of the gas flow domain. For problems with a Knudsen number Kn of order unity, 'imitation', or 'continuous time', Monte Carlo methods ([2]) are quite adequate and competitive. However, if Kn ≤ 0.1 (large sizes), the excessive punctuality of the latter, namely the need to examine all pairs of particles, leads to a significant increase in computational cost (complexity). We are interested in constructing optimal methods for Boltzmann equation problems with sufficiently large spatial sizes of the flow. By optimal we mean algorithms for parallel computation to be implemented on high-performance multi-processor computers. The characteristic property of large systems is the weak dependence of sub-parts on each other over sufficiently small time intervals. This property is taken into account in approximate methods using various splittings of the operator of the corresponding master equations. In this paper, we develop an approximate method based on the splitting of the operator of the master equation system 'over groups of particles' ([7]). The essence of the method is that the system of particles is divided into spatial sub-parts which are modeled independently for small intervals of time, using the precise 'imitation' method. The type of splitting used differs from the other well-known type 'over collisions and displacements', which is an attribute of the known Direct Simulation Monte Carlo methods. The second attribute of the latter is the grid of 'interaction cells', which is completely absent in the imitation methods. The
Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang
2018-04-01
The edge-smooth finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems with a graphical processing unit (GPU) using a special edge-smooth triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM based shell element formulas, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to developing an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is well designed, and a GPU-based simulation system is developed, using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.
Litvinenko, Alexander
2017-09-26
The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M \times 2M$ can be computed on a modern multi-core desktop in a few minutes. Further, HLIBCov is used for estimating unknown parameters such as the covariance length, variance and smoothness parameter of a Matérn covariance function by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is the expensive linear algebra arithmetic due to large and dense covariance matrices. Therefore covariance matrices are approximated in the hierarchical ($\mathcal{H}$-) matrix format with computational cost $\mathcal{O}(k^2 n \log^2 n / p)$ and storage $\mathcal{O}(k n \log n)$, where the rank $k$ is a small integer (typically $k < 25$), $p$ is the number of cores and $n$ is the number of locations on a fairly general mesh. We demonstrate a synthetic example where the true values of the parameters are known. For reproducibility we provide the C++ code, the documentation, and the synthetic data.
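The objective being maximized can be written down compactly for a small dense example. The sketch below evaluates a Matérn (ν = 3/2) covariance over scattered locations and the corresponding negative Gaussian log-likelihood via a Cholesky factorization; at the problem sizes targeted by HLIBCov this dense algebra is what the $\mathcal{H}$-matrix approximation replaces. Locations, data and parameter values here are synthetic toys, not from the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def matern32(r, sigma2, ell):
    """Matern covariance with smoothness nu = 3/2 (closed form)."""
    a = np.sqrt(3.0) * r / ell
    return sigma2 * (1.0 + a) * np.exp(-a)

def neg_log_likelihood(params, X, z):
    sigma2, ell, nugget = params
    C = matern32(cdist(X, X), sigma2, ell) + nugget * np.eye(len(z))
    L = np.linalg.cholesky(C)                  # dense factorization (H-matrix at scale)
    alpha = np.linalg.solve(L, z)
    # 0.5 * z^T C^{-1} z + 0.5 * log det C, dropping the constant term
    return np.sum(np.log(np.diag(L))) + 0.5 * alpha @ alpha

rng = np.random.default_rng(0)
X = rng.random((300, 2))                       # 300 random locations
z = rng.standard_normal(300)                   # synthetic observations
print(neg_log_likelihood((1.0, 0.3, 1e-4), X, z))
```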
Litvinenko, Alexander
2017-09-24
The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M \times 2M$ can be computed on a modern multi-core desktop in a few minutes. Further, HLIBCov is used for estimating unknown parameters such as the covariance length, variance and smoothness parameter of a Matérn covariance function by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is the expensive linear algebra arithmetic due to large and dense covariance matrices. Therefore covariance matrices are approximated in the hierarchical ($\mathcal{H}$-) matrix format with computational cost $\mathcal{O}(k^2 n \log^2 n / p)$ and storage $\mathcal{O}(k n \log n)$, where the rank $k$ is a small integer (typically $k < 25$), $p$ is the number of cores and $n$ is the number of locations on a fairly general mesh. We demonstrate a synthetic example where the true values of the parameters are known. For reproducibility we provide the C++ code, the documentation, and the synthetic data.
Energy Technology Data Exchange (ETDEWEB)
Alleon, G. [EADS-CCR, 31 - Blagnac (France)]; Carpentieri, B.; Duff, I.S.; Giraud, L.; Langou, J.; Martin, E. [Cerfacs, 31 - Toulouse (France)]
2003-07-01
The boundary element method has become a popular tool for the solution of Maxwell's equations in electromagnetism. It discretizes only the surface of the radiating object and gives rise to linear systems that are smaller in size compared to those arising from finite element or finite difference discretizations. However, these systems are prohibitively demanding in terms of memory for direct methods and challenging to solve by iterative methods. In this paper we address the iterative solution via preconditioned Krylov methods of electromagnetic scattering problems expressed in an integral formulation, with the main focus on the design of the pre-conditioner. We consider an approximate inverse method based on Frobenius-norm minimization with a pattern prescribed in advance. The pre-conditioner is constructed from a sparse approximation of the dense coefficient matrix, and the patterns both for the pre-conditioner and for the coefficient matrix are computed a priori using geometric information from the mesh. We describe the implementation of the approximate inverse in an out-of-core parallel code that uses multipole techniques for the matrix-vector products, and show results on the numerical scalability of our method on systems of size up to one million unknowns. We propose an embedded iterative scheme based on the GMRES method and combined with multipole techniques, aimed at improving the robustness of the approximate inverse for large problems. We show through numerical experiments that the proposed scheme enables the solution of very large and difficult problems efficiently at reduced computational and memory cost. Finally we perform a preliminary study on a spectral two-level pre-conditioner to enhance the robustness of our method. This numerical technique exploits spectral information of the preconditioned systems to build a low-rank update of the pre-conditioner. (authors)
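As an illustration of the Frobenius-norm minimization idea described above, the sketch below builds a sparse approximate inverse column by column from small least-squares problems restricted to a prescribed pattern. It is a dense toy version: the pattern (taken here from the entries of A itself) and the test matrix are assumptions, not the geometric, mesh-based pattern or the out-of-core multipole machinery of the paper.

```python
# Column-wise Frobenius-norm minimization: min ||A M - I||_F with M restricted
# to a prescribed sparsity pattern decouples into one small least-squares
# problem per column of M.
import numpy as np

def frobenius_spai(A, pattern):
    """Approximate inverse M with column j supported on pattern[j]."""
    n = A.shape[0]
    M = np.zeros_like(A, dtype=float)
    for j in range(n):
        J = pattern[j]                            # allowed row indices of column j
        e = np.zeros(n)
        e[j] = 1.0
        m, *_ = np.linalg.lstsq(A[:, J], e, rcond=None)
        M[J, j] = m
    return M

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = np.eye(80) + 0.1 * rng.standard_normal((80, 80))     # toy, diagonally dominant
    pattern = [np.nonzero(np.abs(A[:, j]) > 0.15)[0] for j in range(80)]
    pattern = [np.union1d(J, [j]) for j, J in enumerate(pattern)]   # always keep the diagonal
    M = frobenius_spai(A, pattern)
    print(np.linalg.norm(A @ M - np.eye(80)))                 # residual of the approximation
```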
International Nuclear Information System (INIS)
Alleon, G.; Carpentieri, B.; Duff, I.S.; Giraud, L.; Langou, J.; Martin, E.
2003-01-01
The boundary element method has become a popular tool for the solution of Maxwell's equations in electromagnetism. It discretizes only the surface of the radiating object and gives rise to linear systems that are smaller in size compared to those arising from finite element or finite difference discretizations. However, these systems are prohibitively demanding in terms of memory for direct methods and challenging to solve by iterative methods. In this paper we address the iterative solution via preconditioned Krylov methods of electromagnetic scattering problems expressed in an integral formulation, with the main focus on the design of the pre-conditioner. We consider an approximate inverse method based on Frobenius-norm minimization with a pattern prescribed in advance. The pre-conditioner is constructed from a sparse approximation of the dense coefficient matrix, and the patterns both for the pre-conditioner and for the coefficient matrix are computed a priori using geometric information from the mesh. We describe the implementation of the approximate inverse in an out-of-core parallel code that uses multipole techniques for the matrix-vector products, and show results on the numerical scalability of our method on systems of size up to one million unknowns. We propose an embedded iterative scheme based on the GMRES method and combined with multipole techniques, aimed at improving the robustness of the approximate inverse for large problems. We show through numerical experiments that the proposed scheme enables the solution of very large and difficult problems efficiently at reduced computational and memory cost. Finally we perform a preliminary study on a spectral two-level pre-conditioner to enhance the robustness of our method. This numerical technique exploits spectral information of the preconditioned systems to build a low-rank update of the pre-conditioner. (authors)
Background problem for a large solid angle, high sensitivity detector
International Nuclear Information System (INIS)
Chen, M.
1977-01-01
With extremely good vacuum (10^-11 to 10^-13 torr) and well controlled beams, the ISR has a good reputation for clean beam conditions and low background for most types of experiments. However, for a detector covering a large solid angle, measuring processes with small cross sections (approximately 10^-38 cm^2), there are serious background problems which took almost a year to solve. Since ISABELLE may have similar problems, a summary is given of experience at the ISR with the hope that some of the solutions can be installed in ISABELLE at an early stage.
Visual Data-Analytics of Large-Scale Parallel Discrete-Event Simulations
Energy Technology Data Exchange (ETDEWEB)
Ross, Caitlin; Carothers, Christopher D.; Mubarak, Misbah; Carns, Philip; Ross, Robert; Li, Jianping Kelvin; Ma, Kwan-Liu
2016-11-13
Parallel discrete-event simulation (PDES) is an important tool in the codesign of extreme-scale systems because PDES provides a cost-effective way to evaluate designs of high-performance computing systems. Optimistic synchronization algorithms for PDES, such as Time Warp, allow events to be processed without global synchronization among the processing elements. A rollback mechanism is provided when events are processed out of timestamp order. Although optimistic synchronization protocols enable the scalability of large-scale PDES, the performance of the simulations must be tuned to reduce the number of rollbacks and provide an improved simulation runtime. To enable efficient large-scale optimistic simulations, one has to gain insight into the factors that affect the rollback behavior and simulation performance. We developed a tool for ROSS model developers that gives them detailed metrics on the performance of their large-scale optimistic simulations at varying levels of simulation granularity. Model developers can use this information for parameter tuning of optimistic simulations in order to achieve better runtime and fewer rollbacks. In this work, we instrument the ROSS optimistic PDES framework to gather detailed statistics about the simulation engine. We have also developed an interactive visualization interface that uses the data collected by the ROSS instrumentation to understand the underlying behavior of the simulation engine. The interface connects real time to virtual time in the simulation and provides the ability to view simulation data at different granularities. We demonstrate the usefulness of our framework by performing a visual analysis of the dragonfly network topology model provided by the CODES simulation framework built on top of ROSS. The instrumentation needs to minimize overhead in order to accurately collect data about the simulation performance. To ensure that the instrumentation does not introduce unnecessary overhead, we perform a
Ab initio nuclear structure - the large sparse matrix eigenvalue problem
Energy Technology Data Exchange (ETDEWEB)
Vary, James P; Maris, Pieter [Department of Physics, Iowa State University, Ames, IA, 50011 (United States); Ng, Esmond; Yang, Chao [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States); Sosonkina, Masha, E-mail: jvary@iastate.ed [Scalable Computing Laboratory, Ames Laboratory, Iowa State University, Ames, IA, 50011 (United States)
2009-07-01
The structure and reactions of light nuclei represent fundamental and formidable challenges for microscopic theory based on realistic strong interaction potentials. Several ab initio methods have now emerged that provide nearly exact solutions for some nuclear properties. The ab initio no core shell model (NCSM) and the no core full configuration (NCFC) method frame this quantum many-particle problem as a large sparse matrix eigenvalue problem where one evaluates the Hamiltonian matrix in a basis space consisting of many-fermion Slater determinants and then solves for a set of the lowest eigenvalues and their associated eigenvectors. The resulting eigenvectors are employed to evaluate a set of experimental quantities to test the underlying potential. For fundamental problems of interest, the matrix dimension often exceeds 10^10 and the number of nonzero matrix elements may saturate available storage on present-day leadership class facilities. We survey recent results and advances in solving this large sparse matrix eigenvalue problem. We also outline the challenges that lie ahead for achieving further breakthroughs in fundamental nuclear theory using these ab initio approaches.
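As a point of reference for the kind of computation described above, the sketch below extracts a few of the lowest eigenpairs of a toy sparse symmetric matrix with SciPy's Lanczos-type solver. The matrix is a synthetic stand-in, not a nuclear Hamiltonian, and the serial SciPy call only illustrates the problem shape, not the parallel solvers surveyed in the paper.

```python
# Lowest few eigenpairs of a large sparse symmetric matrix via Lanczos (eigsh).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 5000
rng = np.random.default_rng(0)
# toy sparse symmetric "Hamiltonian": a diagonal plus a few random couplings
diag = sp.diags(rng.uniform(1.0, 10.0, n))
offd = sp.random(n, n, density=1e-4, random_state=0)
H = diag + 0.5 * (offd + offd.T)

# five lowest eigenvalues (smallest algebraic) and their eigenvectors
vals, vecs = eigsh(H, k=5, which="SA")
print(vals)
```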
Ab initio nuclear structure - the large sparse matrix eigenvalue problem
International Nuclear Information System (INIS)
Vary, James P; Maris, Pieter; Ng, Esmond; Yang, Chao; Sosonkina, Masha
2009-01-01
The structure and reactions of light nuclei represent fundamental and formidable challenges for microscopic theory based on realistic strong interaction potentials. Several ab initio methods have now emerged that provide nearly exact solutions for some nuclear properties. The ab initio no core shell model (NCSM) and the no core full configuration (NCFC) method frame this quantum many-particle problem as a large sparse matrix eigenvalue problem where one evaluates the Hamiltonian matrix in a basis space consisting of many-fermion Slater determinants and then solves for a set of the lowest eigenvalues and their associated eigenvectors. The resulting eigenvectors are employed to evaluate a set of experimental quantities to test the underlying potential. For fundamental problems of interest, the matrix dimension often exceeds 10^10 and the number of nonzero matrix elements may saturate available storage on present-day leadership class facilities. We survey recent results and advances in solving this large sparse matrix eigenvalue problem. We also outline the challenges that lie ahead for achieving further breakthroughs in fundamental nuclear theory using these ab initio approaches.
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms
Hasanov, Khalid; Quintin, Jean-Noël; Lastovetsky, Alexey
2014-01-01
-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel
International Nuclear Information System (INIS)
Liebrock, L.M.; McGrath, J.F.; Hicks, D.L.
1986-01-01
Parallel programming promises improved processing speeds for hydrocodes, magnetohydrocodes, multiphase flow codes, thermal-hydraulics codes, wavecodes and other continuum dynamics codes. This paper presents the results of some investigations of parallel algorithms on three parallel processors: the CRAY X-MP, ELXSI and the HEP computers. Introduction and Background: We report the results of investigations of parallel algorithms for computational continuum dynamics. These programs (hydrocodes, wavecodes, etc.) produce simulations of the solutions to problems arising in the motion of continua: solid dynamics, liquid dynamics, gas dynamics, plasma dynamics, multiphase flow dynamics, thermal-hydraulic dynamics and multimaterial flow dynamics. This report restricts its scope to one-dimensional algorithms such as the von Neumann-Richtmyer (1950) scheme
Directory of Open Access Journals (Sweden)
Xiaoqing Wang
2016-01-01
Parallel analyses of the dynamic responses of a large-scale water conveyance tunnel under seismic excitation are presented in this paper. A full three-dimensional numerical model considering the water-tunnel-soil coupling is established and adopted to investigate the tunnel's dynamic responses. The movement and sloshing of the internal water are simulated using the multi-material Arbitrary Lagrangian Eulerian (ALE) method. Nonlinear fluid-structure interaction (FSI) between tunnel and inner water is treated by using the penalty method. Nonlinear soil-structure interaction (SSI) between soil and tunnel is dealt with by using the surface-to-surface contact algorithm. To overcome computing power limitations and to deal with such a large-scale calculation, a parallel algorithm based on the modified recursive coordinate bisection (MRCB) considering the balance of SSI and FSI loads is proposed and used. The whole simulation is accomplished on Dawning 5000A using the proposed MRCB-based parallel algorithm optimized to run on supercomputers. The simulation model and the proposed approaches are validated by comparison with the added mass method. Dynamic responses of the tunnel are analyzed and the parallelism is discussed. Besides, factors affecting the dynamic responses are investigated. Better speedup and parallel efficiency show the scalability of the parallel method, and the analysis results can be used to aid in the design of water conveyance tunnels.
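The MRCB partitioner mentioned above is a modified recursive coordinate bisection. A minimal sketch of plain (unmodified) recursive coordinate bisection on a weighted point cloud is given below; the load model and the split along the longest coordinate axis at the weighted median are illustrative assumptions, and the paper's balancing of combined SSI/FSI loads is not reproduced.

```python
# Recursive coordinate bisection of a weighted point cloud into nparts parts.
import numpy as np

def rcb(points, weights, nparts):
    """Return a part id (0..nparts-1) for every point."""
    parts = np.zeros(len(points), dtype=int)

    def split(idx, lo, hi):
        if hi - lo <= 1:
            parts[idx] = lo
            return
        # bisect along the longest coordinate axis of this subset
        axis = np.argmax(points[idx].max(axis=0) - points[idx].min(axis=0))
        order = idx[np.argsort(points[idx, axis])]
        mid = (lo + hi) // 2
        # cut where the cumulative weight reaches the proportional share
        target = weights[order].sum() * (mid - lo) / (hi - lo)
        cut = int(np.clip(np.searchsorted(np.cumsum(weights[order]), target),
                          1, len(order) - 1))
        split(order[:cut], lo, mid)
        split(order[cut:], mid, hi)

    split(np.arange(len(points)), 0, nparts)
    return parts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((10000, 3))          # e.g. element centroids of a mesh
    w = rng.uniform(1.0, 3.0, 10000)      # per-element work estimates
    parts = rcb(pts, w, 8)
    print(np.bincount(parts, weights=w))  # per-partition load
```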
Parallel real-time visualization system for large-scale simulation. Application to WSPEEDI
International Nuclear Information System (INIS)
Muramatsu, Kazuhiro; Otani, Takayuki; Kitabata, Hideyuki; Matsumoto, Hideki; Takei, Toshifumi; Doi, Shun
2000-01-01
The real-time visualization system, PATRAS (PArallel TRAcking Steering system) has been developed on parallel computing servers. The system performs almost all of the visualization tasks on a parallel computing server, and uses image data compression technique for efficient communication between the server and the client terminal. Therefore, the system realizes high performance concurrent visualization in an internet computing environment. The experience in applying PATRAS to WSPEEDI (Worldwide version of System for Prediction Environmental Emergency Dose Information) is reported. The application of PATRAS to WSPEEDI enables users to understand behaviours of radioactive tracers from different release points easily and quickly. (author)
Liu, Yang; Yucel, Abdulkadir; Bagci, Hakan; Michielssen, Eric
2015-01-01
of processors by leveraging two mechanisms: (i) a hierarchical parallelization strategy to evenly distribute the computation and memory loads at all levels of the PWTD tree among processors, and (ii) a novel asynchronous communication scheme to reduce the cost
International Nuclear Information System (INIS)
Sauget, M.
2007-12-01
This research concerns the application of neural networks in the external radiotherapy domain. The goal is to elaborate a new evaluation system for radiation dose distributions in heterogeneous environments. The final objective of this work is to build a complete tool kit to evaluate the optimal treatment planning. My first research point is the conception of an incremental learning algorithm. The interest of my work is to combine different optimizations specialized in function interpolation and to propose a new algorithm allowing the neural network architecture to change during the learning phase. This algorithm allows the final size of the neural network to be minimised while keeping a good accuracy. The second part of my research is to parallelize the previous incremental learning algorithm. The goal of that work is to increase the speed of the learning step as well as the size of the learned dataset needed in a clinical case. For that, our incremental learning algorithm presents an original data decomposition with overlapping, together with a fault tolerance mechanism. My last research point is a fast and accurate algorithm computing the radiation dose deposit in any heterogeneous environment. At the present time, the existing solutions are not optimal. The fast solutions are not accurate and do not give an optimal treatment planning. On the other hand, the accurate solutions are far too slow to be used in a clinical context. Our algorithm addresses this problem by bringing rapidity and accuracy. The concept is to use an adequately trained neural network together with a mechanism taking into account environment changes. The advantage of this algorithm is to avoid the use of a complex physical code while keeping a good accuracy and reasonable computation times. (author)
Problems of cleaning the large diameter sections of deep wells
Energy Technology Data Exchange (ETDEWEB)
Patsch, F; Gilicz, B
1966-01-01
In drilling deep wells, great importance is being given to the problem of cuttings removal from the hole bottom of sections drilled by large diameter bits. The length of borehole sections drilled by 12-1/4-in. and larger bits has been more than doubled in Hungary in the course of the past 4 years. When the drilling fluid jet strikes the borehole bottom, pressure waves arise which take on a crossed flow pattern and retard the cleaning of the well bottom, particularly in the case of larger bottom surfaces. In large diameter boreholes, cleaning efficiency is achieved by full utilization of the pump power and increased pump delivery. Friction losses in drill pipes are reduced by using 6-in. XH pipes.
Directory of Open Access Journals (Sweden)
S. Dousthaghi
2012-08-01
This paper considers an economic lot and delivery scheduling problem (ELDSP) in a fuzzy environment with a fuzzy shelf life for each product. This problem is formulated in a flexible job shop with unrelated parallel machines, when the planning horizon is finite, and it determines lot sizing, scheduling and sequencing simultaneously. The proposed model of this paper is based on the basic period (BP) approach. In this paper, a mixed-integer nonlinear programming (MINLP) model is presented and then it is changed into two models under the fuzzy shelf life. The main model depends on multiple basic periods, and the resulting model is difficult to solve for large-scale problems in a reasonable amount of time; thus, an efficient heuristic method is proposed to solve the problem. The performance of the proposed model is demonstrated using some numerical examples.
Efficient graph-based dynamic load-balancing for parallel large-scale agent-based traffic simulation
Xu, Y.; Cai, W.; Aydt, H.; Lees, M.; Tolk, A.; Diallo, S.Y.; Ryzhov, I.O.; Yilmaz, L.; Buckley, S.; Miller, J.A.
2014-01-01
One of the issues of parallelizing large-scale agent-based traffic simulations is partitioning and load-balancing. Traffic simulations are dynamic applications where the distribution of workload in the spatial domain constantly changes. Dynamic load-balancing at run-time has shown better efficiency
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.
Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo
2016-07-19
Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
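For orientation, the per-pair kernel that such database-scanning schemes parallelize is the Smith-Waterman recurrence; a minimal serial scorer with a linear gap penalty is sketched below. The scoring values are illustrative, and the vectorized Xeon Phi kernels and the ClustalW stage of the paper are not reproduced.

```python
# Serial Smith-Waterman local-alignment score with a linear gap penalty.
import numpy as np

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b."""
    H = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i, j] = max(0,
                          H[i - 1, j - 1] + s,   # (mis)match
                          H[i - 1, j] + gap,     # gap in b
                          H[i, j - 1] + gap)     # gap in a
            best = max(best, H[i, j])
    return best

if __name__ == "__main__":
    query = "ACACACTA"
    database = ["AGCACACA", "TTTTGGGG", "ACACACTA"]
    # database scanning = scoring the query against every subject sequence
    print([smith_waterman_score(query, s) for s in database])
```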
Energy Technology Data Exchange (ETDEWEB)
Cwik, T. [California Institute of Technology, Pasadena, CA (United States); Katz, D.S. [Cray Research, El Segundo, CA (United States)
1996-12-31
Finite element modeling has proven useful for accurately simulating scattered or radiated electromagnetic fields from complex three-dimensional objects whose geometry varies on the scale of a fraction of an electrical wavelength. An unstructured finite element model of realistic objects leads to a large, sparse, system of equations that needs to be solved efficiently with regard to machine memory and execution time. Both factorization and iterative solvers can be used to produce solutions to these systems of equations. Factorization leads to high memory requirements that limit the electrical problem size of three-dimensional objects that can be modeled. An iterative solver can be used to efficiently solve the system without excessive memory use and in a minimal amount of time if the convergence rate is controlled.
International Nuclear Information System (INIS)
Sadjadi, Seyed Jafar; Soltani, R.
2009-01-01
We present a heuristic approach to solve a general framework of the serial-parallel redundancy problem where the reliability of the system is maximized subject to some general linear constraints. The redundancy problem is generally considered to be NP-hard and the optimal solution is not normally available. Therefore, to evaluate the performance of the proposed method, a hybrid genetic algorithm is also implemented whose parameters are calibrated via Taguchi's robust design method. Then, various test problems are solved and the computational results indicate that the proposed heuristic approach provides promising reliabilities, which are fairly close to the optimal solutions, in a reasonable amount of time.
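The objective such redundancy-allocation heuristics evaluate is the reliability of a series system of parallel subsystems; a small sketch of that evaluation is given below. The component reliabilities and redundancy levels are illustrative, and neither the heuristic nor the hybrid genetic algorithm of the paper is reproduced.

```python
# Reliability of a series system of parallel (redundant) subsystems.
import numpy as np

def series_parallel_reliability(r, n):
    """r[i]: component reliability of subsystem i, n[i]: redundancy level of subsystem i."""
    r = np.asarray(r, dtype=float)
    n = np.asarray(n, dtype=int)
    subsystem_rel = 1.0 - (1.0 - r) ** n      # parallel components within a subsystem
    return float(np.prod(subsystem_rel))      # subsystems connected in series

if __name__ == "__main__":
    r = [0.80, 0.90, 0.75, 0.85]
    print(series_parallel_reliability(r, [1, 1, 1, 1]))   # no redundancy
    print(series_parallel_reliability(r, [3, 2, 3, 2]))   # with redundancy
```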
Problems of large-scale vertically-integrated aquaculture
Energy Technology Data Exchange (ETDEWEB)
Webber, H H; Riordan, P F
1976-01-01
The problems of vertically-integrated aquaculture are outlined; they are concerned with: species limitations (in the market, biological and technological); site selection, feed, manpower needs, and legal, institutional and financial requirements. The gaps in understanding of, and the constraints limiting, large-scale aquaculture are listed. Future action is recommended with respect to: types and diversity of species to be cultivated, marketing, biotechnology (seed supply, disease control, water quality and concerted effort), siting, feed, manpower, legal and institutional aids (granting of water rights, grants, tax breaks, duty-free imports, etc.), and adequate financing. The lack of hard data based on experience suggests that large-scale vertically-integrated aquaculture is a high risk enterprise, and with the high capital investment required, banks and funding institutions are wary of supporting it. Investment in pilot projects is suggested to demonstrate that large-scale aquaculture can be a fully functional and successful business. Construction and operation of such pilot farms is judged to be in the interests of both the public and private sector.
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we propose a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPUs). We then design an imaging-point parallel strategy to achieve optimal parallel computing performance, and adopt an asynchronous double-buffering scheme for multi-stream GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) are adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
Energy Technology Data Exchange (ETDEWEB)
Bauerle, Matthew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2014-08-01
This project utilizes Graphics Processing Units (GPUs) to compute radiograph simulations for arbitrary objects. The generation of radiographs, also known as the forward projection imaging model, is computationally intensive and not widely utilized. The goal of this research is to develop a massively parallel algorithm that can compute forward projections for objects with a trillion voxels (3D pixels). To achieve this end, the data are divided into blocks that can each fit into GPU memory. The forward projected image is also divided into segments to allow for future parallelization and to avoid needless computations.
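A tiny NumPy stand-in for the blocking strategy described above is sketched below: the voxel volume is processed in slabs small enough to fit in device memory and each slab's contribution to a parallel-beam projection is accumulated. The slab size, the volume size, and the axis-aligned projection geometry are illustrative simplifications; the actual GPU kernels are not reproduced.

```python
# Blocked (slab-by-slab) accumulation of a parallel-beam forward projection.
import numpy as np

def blocked_forward_projection(volume, slab_voxels):
    """Sum the volume along axis 0 (a simple radiograph), one slab at a time."""
    image = np.zeros(volume.shape[1:], dtype=volume.dtype)
    for start in range(0, volume.shape[0], slab_voxels):
        slab = volume[start:start + slab_voxels]   # this slab would be copied to the GPU
        image += slab.sum(axis=0)                  # per-slab partial projection
    return image

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vol = rng.random((256, 128, 128))
    radiograph = blocked_forward_projection(vol, slab_voxels=32)
    print(np.allclose(radiograph, vol.sum(axis=0)))   # same result as a single pass
```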
Shrimankar, D. D.; Sathe, S. R.
2016-01-01
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot be scaled beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead is significant even in OpenMP loop execution and increases with the number of cores participating. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results are striking and consistent across a large variety of input data files. We have developed our own load balancing and cache optimization techniques for the message passing model. Our experimental results show that these techniques give optimal performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
Shrimankar, D D; Sathe, S R
2016-01-01
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot be scaled beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead is significant even in OpenMP loop execution and increases with the number of cores participating. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results are striking and consistent across a large variety of input data files. We have developed our own load balancing and cache optimization techniques for the message passing model. Our experimental results show that these techniques give optimal performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.
Massively parallel mathematical sieves
Energy Technology Data Exchange (ETDEWEB)
Montry, G.R.
1989-01-01
The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
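A short sketch of a data-parallel sieve in the same spirit is given below: the integer range is block-decomposed across worker processes and each worker crosses out multiples of the serially precomputed primes up to sqrt(N). The multiprocessing layout is an illustrative stand-in for the hypercube decompositions discussed in the paper.

```python
# Block-decomposed Sieve of Eratosthenes across worker processes.
from multiprocessing import Pool
import math

def small_primes(limit):
    """Serial sieve up to `limit` (the base primes)."""
    mark = bytearray([1]) * (limit + 1)
    mark[:2] = b"\x00\x00"
    for p in range(2, int(math.isqrt(limit)) + 1):
        if mark[p]:
            mark[p * p::p] = bytearray(len(mark[p * p::p]))
    return [i for i, m in enumerate(mark) if m]

def sieve_block(args):
    lo, hi, base = args                      # sieve the half-open range [lo, hi)
    mark = bytearray([1]) * (hi - lo)
    for p in base:
        start = max(p * p, (lo + p - 1) // p * p)
        mark[start - lo::p] = bytearray(len(mark[start - lo::p]))
    return [lo + i for i, m in enumerate(mark) if m]

def parallel_sieve(n, nworkers=4):
    base = small_primes(int(math.isqrt(n)))
    step = (n - 1) // nworkers + 1
    blocks = [(lo, min(lo + step, n + 1), base) for lo in range(2, n + 1, step)]
    with Pool(nworkers) as pool:
        out = pool.map(sieve_block, blocks)
    return [p for block in out for p in block]

if __name__ == "__main__":
    print(parallel_sieve(200))
```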
Extensions of Parallel Coordinates for Interactive Exploration of Large Multi-Timepoint Data Sets
Blaas, J.; Botha, C.P.; Post, F.H.
2008-01-01
Parallel coordinate plots (PCPs) are commonly used in information visualization to provide insight into multi-variate data. These plots help to spot correlations between variables. PCPs have been successfully applied to unstructured datasets up to a few millions of points. In this paper, we present
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments
Energy Technology Data Exchange (ETDEWEB)
Toshimitsu, Fujisawa [Tokyo Univ., Collaborative Research Center of Frontier Simulation Software for Industrial Science, Institute of Industrial Science (Japan); Genki, Yagawa [Tokyo Univ., Department of Quantum Engineering and Systems Science (Japan)
2003-07-01
In this paper, an FEM-based (finite element method) mesh-free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. A local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of the finite element calculation are based on nodes, and parallel computing is realized by dividing the system of equations by the rows of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for the nodes of the finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for this kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)
Large parallel volumes of finite and compact sets in d-dimensional Euclidean space
DEFF Research Database (Denmark)
Kampf, Jürgen; Kiderlen, Markus
The r-parallel volume V(C_r) of a compact subset C in d-dimensional Euclidean space is the volume of the set C_r of all points of Euclidean distance at most r > 0 from C. According to Steiner's formula, V(C_r) is a polynomial in r when C is convex. For finite sets C satisfying a certain geometric...
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments
International Nuclear Information System (INIS)
Toshimitsu, Fujisawa; Genki, Yagawa
2003-01-01
In this paper, an FEM-based (finite element method) mesh-free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. A local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of the finite element calculation are based on nodes, and parallel computing is realized by dividing the system of equations by the rows of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for the nodes of the finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for this kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)
Eigensolution of finite element problems in a completely connected parallel architecture
Akl, Fred A.; Morel, Michael R.
1989-01-01
A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(φ) = (M)(φ)(ω), where (K) and (M) are of order N and (ω) is of order q, is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
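The generalized symmetric eigenproblem above is the standard one of structural dynamics. As a serial reference for what the domain-decomposed algorithm computes, the sketch below assembles a toy one-dimensional stiffness/mass pair and extracts the q lowest modes with SciPy's shift-invert Lanczos solver; the completely connected multiprocessor scheme itself is not modeled, and the toy matrices are assumptions.

```python
# q lowest eigenpairs of K phi = omega M phi for a toy 1D linear finite element model.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n, q = 2000, 6
h = 1.0 / (n + 1)
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
K = sp.diags([off, main, off], [-1, 0, 1]) / h                         # stiffness (1/h)*tridiag(-1, 2, -1)
M = sp.diags([off / -6.0, main / 3.0, off / -6.0], [-1, 0, 1]) * h     # consistent mass (h/6)*tridiag(1, 4, 1)

# shift-invert Lanczos about zero returns the lowest modes
omega, phi = eigsh(K.tocsc(), k=q, M=M.tocsc(), sigma=0.0)
print(np.sqrt(omega))        # natural frequencies of the toy model (approx. k*pi)
```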
Directory of Open Access Journals (Sweden)
Mihai-Victor PRICOP
2010-09-01
The present paper introduces a numerical approach to the static linear elasticity equations for anisotropic materials. The domain and boundary conditions are simple, to enhance an easy implementation of the finite difference scheme. SOR and gradient methods are used to solve the resulting linear system. The simplicity of the geometry is also useful for MPI parallelization of the code.
The Problem with Using Historical Parallels as a Method in Holocaust and Genocide Teaching
Avraham, Doron
2010-01-01
Teaching the Holocaust in multicultural classrooms and in places which have experienced mass violence raises the question of whether specific methods of teaching are required. One of the answers is that Holocaust education in these cases should facilitate the creation of parallels and similarities between past events and the experiences of the…
Energy Technology Data Exchange (ETDEWEB)
Thiess, Alexander R.
2011-12-19
In this thesis we present the development of the self-consistent, full-potential Korringa-Kohn-Rostoker (KKR) Green function method KKRnano for calculating the electronic properties, magnetic interactions, and total energy including all electrons on the basis of the density functional theory (DFT) on high-end massively parallelized high-performance computers for supercells containing thousands of atoms without sacrifice of accuracy. KKRnano was used for the following two applications. The first application is centered in the field of dilute magnetic semiconductors. In this field a new promising material combination was identified: gadolinium doped gallium nitride which shows ferromagnetic ordering of colossal magnetic moments above room temperature. It quickly turned out that additional extrinsic defects are inducing the striking properties. However, the question which kind of extrinsic defects are present in experimental samples is still unresolved. In order to shed light on this open question, we perform extensive studies of the most promising candidates: interstitial nitrogen and oxygen, as well as gallium vacancies. By analyzing the pairwise magnetic coupling among defects it is shown that nitrogen and oxygen interstitials cannot support thermally stable ferromagnetic order. Gallium vacancies, on the other hand, facilitate an important coupling mechanism. The vacancies are found to induce large magnetic moments on all surrounding nitrogen sites, which then couple ferromagnetically both among themselves and with the gadolinium dopants. Based on a statistical evaluation it can be concluded that already small concentrations of gallium vacancies can lead to a distinct long-range ferromagnetic ordering. Beyond this important finding we present further indications, from which we infer that gallium vacancies likely cause the striking ferromagnetic coupling of colossal magnetic moments in GaN:Gd. The second application deals with the phase-change material germanium
International Nuclear Information System (INIS)
Thiess, Alexander R.
2011-01-01
In this thesis we present the development of the self-consistent, full-potential Korringa-Kohn-Rostoker (KKR) Green function method KKRnano for calculating the electronic properties, magnetic interactions, and total energy including all electrons on the basis of the density functional theory (DFT) on high-end massively parallelized high-performance computers for supercells containing thousands of atoms without sacrifice of accuracy. KKRnano was used for the following two applications. The first application is centered in the field of dilute magnetic semiconductors. In this field a new promising material combination was identified: gadolinium doped gallium nitride which shows ferromagnetic ordering of colossal magnetic moments above room temperature. It quickly turned out that additional extrinsic defects are inducing the striking properties. However, the question which kind of extrinsic defects are present in experimental samples is still unresolved. In order to shed light on this open question, we perform extensive studies of the most promising candidates: interstitial nitrogen and oxygen, as well as gallium vacancies. By analyzing the pairwise magnetic coupling among defects it is shown that nitrogen and oxygen interstitials cannot support thermally stable ferromagnetic order. Gallium vacancies, on the other hand, facilitate an important coupling mechanism. The vacancies are found to induce large magnetic moments on all surrounding nitrogen sites, which then couple ferromagnetically both among themselves and with the gadolinium dopants. Based on a statistical evaluation it can be concluded that already small concentrations of gallium vacancies can lead to a distinct long-range ferromagnetic ordering. Beyond this important finding we present further indications, from which we infer that gallium vacancies likely cause the striking ferromagnetic coupling of colossal magnetic moments in GaN:Gd. The second application deals with the phase-change material germanium
Large scale simulations of lattice QCD thermodynamics on Columbia Parallel Supercomputers
International Nuclear Information System (INIS)
Ohta, Shigemi
1989-01-01
The Columbia Parallel Supercomputer project aims at the construction of a parallel processing, multi-gigaflop computer optimized for numerical simulations of lattice QCD. The project has three stages; 16-node, 1/4GF machine completed in April 1985, 64-node, 1GF machine completed in August 1987, and 256-node, 16GF machine now under construction. The machines all share a common architecture; a two dimensional torus formed from a rectangular array of N_1 x N_2 independent and identical processors. A processor is capable of operating in a multi-instruction multi-data mode, except for periods of synchronous interprocessor communication with its four nearest neighbors. Here the thermodynamics simulations on the two working machines are reported. (orig./HSI)
Large Scale Parallel DNA Detection by Two-Dimensional Solid-State Multipore Systems.
Athreya, Nagendra Bala Murali; Sarathy, Aditya; Leburton, Jean-Pierre
2018-04-23
We describe a scalable device design of a dense array of multiple nanopores made from nanoscale semiconductor materials to detect and identify translocations of many biomolecules in a massively parallel detection scheme. We use molecular dynamics coupled to nanoscale device simulations to illustrate the ability of this device setup to uniquely identify DNA parallel translocations. We show that the transverse sheet currents along membranes are immune to the crosstalk effects arising from simultaneous translocations of biomolecules through multiple pores, due to their ability to sense only the local potential changes. We also show that electronic sensing across the nanopore membrane offers a higher detection resolution compared to ionic current blocking technique in a multipore setup, irrespective of the irregularities that occur while fabricating the nanopores in a two-dimensional membrane.
Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms
Quintin, Jean-Noel
2013-10-01
Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
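For readers unfamiliar with SUMMA, the serial NumPy sketch below shows the rank-k outer-product (panel) update at its heart, which is exactly the step that SUMMA broadcasts along processor rows and columns and that HSUMMA reorganizes into a two-level hierarchy. The panel width and matrix sizes are illustrative, and no communication is modeled.

```python
# Panel-by-panel accumulation C = A @ B, the local update underlying SUMMA.
import numpy as np

def summa_like_multiply(A, B, panel=64):
    """Accumulate C = A @ B over column/row panels of width `panel`."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for s in range(0, k, panel):
        # in SUMMA, A[:, s:s+panel] is broadcast along processor rows and
        # B[s:s+panel, :] along processor columns before this local update
        C += A[:, s:s + panel] @ B[s:s + panel, :]
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, B = rng.random((300, 500)), rng.random((500, 400))
    print(np.allclose(summa_like_multiply(A, B), A @ B))
```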
Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms
Quintin, Jean-Noel; Hasanov, Khalid; Lastovetsky, Alexey
2013-01-01
Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
Tomar, S.K.
2002-01-01
It is well known that elliptic problems when posed on non-smooth domains, develop singularities. We examine such problems within the framework of spectral element methods and resolve the singularities with exponential accuracy.
Fishery management problems and possibilities on large southeastern reservoirs
Parsons, John W.
1958-01-01
Principal problems concerning the fisheries of large reservoirs in the Southeast are: inefficient and highly selective exploitation of fish stocks, and protection and reclamation of damaged or threatened fisheries in tailwaters and tributary streams. Seven mainstream reservoirs on which data are available support an average angling pressure of 4.9 trips per acre per year and an average catch of 16 pounds of sport fish and 6 pounds of food fish. Commercial take is 7 pounds per acre. The rate of catch of sport fish, based upon tag returns, is only 3 percent. Sixteen storage reservoirs support an average angling pressure of 5.0 trips per acre per year and an average catch of 13 pounds of sport fish and 1 pound of food fish. Commercial catch is of no significance. Average rate of catch of sport fish is 17 percent of the catchable population. Fish population studies indicate that there are twice as many sport fish and four times as many food fish in mainstream than there are in storage reservoirs.
Parallel runs of a large air pollution model on a grid of Sun computers
DEFF Research Database (Denmark)
Alexandrov, V.N.; Owczarz, W.; Thomsen, Per Grove
2004-01-01
Large-scale air pollution models can successfully be used in different environmental studies. These models are described mathematically by systems of partial differential equations. Splitting procedures followed by discretization of the spatial derivatives lead to several large systems
A Boundary Element Solution to the Problem of Interacting AC Fields in Parallel Conductors
Directory of Open Access Journals (Sweden)
Einar M. Rønquist
1984-04-01
The ac fields in electrically insulated conductors will interact through the surrounding electromagnetic fields. The pertinent field equations reduce to the Helmholtz equation inside each conductor (interior problem), and to the Laplace equation outside the conductors (exterior problem). These equations are transformed to integral equations, with the magnetic vector potential and its normal derivative on the boundaries as unknowns. The integral equations are then approximated by sets of algebraic equations. The interior problem involves only unknowns on the boundary of each conductor, while the exterior problem couples unknowns from several conductors. The interior and the exterior problem are coupled through the field continuity conditions. The full set of equations is solved by standard Gaussian elimination. We also show how the total current and the dissipated power within each conductor can be expressed as boundary integrals. Finally, computational results for a sample problem are compared with a finite difference solution.
Parallel Implementation of Riccati Recursion for Solving Linear-Quadratic Control Problems
DEFF Research Database (Denmark)
Frison, Gianluca; Jørgensen, John Bagterp
2013-01-01
In both Active-Set (AS) and Interior-Point (IP) algorithms for Model Predictive Control (MPC), sub-problems in the form of linear-quadratic (LQ) control problems need to be solved at each iteration. The solution of these sub-problems is usually the main computational effort. In this paper...... an alternative version of the Riccati recursion solver for LQ control problems is presented. The performance of both the classical and the alternative version is analyzed from a theoretical as well as a numerical point of view, and the alternative version is found to be approximately 50% faster than...
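A minimal sketch of the backward Riccati recursion for a finite-horizon, discrete-time LQ problem, i.e. the kind of sub-problem referred to above, is given below. The system matrices, weights, and horizon are illustrative assumptions; neither the MPC embedding nor the paper's alternative (or any parallel) formulation is reproduced.

```python
# Backward Riccati recursion for min sum x'Qx + u'Ru s.t. x_{k+1} = A x_k + B u_k.
import numpy as np

def riccati_gains(A, B, Q, R, N):
    """Return feedback gains K_0..K_{N-1} so that u_k = -K_k x_k."""
    P = Q.copy()                                 # terminal cost P_N = Q
    gains = []
    for _ in range(N):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)      # K_k
        P = Q + A.T @ P @ (A - B @ K)            # Riccati difference equation for P_k
        gains.append(K)
    return gains[::-1]

if __name__ == "__main__":
    A = np.array([[1.0, 0.1], [0.0, 1.0]])       # double integrator, dt = 0.1
    B = np.array([[0.005], [0.1]])
    Q, R = np.eye(2), np.array([[0.1]])
    Ks = riccati_gains(A, B, Q, R, N=50)
    x = np.array([1.0, 0.0])
    for K in Ks:                                  # closed-loop rollout
        x = A @ x - B @ (K @ x)
    print(x)                                      # state driven towards the origin
```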
DEFF Research Database (Denmark)
Knecht, Stefan; Jensen, Hans Jørgen Aagaard; Fleig, Timo
2010-01-01
We present a parallel implementation of a large-scale relativistic double-group configuration interaction (CI) program. It is applicable with a large variety of two- and four-component Hamiltonians. The parallel algorithm is based on a distributed data model in combination with a static load balancing...
Radiation problems in the design of the large electron-positron collider (LEP)
International Nuclear Information System (INIS)
Fasso, A.; Goebel, K.; Hoefert, M.; Rau, G.; Schoenbacher, H.; Stevenson, G.R.; Sullivan, A.H.; Swanson, W.P.; Tuyn, J.W.N.
1984-01-01
This is a comprehensive review of the radiation problems taken into account in the design studies for the Large Electron-Positron collider (LEP) now under construction at CERN. It provides estimates and calculations of the magnitude of the most important hazards, including those from non-ionizing radiations and magnetic fields as well as from ionizing radiation, and describes the measures to be taken in the design, construction, and operation to limit them. Damage to components is considered as well as the risk to people. More general explanations are given of the physical processes and technical parameters that influence the production and effects of radiation, and a comprehensive bibliography provides access to the basic theories and other discussions of the subject. The report effectively summarizes the findings of the Working Group on LEP radiation problems and parallels the results of analogous studies made for the previous large accelerator. The concluding chapters describe the LEP radiation protection system, which is foreseen to reduce doses far below the legal limits for all those working with the machine or living nearby, and summarize the environmental impact. Costs are also briefly considered. (orig.)
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
International Nuclear Information System (INIS)
Takei, Toshifumi; Doi, Shun; Matsumoto, Hideki; Muramatsu, Kazuhiro
2000-01-01
We have developed a concurrent visualization system RVSLIB (Real-time Visual Simulation Library). This paper shows the effectiveness of the system when it is applied to large-scale unsteady simulations, for which the conventional post-processing approach may no longer work, on high-performance parallel vector supercomputers. The system performs almost all of the visualization tasks on a computation server and uses compressed visualized image data for efficient communication between the server and the user terminal. We have introduced several techniques, including vectorization and parallelization, into the system to minimize the computational costs of the visualization tools. The performance of RVSLIB was evaluated by using an actual CFD code on an NEC SX-4. The computational time increase due to the concurrent visualization was at most 3% for a smaller (1.6 million) grid and less than 1% for a larger (6.2 million) one. (author)
Xu, Zhenzhen; Zou, Yongxing; Kong, Xiangjie
2015-01-01
To our knowledge, this paper investigates the first application of meta-heuristic algorithms to tackle the parallel machines scheduling problem with weighted late work criterion and common due date. The late work criterion is one of the performance measures of scheduling problems which considers the length of the late parts of particular jobs when evaluating the quality of a schedule. Since this problem is known to be NP-hard, three meta-heuristic algorithms, namely ant colony system, genetic algorithm, and simulated annealing, are designed and implemented, respectively. We also propose a novel algorithm named LDF (largest density first), which is improved from LPT (longest processing time first). The computational experiments compared these meta-heuristic algorithms with LDF, LPT and LS (list scheduling), and the experimental results show that SA performs the best in most cases. However, LDF is better than SA in some conditions; moreover, the running time of LDF is much shorter than that of SA.
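The greedy list-scheduling baselines mentioned above can be sketched compactly: jobs are sorted by a priority key and assigned to the currently least-loaded machine. Sorting by processing time gives LPT; for LDF the sketch assumes, as an illustration only, a density defined as weight divided by processing time. The job data are made up.

```python
# Greedy list scheduling on parallel machines with a configurable priority key.
import heapq

def list_schedule(jobs, m, key):
    """jobs: list of (processing_time, weight); returns per-machine job lists."""
    loads = [(0.0, i) for i in range(m)]          # min-heap of (load, machine)
    heapq.heapify(loads)
    assignment = [[] for _ in range(m)]
    for job in sorted(jobs, key=key, reverse=True):
        load, i = heapq.heappop(loads)            # currently least-loaded machine
        assignment[i].append(job)
        heapq.heappush(loads, (load + job[0], i))
    return assignment

if __name__ == "__main__":
    jobs = [(4, 8), (3, 9), (5, 2), (2, 6), (6, 6), (1, 5)]
    lpt = list_schedule(jobs, 2, key=lambda j: j[0])          # longest processing time first
    ldf = list_schedule(jobs, 2, key=lambda j: j[1] / j[0])   # largest "density" first (assumed definition)
    print(lpt)
    print(ldf)
```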
Liu, Yang; Bagci, Hakan; Michielssen, Eric
2013-01-01
numbers of temporal and spatial basis functions discretizing the current [Shanker et al., IEEE Trans. Antennas Propag., 51, 628-641, 2003]. In the past, serial versions of these solvers have been successfully applied to the analysis of scattering from
Solving Large-Scale QAP Problems in Parallel with the Search
DEFF Research Database (Denmark)
Clausen, Jens; Brüngger, A.; Marzetta, A.
1998-01-01
Program libraries are one tool to make the cooperation between specialists from various fields successful: the separation of application-specific knowledge from application-independent tasks ensures portability, maintenance, extensibility, and flexibility. The current paper demonstrates the success...
International Nuclear Information System (INIS)
Tosic, P.T.
2011-01-01
We study certain types of Cellular Automata (CA) viewed as an abstraction of large-scale Multi-Agent Systems (MAS). We argue that the classical CA model needs to be modified in several important respects in order to become a relevant and sufficiently general model for large-scale MAS, so that the generalized model can capture many important MAS properties at the level of agent ensembles and their long-term collective behavior patterns. We specifically focus on the issue of inter-agent communication in CA, and propose sequential cellular automata (SCA) as the first step, and genuinely Asynchronous Cellular Automata (ACA) as the ultimate deterministic CA-based abstract models for large-scale MAS made of simple reactive agents. We first formulate deterministic and nondeterministic versions of sequential CA, and then summarize some interesting configuration space properties (i.e., possible behaviors) of a restricted class of sequential CA. In particular, we compare and contrast those properties of sequential CA with the corresponding properties of the classical (that is, parallel and perfectly synchronous) CA with the same restricted class of update rules. We analytically demonstrate the failure of the studied sequential CA models to simulate all possible behaviors of perfectly synchronous parallel CA, even for a very restricted class of non-linear totalistic node update rules. The lesson learned is that the interleaving semantics of concurrency, when applied to sequential CA, is not refined enough to adequately capture the perfect synchrony of parallel CA updates. Last but not least, we outline what would be an appropriate CA-like abstraction for large-scale distributed computing insofar as the inter-agent communication model is concerned, and in that context we propose genuinely asynchronous CA. (author)
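The distinction drawn above between perfectly synchronous and sequential updates can be illustrated compactly: the sketch below applies the same totalistic rule to a 1D binary CA once in parallel and once node by node in a fixed order, and the two disciplines generally reach different configurations from the same start. The rule and lattice size are illustrative; this is not the formal construction of the paper.

```python
# Synchronous (parallel) vs. sequential update of a 1D totalistic binary CA on a ring.
import numpy as np

def totalistic_rule(left, me, right):
    """Next state depends only on the neighborhood sum (a totalistic rule)."""
    return 1 if (left + me + right) == 2 else 0

def step_synchronous(state):
    nxt = state.copy()
    for i in range(len(state)):                   # every node reads the old configuration
        nxt[i] = totalistic_rule(state[i - 1], state[i], state[(i + 1) % len(state)])
    return nxt

def step_sequential(state):
    state = state.copy()
    for i in range(len(state)):                   # later nodes already see earlier updates
        state[i] = totalistic_rule(state[i - 1], state[i], state[(i + 1) % len(state)])
    return state

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    start = rng.integers(0, 2, 16)
    print("start      :", start)
    print("synchronous:", step_synchronous(start))
    print("sequential :", step_sequential(start))
```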
DEFF Research Database (Denmark)
Rasmussen, Marie-Louise Højlund; Stolpe, Mathias
2008-01-01
The subject of this article is solving discrete truss topology optimization problems with local stress and displacement constraints to global optimum. We consider a formulation based on the Simultaneous ANalysis and Design (SAND) approach. This intrinsically non-convex problem is reformulated... the physics, and the cuts (Combinatorial Benders' and projected Chvátal–Gomory) come from an understanding of the particular mathematical structure of the reformulation. The impact of a stronger representation is investigated on several truss topology optimization problems in two and three dimensions.
Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing
Qiang Liu; Yi Qin; Guodong Li
2018-01-01
Computing speed is a significant issue in large-scale flood simulations for real-time response to disaster prevention and mitigation. Even today, most large-scale flood simulations are run on supercomputers due to the massive amounts of data and computation required. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To realize a fast simulation of large-scale floods on a personal...
International Nuclear Information System (INIS)
Yamamoto, Akio; Hashimoto, Hiroshi
2002-01-01
The distributed genetic algorithm (DGA) is applied to loading pattern optimization problems of pressurized water reactors. The basic concept of DGA follows that of the conventional genetic algorithm (GA). However, DGA equally distributes candidate solutions (i.e. loading patterns) to several independent ''islands'' and evolves them in each island. Communication between islands, i.e. migration of some candidates between islands, is performed at a certain period. Since candidate solutions evolve independently in each island while accepting different genes from migrants, the premature convergence seen in the conventional GA can be prevented. Because many candidate loading patterns must be evaluated in GA or DGA, parallelization is effective in reducing turnaround time. The parallel efficiency of DGA was measured using our optimization code, and good efficiency was attained even in a heterogeneous cluster environment thanks to dynamic distribution of the calculation load. The optimization code is based on a client/server architecture with native TCP/IP sockets, in which a client (optimization) module and calculation server modules exchange loading pattern objects with each other. Through a sensitivity study on the optimization parameters of DGA, a suitable set of parameters for a test problem was identified. Finally, the optimization capability of DGA and the conventional GA was compared on the test problem, and DGA provided better optimization results than the conventional GA. (author)
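As a rough illustration of the island model described above (independent subpopulations with periodic migration), the following sketch evolves a toy bit-string objective rather than reactor loading patterns; the GA operators, ring migration topology and all parameters are hypothetical stand-ins, and the client/server parallelization is not represented.

```python
import random

def evolve(pop, fitness, n_gen):
    """Very small GA: tournament selection, one-point crossover, bit-flip mutation."""
    for _ in range(n_gen):
        new = []
        for _ in range(len(pop)):
            a = max(random.sample(pop, 2), key=fitness)
            b = max(random.sample(pop, 2), key=fitness)
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]
            if random.random() < 0.1:                   # mutation
                i = random.randrange(len(child))
                child = child[:i] + [1 - child[i]] + child[i + 1:]
            new.append(child)
        pop[:] = new
    return pop

def island_ga(n_islands=4, pop_size=20, n_bits=32, epochs=10, gens_per_epoch=5):
    fitness = sum                                   # toy objective: maximize the number of ones
    islands = [[[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for _ in range(epochs):
        for pop in islands:                         # islands evolve independently
            evolve(pop, fitness, gens_per_epoch)
        for k, pop in enumerate(islands):           # ring migration of each island's best
            migrant = max(pop, key=fitness)
            dest = islands[(k + 1) % n_islands]
            dest[dest.index(min(dest, key=fitness))] = migrant[:]
    return max((max(pop, key=fitness) for pop in islands), key=fitness)

best = island_ga()
print(sum(best), "ones out of 32")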
Properties and solution methods for large location-allocation problems
DEFF Research Database (Denmark)
Juel, Henrik; Love, Robert F.
1982-01-01
Location-allocation with l_p distances is studied. It is shown that this structure can be expressed as a concave minimization programming problem. Since concave minimization algorithms are not yet well developed, five solution methods are developed which utilize the special properties of the location-allocation problem. Using the rectilinear distance measure, two of these algorithms achieved optimal solutions in all 102 test problems for which solutions were known. The algorithms can be applied to much larger problems than any existing exact methods.
Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María
2014-06-01
We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed a series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with the biconjugate gradient stabilized method. The results have shown that the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, in cases in which other preconditioners succeed in converging to the desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code, by up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures a grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the
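The AMG preconditioner itself is not reproduced here, but the way any preconditioner plugs into the biconjugate gradient stabilized iteration can be sketched with SciPy; the example below uses a simple diagonal (Jacobi) preconditioner, one of the baseline schemes the authors compare against, on a toy sparse system with a strongly varying diagonal standing in for conductivity contrasts.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import bicgstab, LinearOperator

n = 200
d = 2.0 + 10.0 * (np.arange(n) % 5)               # jumpy diagonal mimicking contrasts
A = sp.diags([-1.0, d, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

inv_diag = 1.0 / A.diagonal()
M = LinearOperator((n, n), matvec=lambda x: inv_diag * x)   # Jacobi preconditioner

iters = {"none": 0, "jacobi": 0}
def counter(key):
    def cb(xk):
        iters[key] += 1
    return cb

x0, info0 = bicgstab(A, b, callback=counter("none"))
x1, info1 = bicgstab(A, b, M=M, callback=counter("jacobi"))
print(info0, info1, iters)   # the preconditioned run typically needs fewer iterations
```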
Nuclear EMP simulation for large-scale urban environments. FDTD for electrically large problems.
Energy Technology Data Exchange (ETDEWEB)
Smith, William S. [Los Alamos National Laboratory; Bull, Jeffrey S. [Los Alamos National Laboratory; Wilcox, Trevor [Los Alamos National Laboratory; Bos, Randall J. [Los Alamos National Laboratory; Shao, Xuan-Min [Los Alamos National Laboratory; Goorley, John T. [Los Alamos National Laboratory; Costigan, Keeley R. [Los Alamos National Laboratory
2012-08-13
In case of a terrorist nuclear attack in a metropolitan area, EMP measurement could provide: (1) a prompt confirmation of the nature of the explosion (chemical or nuclear) for emergency response; and (2) characterization parameters of the device (reaction history, yield) for technical forensics. However, the urban environment could affect the fidelity of the prompt EMP measurement (as well as all other types of prompt measurement): (1) the nuclear EMP wavefront would no longer be coherent, due to incoherent production, attenuation, and propagation of gammas and electrons; and (2) EMP propagation from the source region outward would undergo complicated transmission, reflection, and diffraction processes. EMP simulation for an electrically large urban environment involves: (1) a coupled MCNP/FDTD (finite-difference time domain Maxwell solver) approach; and (2) FDTD, which tends to be limited to problems that are not 'too' large compared to the wavelengths of interest because of numerical dispersion and anisotropy. We use a higher-order low-dispersion, isotropic FDTD algorithm for EMP propagation.
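For readers unfamiliar with FDTD, the following is a minimal 1-D free-space Yee update loop, the standard second-order scheme whose dispersion and anisotropy limitations motivate the higher-order algorithm mentioned above; the grid size, source and parameters are illustrative only.

```python
import numpy as np

# Minimal 1-D free-space Yee FDTD; E and H are staggered in space and time.
nz, nt = 400, 600
S = 0.5                                # Courant number c*dt/dz, kept below 1 for stability
ez = np.zeros(nz)                      # electric field (normalized units)
hy = np.zeros(nz - 1)                  # magnetic field, offset by half a cell

for n in range(nt):
    hy += S * (ez[1:] - ez[:-1])                    # H update from the curl of E
    ez[1:-1] += S * (hy[1:] - hy[:-1])              # E update from the curl of H (PEC ends)
    ez[nz // 2] += np.exp(-((n - 60) / 20.0) ** 2)  # soft Gaussian source at the grid center

print("peak |Ez| after propagation:", float(np.abs(ez).max()))
```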
Li, Kenli; Zou, Shuting; Xv, Jin
2008-01-01
Elliptic curve cryptographic algorithms convert input data into an unrecognizable encrypted form and convert the unrecognizable data back again into its original decrypted form. The security of this form of encryption hinges on the enormous difficulty of solving the elliptic curve discrete logarithm problem (ECDLP), especially over GF(2^n), n ∈ Z+. This paper describes an effective method to find solutions to the ECDLP by means of a molecular computer. We propose that this research accomplishment would represent a breakthrough for applied biological computation, and this paper demonstrates that in principle this is possible. Three DNA-based algorithms: a parallel adder, a parallel multiplier, and a parallel inverse over GF(2^n) are described. The biological operation time of all of these algorithms is polynomial with respect to n. Considering this analysis, cryptography using a public key might be less secure. In this respect, a principal contribution of this paper is to provide enhanced evidence of the potential of molecular computing to tackle such ambitious computations.
Mathematical programming methods for large-scale topology optimization problems
DEFF Research Database (Denmark)
Rojas Labanda, Susana
for mechanical problems, but has rapidly extended to many other disciplines, such as fluid dynamics and biomechanical problems. However, the novelty and improvements of optimization methods have been very limited. It is, indeed, necessary to develop new optimization methods to improve the final designs......, and at the same time, reduce the number of function evaluations. Nonlinear optimization methods, such as sequential quadratic programming and interior point solvers, have hardly been embraced by the topology optimization community. Thus, this work is focused on the introduction of this kind of second...... for the classical minimum compliance problem. Two of the state-of-the-art optimization algorithms are investigated and implemented for this structural topology optimization problem. A Sequential Quadratic Programming method (TopSQP) and an interior point method (TopIP) are developed exploiting the specific mathematical...
E.J. Spee (Edwin); P.M. de Zeeuw (Paul); J.G. Verwer (Jan); J.G. Blom (Joke); W. Hundsdorfer (Willem)
1996-01-01
Atmospheric air quality modeling relies in part on numerical simulation. Required numerical simulations are often hampered by lack of computer capacity and computational speed. This problem is most severe in the field of global modeling where transport and exchange of trace constituents
A fine-grained parallel algorithm for the cyclic flexible job shop problem
Directory of Open Access Journals (Sweden)
Bożejko Wojciech
2017-06-01
Full Text Available In this paper, a flexible job shop problem of operations scheduling is considered. A new, very fast method for determining the cycle time is presented. In the design of the heuristic algorithm, a neighborhood inspired by the game of golf was applied. A lower bound of the criterion function was used in the search of the neighborhood.
International Nuclear Information System (INIS)
Biage, M.
1983-04-01
A heat transfer problem between parallel plates of infinite width has been solved, with axial heat conduction in the fluid and in the wall, considering steady-state laminar flow of a Newtonian fluid with a fully developed velocity profile. The duct consists of an infinite initial part, insulated on both plates, an intermediate part of finite length, with a prescribed heat flux on the upper plate and insulation on the bottom plate, and another infinite part also insulated on both plates. The problem has been solved by a numerical combination of the integral equation method and the variational method. Both the performance of the numerical technique employed and the results obtained are analyzed in this work. It is demonstrated that heat conduction in the wall significantly modifies the heat transfer parameters. (Author) [pt
Gross, Lutz; Altinay, Cihan; Fenwick, Joel; Smith, Troy
2014-05-01
inversion and appropriate solution schemes in escript. We will also give a brief introduction into escript's open framework for defining and solving geophysical inversion problems. Finally we will show some benchmark results to demonstrate the computational scalability of the inversion method across a large number of cores and compute nodes in a parallel computing environment. References: - L. Gross et al. (2013): Escript Solving Partial Differential Equations in Python Version 3.4, The University of Queensland, https://launchpad.net/escript-finley - L. Gross and C. Kemp (2013) Large Scale Joint Inversion of Geophysical Data using the Finite Element Method in escript. ASEG Extended Abstracts 2013, http://dx.doi.org/10.1071/ASEG2013ab306 - T. Poulet, L. Gross, D. Georgiev, J. Cleverley (2012): escript-RT: Reactive transport simulation in Python using escript, Computers & Geosciences, Volume 45, 168-176. http://dx.doi.org/10.1016/j.cageo.2011.11.005.
Energy Technology Data Exchange (ETDEWEB)
Philip, Bobby, E-mail: philipb@ornl.gov [Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831 (United States); Berrill, Mark A.; Allu, Srikanth; Hamilton, Steven P.; Sampath, Rahul S.; Clarno, Kevin T. [Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831 (United States); Dilts, Gary A. [Los Alamos National Laboratory, PO Box 1663, Los Alamos, NM 87545 (United States)
2015-04-01
This paper describes an efficient and nonlinearly consistent parallel solution methodology for solving coupled nonlinear thermal transport problems that occur in nuclear reactor applications over hundreds of individual 3D physical subdomains. Efficiency is obtained by leveraging knowledge of the physical domains, the physics on individual domains, and the couplings between them for preconditioning within a Jacobian Free Newton Krylov method. Details of the computational infrastructure that enabled this work, namely the open source Advanced Multi-Physics (AMP) package developed by the authors, are described. Details of verification and validation experiments, and of a parallel performance analysis in weak and strong scaling studies demonstrating the achieved efficiency of the algorithm, are presented. Furthermore, numerical experiments demonstrate that the preconditioner developed is independent of the number of fuel subdomains in a fuel rod, which is particularly important when simulating different types of fuel rods. Finally, we demonstrate the power of the coupling methodology by considering problems with couplings between surface and volume physics and coupling of nonlinear thermal transport in fuel rods to an external radiation transport code.
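The core of a Jacobian-free Newton-Krylov iteration can be sketched compactly: the Jacobian-vector product is approximated by a finite difference of the residual, so only residual evaluations are needed. The example below uses a toy nonlinear 1-D "thermal" residual and SciPy's GMRES; the physics-based preconditioning and the AMP infrastructure described in the abstract are not represented.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def F(u):
    """Toy nonlinear 'thermal' residual: 1-D diffusion with a u^4-like sink."""
    r = np.empty_like(u)
    r[0], r[-1] = u[0] - 1.0, u[-1] - 0.0          # Dirichlet boundaries
    r[1:-1] = u[:-2] - 2.0 * u[1:-1] + u[2:] - 0.1 * u[1:-1] ** 4
    return r

def jfnk(F, u0, tol=1e-10, max_newton=20):
    u = u0.copy()
    for _ in range(max_newton):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break
        eps = 1e-7 * (1.0 + np.linalg.norm(u))
        # Matrix-free Jacobian-vector product: J(u) v ~ (F(u + eps*v) - F(u)) / eps
        Jv = LinearOperator((u.size, u.size),
                            matvec=lambda v: (F(u + eps * v) - r) / eps)
        du, _ = gmres(Jv, -r)
        u += du
    return u

u = jfnk(F, np.linspace(1.0, 0.0, 50))
print("final residual norm:", np.linalg.norm(F(u)))
```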
Large scale parallel FEM computations of far/near stress field changes in rocks
Czech Academy of Sciences Publication Activity Database
Blaheta, Radim; Byczanski, Petr; Jakl, Ondřej; Kohut, Roman; Kolcun, Alexej; Krečmer, Karel; Starý, Jiří
2006-01-01
Vol. 22, No. 4 (2006), pp. 449-459 ISSN 0167-739X R&D Projects: GA ČR(CZ) GA105/02/0492; GA AV ČR(CZ) 1ET400300415 Institutional research plan: CEZ:AV0Z30860518 Keywords: large scale finite element analysis Subject RIV: BA - General Mathematics Impact factor: 0.722, year: 2006
Directory of Open Access Journals (Sweden)
Julián A García-Grajales
Full Text Available With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of simulated cells grows. The solvers implemented in Neurite, explicit and implicit, were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large-scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon
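As a small illustration of the Cable Theory ingredient mentioned above, here is an explicit finite-difference integration of the passive cable equation; the active Hodgkin-Huxley channels, the mechanical coupling and the GPU parallelization are all omitted, and the parameters are hypothetical.

```python
import numpy as np

# Passive cable equation:  tau * dV/dt = lambda^2 * d2V/dx2 - V + (injected term)
tau, lam = 10.0, 0.5              # membrane time constant (ms), space constant (mm)
L, nx = 5.0, 201                  # cable length (mm), grid points
dx = L / (nx - 1)
dt = 0.2 * dx**2 * tau / lam**2   # well inside the explicit-scheme stability limit
v = np.zeros(nx)                  # membrane potential deviation from rest (mV)
inj = np.zeros(nx)
inj[0] = 1.0                      # constant (already scaled) injection at the proximal end

for _ in range(int(50.0 / dt)):   # integrate ~50 ms
    d2v = np.zeros(nx)
    d2v[1:-1] = (v[:-2] - 2.0 * v[1:-1] + v[2:]) / dx**2
    d2v[0] = 2.0 * (v[1] - v[0]) / dx**2       # sealed-end (zero-flux) boundaries
    d2v[-1] = 2.0 * (v[-2] - v[-1]) / dx**2
    v += dt / tau * (lam**2 * d2v - v + inj)

print("steady-state attenuation along the cable:", v[-1] / v[0])
```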
Suryanarayana, Phanish; Pratapa, Phanisri P.; Sharma, Abhiraj; Pask, John E.
2018-03-01
We present SQDFT: a large-scale parallel implementation of the Spectral Quadrature (SQ) method for O(N) Kohn-Sham Density Functional Theory (DFT) calculations at high temperature. Specifically, we develop an efficient and scalable finite-difference implementation of the infinite-cell Clenshaw-Curtis SQ approach, in which results for the infinite crystal are obtained by expressing quantities of interest as bilinear forms or sums of bilinear forms, that are then approximated by spatially localized Clenshaw-Curtis quadrature rules. We demonstrate the accuracy of SQDFT by showing systematic convergence of energies and atomic forces with respect to SQ parameters to reference diagonalization results, and convergence with discretization to established planewave results, for both metallic and insulating systems. We further demonstrate that SQDFT achieves excellent strong and weak parallel scaling on computer systems consisting of tens of thousands of processors, with near perfect O(N) scaling with system size and wall times as low as a few seconds per self-consistent field iteration. Finally, we verify the accuracy of SQDFT in large-scale quantum molecular dynamics simulations of aluminum at high temperature.
Guaranteed Discrete Energy Optimization on Large Protein Design Problems.
Simoncini, David; Allouche, David; de Givry, Simon; Delmas, Céline; Barbe, Sophie; Schiex, Thomas
2015-12-08
In Computational Protein Design (CPD), assuming a rigid backbone and amino-acid rotamer library, the problem of finding a sequence with an optimal conformation is NP-hard. In this paper, using Dunbrack's rotamer library and the Talaris2014 decomposable energy function, we use an exact deterministic method combining branch and bound, arc consistency, and tree-decomposition to provably identify the global minimum energy sequence-conformation on full-redesign problems, defining search spaces of size up to 10^234. This is achieved on a single core of a standard computing server, requiring a maximum of 66 GB RAM. A variant of the algorithm is able to exhaustively enumerate all sequence-conformations within an energy threshold of the optimum. These proven optimal solutions are then used to evaluate the frequencies and amplitudes, in energy and sequence, at which an existing CPD-dedicated simulated annealing implementation may miss the optimum on these full redesign problems. The probability of finding an optimum drops close to 0 very quickly. In the worst case, despite 1,000 repeats, the annealing algorithm remained more than 1 Rosetta unit away from the optimum, leading to design sequences that could differ from the optimal sequence by more than 30% of their amino acids.
Spectral Analysis of Large Finite Element Problems by Optimization Methods
Directory of Open Access Journals (Sweden)
Luca Bergamaschi
1994-01-01
Full Text Available Recently an efficient method for the solution of the partial symmetric eigenproblem (DACG, deflated-accelerated conjugate gradient) was developed, based on the conjugate gradient (CG) minimization of successive Rayleigh quotients over deflated subspaces of decreasing size. In this article four different choices of the coefficient β_k required at each DACG iteration for the computation of the new search direction P_k are discussed. The “optimal” choice is the one that yields the same asymptotic convergence rate as the CG scheme applied to the solution of linear systems. Numerical results point out that the optimal β_k leads to a very cost effective algorithm in terms of CPU time in all the sample problems presented. Various preconditioners are also analyzed. It is found that DACG using the optimal β_k and (LL^T)^(-1) as a preconditioner, L being the incomplete Cholesky factor of A, proves a very promising method for the partial eigensolution. It appears to be superior to the Lanczos method in the evaluation of the 40 leftmost eigenpairs of five finite element problems, and particularly for the largest problem, with size equal to 4560, for which the speed gain turns out to fall between 2.5 and 6.0, depending on the eigenpair level.
Wan, Shixiang; Zou, Quan
2017-01-01
Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing data has resulted in a shortage of efficient ultra-large biological sequence alignment approaches able to cope with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files of more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets, with files of more than 1 GB, showed that HAlign-II saves time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Newton Methods for Large Scale Problems in Machine Learning
Hansen, Samantha Leigh
2014-01-01
The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions with applications in machine learning. Chapter 1 introduces the overarching ideas in the thesis. Chapters 2 and 3 are geared towards supervised machine learning applications that involve minimizing a sum of loss…
Biopolitics problems of large-scale hydraulic engineering construction
International Nuclear Information System (INIS)
Romanenko, V.D.
1997-01-01
The 20th century, which will enter history as a century of large-scale hydraulic engineering construction, is coming to a close. On the European continent alone, 517 large reservoirs (more than 1000 million km³ of water detained) were constructed in the period from 1901 to 1985. In the Danube basin, a large number of reservoirs, power stations, navigation sluices and other hydraulic engineering structures have been constructed. Among them, more than 40 especially large objects are located along the main bed of the river. A number of hydro-complexes, such as Dnieper-Danube and Gabcikovo, Danube-Oder-Labe (project), Danube-Tissa, Danube-Adriatic Sea (project), Danube-Aegean Sea and Danube-Black Sea, have entered into operation or are at the design stage. Hydraulic engineering construction was especially intensive in Ukraine. On its territory, several large reservoirs on the Dnieper and Yuzhny Bug were constructed, which have heavily changed the hydrological regime of these rivers. Summarizing the results of river system regulation in Ukraine, it can be noted that more than 27 thousand ponds (3 km³ per year), 1098 reservoirs with a total volume of 55 km³, and 11 large channels with a total length of more than 2000 km and a capacity of 1000 m²/s have been created in Ukraine. Hydraulic engineering construction has played an important role in the development of industry and agriculture, in the water supply of cities and settlements, in environmental effects, and in the maintenance of safe navigation on the Danube, Dnieper and other rivers. In the next part of the paper, the environmental changes after construction of the Karakum Channel on the Aral Sea in Central Asia are discussed.
Edgerton, Jason D; Keough, Matthew T; Roberts, Lance W
2018-02-21
This study examines whether there are multiple joint trajectories of depression and problem gambling co-development in a sample of emerging adults. Data were from the Manitoba Longitudinal Study of Young Adults (n = 679), which was collected in 4 waves across 5 years (age 18-20 at baseline). Parallel process latent class growth modeling was used to identify 5 joint trajectory classes: low decreasing gambling, low increasing depression (81%); low stable gambling, moderate decreasing depression (9%); low stable gambling, high decreasing depression (5%); low stable gambling, moderate stable depression (3%); moderate stable problem gambling, no depression (2%). There was no evidence of reciprocal growth in problem gambling and depression in any of the joint classes. Multinomial logistic regression analyses of baseline risk and protective factors found that only neuroticism, escape-avoidance coping, and perceived level of family social support were significant predictors of joint trajectory class membership. Consistent with the pathways model framework, we observed that individuals in the problem gambling only class were more likely to be using gambling as a stable way to cope with negative emotions. Similarly, high levels of neuroticism and low levels of family support were associated with increased odds of being in a class with moderate to high levels of depressive symptoms (but low gambling problems). The results suggest that interventions for problem gambling and/or depression need to focus on promoting more adaptive coping skills among more "at-risk" young adults, and such interventions should be tailored in relation to specific subtypes of comorbid mental illness.
International Nuclear Information System (INIS)
Samatova, N F; Schmidt, M C; Hendrix, W; Breimyer, P; Thomas, K; Park, B-H
2008-01-01
Data-driven construction of predictive models for biological systems faces challenges from data intensity, uncertainty, and computational complexity. Data-driven model inference is often considered a combinatorial graph problem where an enumeration of all feasible models is sought. The data-intensive and the NP-hard nature of such problems, however, challenges existing methods to meet the required scale of data size and uncertainty, even on modern supercomputers. Maximal clique enumeration (MCE) in a graph derived from such biological data is often a rate-limiting step in detecting protein complexes in protein interaction data, finding clusters of co-expressed genes in microarray data, or identifying clusters of orthologous genes in protein sequence data. We report two key advances that address this challenge. We designed and implemented the first (to the best of our knowledge) parallel MCE algorithm that scales linearly on thousands of processors running MCE on real-world biological networks with thousands and hundreds of thousands of vertices. In addition, we proposed and developed the Graph Perturbation Theory (GPT) that establishes a foundation for efficiently solving the MCE problem in perturbed graphs, which model the uncertainty in the data. GPT formulates necessary and sufficient conditions for detecting the differences between the sets of maximal cliques in the original and perturbed graphs and reduces the enumeration time by more than 80% compared to complete recomputation.
Müller, Thomas
2009-11-12
The accurate prediction of the potential energy function of the X ¹Σg⁺ state of Cr2 is a remarkable challenge; large differential electron correlation effects, significant scalar relativistic contributions, the need for large flexible basis sets containing g functions, the importance of semicore valence electron correlation, and its multireference nature pose considerable obstacles. So far, the only reasonably successful approaches were based on multireference perturbation theory (MRPT). Recently, there was some controversy in the literature about the role of error compensation and systematic defects of various MRPT implementations that cannot be easily overcome. A detailed basis set study of the potential energy function is presented, adopting a variational method. The method of choice for this electron-rich target with up to 28 correlated electrons is fully uncontracted multireference-averaged quadratic coupled cluster (MR-AQCC), which shares the flexibility of the multireference configuration interaction (MRCI) approach and is, in addition, approximately size-extensive (0.02 eV in error as compared to the MRCI value of 1.37 eV for two noninteracting chromium atoms). The best estimate for D_e arrives at 1.48 eV and agrees well with the experimental value of 1.47 ± 0.056 eV. At the estimated CBS limit, the equilibrium bond distance (1.685 Å) and vibrational frequency (459 cm⁻¹) are in agreement with experiment (1.679 Å, 481 cm⁻¹). Large basis sets and reference configuration spaces invariably result in huge wave function expansions (here, up to 2.8 billion configuration state functions), and efficient parallel implementations of the method are crucial. Hence, relevant details on the implementation and general performance of the parallel program code are discussed as well.
Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements
International Nuclear Information System (INIS)
Azmy, Y.Y.; Barnett, D.A.
1999-01-01
The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared against measurements from applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
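The TORT-specific overhead model is not reproduced in the abstract, but the general shape of such a performance model can be illustrated: a serial fraction, a part that scales with the processor count, and an overhead term that grows with it. The numbers below are hypothetical.

```python
def predicted_speedup(p, serial_time, parallel_fraction, overhead_per_proc):
    """Generic performance model: the parallelizable fraction scales with p,
    while communication/synchronization overhead grows with the processor count."""
    t_serial_part = (1.0 - parallel_fraction) * serial_time
    t_parallel_part = parallel_fraction * serial_time / p
    t_overhead = overhead_per_proc * p
    return serial_time / (t_serial_part + t_parallel_part + t_overhead)

# Hypothetical numbers: 1000 s serial run, 95% parallelizable, 0.5 s overhead per processor.
for p in (1, 2, 4, 8, 16, 32):
    print(p, round(predicted_speedup(p, 1000.0, 0.95, 0.5), 2))
```

Fitting such a model to measured timings is what allows the largest overhead contributors to be identified, as the abstract describes.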
Multitasking TORT under UNICOS: Parallel performance models and measurements
International Nuclear Information System (INIS)
Barnett, A.; Azmy, Y.Y.
1999-01-01
The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared against measurements from applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
Barth, Timothy J.; Chan, Tony F.; Tang, Wei-Pai
1998-01-01
This paper considers an algebraic preconditioning algorithm for hyperbolic-elliptic fluid flow problems. The algorithm is based on a parallel non-overlapping Schur complement domain-decomposition technique for triangulated domains. In the Schur complement technique, the triangulation is first partitioned into a number of non-overlapping subdomains and interfaces. This suggests a reordering of triangulation vertices which separates subdomain and interface solution unknowns. The reordering induces a natural 2 x 2 block partitioning of the discretization matrix. Exact LU factorization of this block system yields a Schur complement matrix which couples subdomains and the interface together. The remaining sections of this paper present a family of approximate techniques for both constructing and applying the Schur complement as a domain-decomposition preconditioner. The approximate Schur complement serves as an algebraic coarse space operator, thus avoiding the known difficulties associated with the direct formation of a coarse space discretization. In developing Schur complement approximations, particular attention has been given to improving sequential and parallel efficiency of implementations without significantly degrading the quality of the preconditioner. A computer code based on these developments has been tested on the IBM SP2 using MPI message passing protocol. A number of 2-D calculations are presented for both scalar advection-diffusion equations as well as the Euler equations governing compressible fluid flow to demonstrate performance of the preconditioning algorithm.
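The block elimination underlying the Schur complement approach can be shown on a small dense example: ordering subdomain (interior) unknowns before interface unknowns induces the 2 x 2 block structure, and the interface system is solved through S = A_gg - A_gi A_ii^{-1} A_ig. The paper builds approximate Schur complements and parallel subdomain solves; the sketch below uses exact dense algebra purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, n_g = 6, 3                      # interior (subdomain) and interface unknowns
A = rng.standard_normal((n_i + n_g, n_i + n_g))
A = A @ A.T + (n_i + n_g) * np.eye(n_i + n_g)    # make it SPD for a clean illustration

# 2 x 2 block partition induced by ordering interior unknowns before interface ones
Aii, Aig = A[:n_i, :n_i], A[:n_i, n_i:]
Agi, Agg = A[n_i:, :n_i], A[n_i:, n_i:]
S = Agg - Agi @ np.linalg.solve(Aii, Aig)        # Schur complement on the interface

b = rng.standard_normal(n_i + n_g)
bi, bg = b[:n_i], b[n_i:]
# Block elimination: solve the interface system first, then back-substitute interiors
xg = np.linalg.solve(S, bg - Agi @ np.linalg.solve(Aii, bi))
xi = np.linalg.solve(Aii, bi - Aig @ xg)
print(np.allclose(A @ np.concatenate([xi, xg]), b))   # True
```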
Problems and legislative remedies of the parallel law systems in Japan for nuclear power reactors
International Nuclear Information System (INIS)
Irie, Kazutomo
2011-01-01
There are two established laws governing nuclear power reactors in Japan. One is the Electricity Utilities Industry Law, which regulates the nuclear power reactors, and the other is the so-called 'Reactor Regulation Law', which dually regulates the reactors in some phases. When a graded approach on the regulation of nuclear reactors was adopted, it extended over these two laws and was legislatively imperfect. Such imperfection created problems from the beginning. Also, the original regulatory structures presented by these laws had become obscure during the operation process of the graded regulation. The situation becomes further complicated by the revision of these laws in recent years. It appears that the trait of the regulatory procedural structure of the Electricity Utilities Industry Law has been weakened. As there is a pressing need to review the entire regulatory structure and to propose a unified regulatory system by combining these laws, this paper examines the merits and demerits of combining these laws under a unified regulation. (author)
Monte Carlo problem and parallel computers, and how to do a fast particle mover on the STAR 100
International Nuclear Information System (INIS)
Sinz, K.H.P.H.
1975-01-01
Particle simulation problems of the Monte Carlo type are widely believed to be intrinsically highly scalar problems. In the absence of a definitive mathematical theorem to the contrary, this belief is based on the very apparent programming difficulties encountered on a vector machine. This class of problem is therefore thought to be ill-suited to highly parallel and vectorized computers. However, it is demonstrated by several examples that a particle mover is fully vectorizable. In the case of the CDC STAR 100 it is found that the performance of such a particle mover is not hopeless but hopeful, and is in fact helpful. One of the several possible vectorizations is estimated to yield a gain of a factor of 15 on the STAR over good serial coding on the same machine. This falls far short of the STAR's peak vector performance of 30 to 70 times scalar rates because certain fast vector instructions are not available and have to be simulated. The current STAR algorithm outperforms the carefully handcoded 7600 by a factor of 3. This performance margin is achievable despite the 7600's fivefold superior scalar capability. A more generally vectorized particle mover will always substantially outperform scalar coding on any machine equipped with a properly chosen set of fast vector instructions. (U.S.)
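The paper's point that a particle mover can be expressed in vector form can be illustrated in modern array notation: the two functions below perform the same push, once as an explicit per-particle loop and once as whole-array operations. The field, time step and units are toy values, and this is of course not the STAR 100 code.

```python
import numpy as np

def push_loop(x, v, E, dt, q_over_m):
    """Scalar-style particle mover: one particle at a time."""
    for i in range(len(x)):
        v[i] += q_over_m * E(x[i]) * dt
        x[i] += v[i] * dt
    return x, v

def push_vectorized(x, v, E, dt, q_over_m):
    """Same push written as whole-array (vector) operations."""
    v += q_over_m * E(x) * dt
    x += v * dt
    return x, v

E = lambda x: np.sin(x)                    # toy 1-D field, works on scalars and arrays
x0 = np.linspace(0.0, 6.0, 100000)
v0 = np.zeros_like(x0)
xa, va = push_loop(x0.copy(), v0.copy(), E, 0.01, 1.0)
xb, vb = push_vectorized(x0.copy(), v0.copy(), E, 0.01, 1.0)
print(np.allclose(xa, xb), np.allclose(va, vb))   # identical results, very different speed
```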
International Nuclear Information System (INIS)
Fevotte, F.; Lathuiliere, B.
2013-01-01
The large increase in computing power over the past few years now makes it possible to consider developing 3D full-core heterogeneous deterministic neutron transport solvers for reference calculations. Among all approaches presented in the literature, the method first introduced in [1] seems very promising. It consists in iterating over resolutions of 2D and 1D MOC problems by taking advantage of prismatic geometries without introducing approximations of a low order operator such as diffusion. However, before developing a solver with all industrial options at EDF, several points needed to be clarified. In this work, we first prove the convergence of this iterative process, under some assumptions. We then present our high-performance, parallel implementation of this algorithm in the MICADO solver. Benchmarking the solver against the Takeda case shows that the 2D-1D coupling algorithm does not seem to affect the spatial convergence order of the MOC solver. As for performance issues, our study shows that even though the data distribution is suited to the 2D solver part, the efficiency of the 1D part is sufficient to ensure a good parallel efficiency of the global algorithm. After this study, the main remaining difficulty implementation-wise is the memory requirement of a vector used for initialization. An efficient acceleration operator will also need to be developed. (authors)
Solution accelerators for large scale 3D electromagnetic inverse problems
International Nuclear Information System (INIS)
Newman, Gregory A.; Boggs, Paul T.
2004-01-01
We provide a framework for preconditioning nonlinear 3D electromagnetic inverse scattering problems using nonlinear conjugate gradient (NLCG) and limited memory (LM) quasi-Newton methods. Key to our approach is the use of an approximate adjoint method that allows for an economical approximation of the Hessian that is updated at each inversion iteration. Using this approximate Hessian as a preconditioner, we show that the preconditioned NLCG iteration converges significantly faster than the non-preconditioned iteration, as well as converging to a data misfit level below that observed for the non-preconditioned method. Similar conclusions are also observed for the LM iteration; preconditioned with the approximate Hessian, the LM iteration converges faster than the non-preconditioned version. At this time, however, we see little difference between the convergence performance of the preconditioned LM scheme and the preconditioned NLCG scheme. A possible reason for this outcome is the behavior of the line search within the LM iteration. It was anticipated that, near convergence, a step size of one would be approached, but what was observed, instead, were step lengths that were nowhere near one. We provide some insights into the reasons for this behavior and suggest further research that may improve the performance of the LM methods.
Density control problems in large stellarators with neoclassical transport
International Nuclear Information System (INIS)
Maassberg, H.; Beidler, C.D.; Simmet, E.E.
1999-01-01
With respect to the particle flux, the off-diagonal term in the neoclassical transport matrix becomes crucial in the stellarator long-mean-free-path regime. Central heating with peaked temperature profiles can make active density profile control by central particle refuelling mandatory. The neoclassical particle confinement can significantly exceed the energy confinement at the outer radii. As a consequence, the required central refuelling may be larger than the neoclassical particle fluxes at the outer radii, leading to the loss of global density control. Radiative losses as well as additional 'anomalous' electron heat diffusivities further exacerbate this problem. In addition to the analytical formulation of the neoclassical link between particle and energy fluxes, simplified model simulations as well as time-dependent ASTRA code simulations are described. In particular, the 'low-' and 'high-mirror' W7-X configurations are compared. For the W7-X 'high-mirror' configuration especially, the appearance of a neoclassical particle transport barrier is predicted at higher densities. (author)
Balanced, parallel operation of flashlamps
International Nuclear Information System (INIS)
Carder, B.M.; Merritt, B.T.
1979-01-01
A new energy store, the Compensated Pulsed Alternator (CPA), promises to be a cost effective substitute for capacitors to drive flashlamps that pump large Nd:glass lasers. Because the CPA is large and discrete, it will be necessary that it drive many parallel flashlamp circuits, presenting a problem in equal current distribution. Current division to ±20% between parallel flashlamps has been achieved, but this is marginal for laser pumping. A method is presented here that provides equal current sharing to about 1%, and it includes fused protection against short circuit faults. The method was tested with eight parallel circuits, including both open-circuit and short-circuit fault tests.
Dirofilariasis in Russian Federation: a big problem with large distribution
Directory of Open Access Journals (Sweden)
Tatyana V. Moskvina
2018-02-01
Full Text Available Dirofilariasis caused by Dirofilaria spp. is an important vector-borne and largely zoonotic disease. In Russia, dirofilariasis is caused by two agents, D. immitis and D. repens. The present article provides a detailed analysis of human and canine dirofilariasis, including methods of diagnosis, treatment, and prevention of the disease, with particular reference to control programmes. Information has been summarised from literature in different languages that is not readily accessible to the international scientific community. Human dirofilariasis was first registered in Russia in 1915, and recent reports showed that the total number of infected humans increases on average by 1.8 times every three years. Human dirofilariasis has been registered in 42 federal subjects. In total, 1162 cases of subcutaneous dirofilariasis were registered in the Russian Federation between 1915 and 2013; the most frequent type of subcutaneous dirofilariasis was ocular dirofilariasis (more than 50% of cases). Seven cases of pulmonary dirofilariasis were registered in Russia. The treatment of human dirofilariasis consists of surgical removal of worms only; as a result, preventive measures are of major importance for reducing the risk of Dirofilaria infection. Control programmes have been implemented by the government at all administrative levels, including diagnosis and treatment of patients; identification, isolation, and treatment of infected dogs; monthly chemoprophylaxis of dogs during the spring-summer period; and regular vector control.
Enabling High Performance Large Scale Dense Problems through KBLAS
Abdelfattah, Ahmad
2014-05-04
KBLAS (KAUST BLAS) is a small library that provides highly optimized BLAS routines on systems accelerated with GPUs. KBLAS is entirely written in CUDA C, and targets NVIDIA GPUs with compute capability 2.0 (Fermi) or higher. The current focus is on level-2 BLAS routines, namely the general matrix vector multiplication (GEMV) kernel, and the symmetric/hermitian matrix vector multiplication (SYMV/HEMV) kernel. KBLAS provides these two kernels in all four precisions (s, d, c, and z), with support for multi-GPU systems. Through advanced optimization techniques that target latency hiding and pushing memory bandwidth to the limit, KBLAS outperforms state-of-the-art kernels by 20-90%. Competitors include CUBLAS-5.5, MAGMABLAS-1.4.0, and CULA-R17. The SYMV/HEMV kernel from KBLAS has been adopted by NVIDIA, and should appear in CUBLAS-6.0. KBLAS has been used in large scale simulations of multi-object adaptive optics.
Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts
Energy Technology Data Exchange (ETDEWEB)
NONE
1997-12-31
This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.
International Nuclear Information System (INIS)
Bellucci, V.J.
1990-01-01
This paper describes IBM's approach to parallel computing using the IBM ES/3090 computer. Parallel processing concepts are discussed, including their advantages, potential performance improvements, and limitations. Particular applications and capabilities of the IBM ES/3090 are presented, along with preliminary results from utilities applying parallel processing to the simulation of system reliability, air pollution models, and power network dynamics.
Kim, Song-Ju; Aono, Masashi; Hara, Masahiko
2010-07-01
We propose a model - the "tug-of-war (TOW) model" - to conduct unique parallel searches using many nonlocally-correlated search agents. The model is based on the property of a single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. The conservation law entails a "nonlocal correlation" among the branches, i.e., volume increment in one branch is immediately compensated by volume decrement(s) in the other branch(es). This nonlocal correlation was shown to be useful for decision making in the case of a dilemma. The multi-armed bandit problem is to determine the optimal strategy for maximizing the total reward sum with incompatible demands, by either exploiting the rewards obtained using the already collected information or exploring new information for acquiring higher payoffs involving risks. Our model can efficiently manage the "exploration-exploitation dilemma" and exhibits good performance. The average accuracy rate of our model is higher than those of well-known algorithms such as the modified ε-greedy algorithm and the modified softmax algorithm, especially for solving relatively difficult problems. Moreover, our model flexibly adapts to changing environments, a property essential for living organisms surviving in uncertain environments.
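The TOW dynamics themselves are not reproduced here, but the bandit setting and the ε-greedy baseline the authors compare against can be made concrete with a few lines; the arm probabilities and parameters below are hypothetical.

```python
import random

def epsilon_greedy(probs, n_steps=10000, eps=0.1, seed=0):
    """Plain epsilon-greedy on a Bernoulli bandit: explore with probability eps,
    otherwise pull the arm with the highest empirical mean reward."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    correct = 0
    best = probs.index(max(probs))
    for _ in range(n_steps):
        if rng.random() < eps:
            arm = rng.randrange(len(probs))
        else:
            arm = max(range(len(probs)), key=lambda a: means[a])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]   # incremental mean
        correct += (arm == best)
    return correct / n_steps                                 # accuracy rate

# A relatively "difficult" instance: the two arms are hard to distinguish.
print(epsilon_greedy([0.55, 0.45]))
```

The "difficult" instances mentioned in the abstract correspond to arms whose reward probabilities are close, which is exactly where exploration-exploitation trade-offs matter most.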
Directory of Open Access Journals (Sweden)
Rui Zhang
2013-01-01
Full Text Available We consider a parallel machine scheduling problem with random processing/setup times and adjustable production rates. The objective functions to be minimized consist of two parts: the first part is related to the due date performance (i.e., the tardiness of the jobs), while the second part is related to the setting of machine speeds. Therefore, the decision variables include both the production schedule (sequences of jobs) and the production rate of each machine. The optimization process, however, is significantly complicated by the stochastic factors in the manufacturing system. To address the difficulty, a simulation-based three-stage optimization framework is presented in this paper for obtaining high-quality robust solutions to the integrated scheduling problem. The first stage (crude optimization) is featured by the ordinal optimization theory, the second stage (finer optimization) is implemented with a metaheuristic called differential evolution, and the third stage (fine-tuning) is characterized by a perturbation-based local search. Finally, computational experiments are conducted to verify the effectiveness of the proposed approach. Sensitivity analysis and practical implications are also discussed.
Directory of Open Access Journals (Sweden)
V. N. Paschenko
2015-01-01
Full Text Available The paper describes a mechanism belonging to the class of parallel kinematics mechanisms with three degrees of freedom, based on the crank mechanism. This mechanism consists of two platforms, namely a lower fixed one and an upper movable one. The upper platform is connected to the lower one by six movable elements, three of which are rods attached to the bases by means of spherical joints, while the other three have a crank structure. The paper shows an approach to the solution of the direct kinematics task based on mathematical modeling. The inverse kinematics problem is formulated as follows: for specified rotation angles of the drives (the values of the generalized coordinates), determine the position of the upper movable platform. To solve this problem, a mathematical model describing the proposed system has been used. On the basis of the constructed model, the necessary calculations were made that allowed us, using the values of the crank angles connected with the motors, to determine the position of the platform in space. The method of virtual points was used to reduce the number of equations and unknowns determining the position of the upper platform in space in the governing system from eighteen to nine, thus simplifying the solution. To check the correctness of the solution, a numerical experiment was carried out. Each generalized coordinate took values in the range from -30° to 30°; for these values the direct positional problem was solved, and its result was used as initial data for the previously solved and verified inverse problem on the position of the platform under study. The paper presents a comparison of measured values with the calculated values of the generalized coordinates and draws the appropriate conclusions: this model is in good agreement with the results observed in practice. One of the distinctive features of the proposed approach is that the rotation angles of the motors are used as the generalized coordinates. This allowed us
Liu, Yang
2014-07-01
The computational complexity and memory requirements of classically formulated marching-on-in-time (MOT)-based surface integral equation (SIE) solvers scale as O(N_t N_s^2) and O(N_s^2), respectively; here N_t and N_s denote the number of temporal and spatial degrees of freedom of the current density. The multilevel plane wave time domain (PWTD) algorithm, viz., the time domain counterpart of the multilevel fast multipole method, reduces these costs to O(N_t N_s log^2 N_s) and O(N_s^1.5) (Ergin et al., IEEE Trans. Antennas Mag., 41, 39-52, 1999). Previously, PWTD-accelerated MOT-SIE solvers have been used to analyze transient scattering from perfect electrically conducting (PEC) and homogeneous dielectric objects discretized in terms of a million spatial unknowns (Shanker et al., IEEE Trans. Antennas Propag., 51, 628-641, 2003). More recently, an efficient parallelized solver that employs an advanced hierarchical and provably scalable spatial, angular, and temporal load partitioning strategy has been developed to analyze transient scattering problems that involve ten million spatial unknowns (Liu et al., in URSI Digest, 2013).
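Taking the quoted costs at face value (and ignoring constants), the asymptotic gain of PWTD over the classical MOT scaling can be tabulated directly:

```python
import math

def mot_cost(nt, ns):
    return nt * ns**2                      # classical MOT-SIE operation count, up to constants

def pwtd_cost(nt, ns):
    return nt * ns * math.log2(ns)**2      # PWTD-accelerated count, up to constants

nt = 1000
for ns in (10**4, 10**5, 10**6, 10**7):
    print(f"Ns = {ns:>8}: speedup factor ~ {mot_cost(nt, ns) / pwtd_cost(nt, ns):,.0f}")
```

Since the prefactors are ignored, only the trend is meaningful: the ratio grows roughly as N_s / log^2 N_s, which is why the acceleration becomes decisive for the million- and ten-million-unknown problems cited above.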
Scalable Newton-Krylov solver for very large power flow problems
Idema, R.; Lahaye, D.J.P.; Vuik, C.; Van der Sluis, L.
2010-01-01
The power flow problem is generally solved by the Newton-Raphson method with a sparse direct solver for the linear system of equations in each iteration. While this works fine for small power flow problems, we will show that for very large problems the direct solver is very slow and we present
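The structure referred to above, an outer Newton-Raphson loop with one sparse direct linear solve per iteration, can be sketched generically; the toy residual below merely stands in for the power flow mismatch equations and is not the actual power flow model.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def newton_raphson(residual, jacobian, x0, tol=1e-10, max_iter=20):
    """Generic Newton-Raphson: one sparse direct solve per iteration."""
    x = x0.copy()
    for _ in range(max_iter):
        r = residual(x)
        if np.linalg.norm(r, np.inf) < tol:
            break
        x -= spsolve(jacobian(x).tocsc(), r)
    return x

# Toy nonlinear system standing in for power flow mismatch equations.
n = 1000
L = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
residual = lambda x: L @ x + 0.1 * np.sin(x) - 1.0
jacobian = lambda x: (L + sp.diags(0.1 * np.cos(x))).tocsr()

x = newton_raphson(residual, jacobian, np.zeros(n))
print("final mismatch:", np.linalg.norm(residual(x), np.inf))
```

The abstract's point is that the `spsolve` step, cheap here, dominates and eventually becomes prohibitive as the system grows, which motivates replacing it with a scalable Krylov solver.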
International Nuclear Information System (INIS)
Sreekumar, M; Nagarajan, T; Singaperumal, M
2008-01-01
This experimental study investigates the coupled effect of the force developed by the shape memory alloy (SMA) actuators and the force required for the large deflection of an elastica member in a compliant parallel mechanism. The compliant mechanism developed in house consists of a moving platform mounted on a superelastic pillar and three SMA wire actuators to manipulate the platform. A three-axis MEMS accelerometer has been mounted on the moving platform to measure its tilt angle. Three miniature force sensors have been designed and fabricated out of cantilever beams, each mounted with a pair of strain gauges, to measure the force developed by the respective actuators. The force sensors are highly sensitive and cost effective compared to commercially available miniature force sensors. Calibration of the force sensors has been accomplished with known weights, and for the three-axis MEMS accelerometer a rotary base has been considered, of the kind usually used in optical applications. The calibration curves obtained, with R-squared values between 0.9997 and 1.0, show that both the tilt and force sensors considered are most appropriate for the respective applications. The mechanism fitted with the sensors and the drivers for the SMA actuators is integrated with a National Instruments data acquisition system. The experimental results have been compared with the analytical results and it was found that the relative error is less than 2%. This is a preliminary study in the development of a mechanism for eye prosthesis and similar applications
Parallel magnetic resonance imaging
International Nuclear Information System (INIS)
Larkman, David J; Nunes, Rita G
2007-01-01
Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts is shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)
DORT and TORT workshop -- Outline for presentation for performance issues for large problems
International Nuclear Information System (INIS)
Barnett, A.
1998-04-01
This paper addresses the problem of running large TORT programs: a limited amount of time per job and a limited amount of memory and disk space. The solution that the author outlines here is to break up the TORT run in time and space. For the time problem, run multiple sequential, dependent jobs. For the space problem, use TORT's internal memory conservation features. TORT is a three-dimensional discrete ordinates neutron/photon transport code.
International Nuclear Information System (INIS)
Misawa, Takeharu; Yoshida, Hiroyuki; Akimoto, Hajime
2008-01-01
In the Japan Atomic Energy Agency (JAEA), the Innovative Water Reactor for Flexible Fuel Cycle (FLWR) has been developed. For the thermal design of the FLWR, it is necessary to develop an analytical method to predict the boiling transition of the FLWR. JAEA has been developing the three-dimensional two-fluid model analysis code ACE-3D, which adopts a boundary-fitted coordinate system to simulate flow in complex-shaped channels. In this paper, as a part of the development of ACE-3D towards rod bundle analysis, the introduction of parallelization to ACE-3D and assessments of ACE-3D are presented. In the analysis of a large-scale domain such as a rod bundle, even the two-fluid model requires a large computational cost, which exceeds the memory limit of a single CPU. Therefore, parallelization was introduced to ACE-3D to divide the data for the analysis of a large-scale domain among a large number of CPUs, and it is confirmed that the analysis of a large-scale domain such as a rod bundle can be performed by parallel computation while maintaining parallel performance even when using a large number of CPUs. ACE-3D adopts two-phase flow models, some of which depend on channel geometry. Therefore, analyses of domains simulating an individual subchannel and a 37-rod bundle are performed and compared with experiments. It is confirmed that the results obtained by both analyses using ACE-3D qualitatively agree with past experimental results. (author)
A parallel nearly implicit time-stepping scheme
Botchev, Mike A.; van der Vorst, Henk A.
2001-01-01
Across-the-space parallelism still remains the most mature, convenient and natural way to parallelize large scale problems. One of the major problems here is that implicit time stepping is often difficult to parallelize due to the structure of the system. Approximate implicit schemes have been suggested to circumvent the problem. These schemes have attractive stability properties and they are also very well parallelizable. The purpose of this article is to give an overall assessment of the pa...
DEFF Research Database (Denmark)
Quaglia, Alberto; Sarup, Bent; Sin, Gürkan
2013-01-01
The formulation of Enterprise-Wide Optimization (EWO) problems as mixed integer nonlinear programming requires collecting, consolidating and systematizing large amounts of data, coming from different sources and specific to different disciplines. In this manuscript, a generic and flexible data structure for efficient formulation of enterprise-wide optimization problems is presented. Through the integration of the described data structure in our synthesis and design framework, the problem formulation workflow is automated in a software tool, reducing the time and resources needed to formulate large problems, while ensuring at the same time data consistency and quality at the application stage.
Implementation of a partitioned algorithm for simulation of large CSI problems
Alvin, Kenneth F.; Park, K. C.
1991-01-01
The implementation of a partitioned numerical algorithm for determining the dynamic response of coupled structure/controller/estimator finite-dimensional systems is reviewed. The partitioned approach leads to a set of coupled first and second-order linear differential equations which are numerically integrated with extrapolation and implicit step methods. The present software implementation, ACSIS, utilizes parallel processing techniques at various levels to optimize performance on a shared-memory concurrent/vector processing system. A general procedure for the design of controller and filter gains is also implemented, which utilizes the vibration characteristics of the structure to be solved. Also presented are: example problems; a user's guide to the software; the procedures and algorithm scripts; a stability analysis for the algorithm; and the source code for the parallel implementation.
World, We Have Problems: Simulation for Large Complex, Risky Projects, and Events
Elfrey, Priscilla
2010-01-01
Prior to a spacewalk during the NASA STS-129 mission in November 2009, Columbia Broadcasting System (CBS) correspondent William Harwood reported astronauts "were awakened again", as they had been the day previously. Fearing something not properly connected was causing a leak, the crew, both on the ground and in space, stopped and checked everything. The alarm proved false. The crew did complete its work ahead of schedule, but the incident reminds us that correctly connecting hundreds and thousands of entities, subsystems and systems, finding leaks, loosening stuck valves, and adding replacements to very large complex systems over time does not occur magically. Everywhere major projects present similar pressures. Lives are at risk. Responsibility is heavy. Large natural and human-created disasters introduce parallel difficulties as people work across the boundaries of their countries, disciplines, languages, and cultures, with known immediate dangers as well as the unexpected. NASA has long accepted that when humans have to go where humans cannot go, simulation is the sole solution. The Agency uses simulation to achieve consensus, reduce ambiguity and uncertainty, understand problems, make decisions, support design, do planning and troubleshooting, as well as for operations, training, testing, and evaluation. Simulation is at the heart of all such complex systems, products, projects, programs, and events. Difficult, hazardous short- and, especially, long-term activities have a persistent need for simulation, from the first insight into a possibly workable idea or answer until the final report, perhaps beyond our lifetime, is put in the archive. With simulation we create a common mental model, try out breakdowns of machinery or teamwork, and find opportunity for improvement. Lifecycle simulation proves to be increasingly important as risks and consequences intensify. Across the world, disasters are increasing. We anticipate more of them, as the results of global warming
Energy Technology Data Exchange (ETDEWEB)
Chang, Jonghwa [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)
2014-05-15
Parallelization of Monte Carlo simulation is widely adopted. There are also several parallel algorithms developed for SN transport theory using the parallel wave sweeping algorithm, and for the CPM using parallel ray tracing. For practical reactor physics applications, the thermal feedback and burnup effects on the multigroup cross sections should be considered. In this respect, the domain decomposition method (DDM) is suitable for distributing the expensive cross section calculation work. A parallel transport code and a diffusion code based on the Raviart-Thomas mixed finite element method were developed. However, most of the developed methods rely on the heuristic convergence of flux and current at the domain interfaces, and convergence was not attained in some cases. The mechanical stress computation community has also worked on the DDM to solve the stress-strain equation using finite element methods. The most successful domain decomposition method in terms of robustness is FETI-DP. We have modified the original FETI-DP to solve the eigenvalue problem for the multigroup diffusion problem in this study.
Energy Technology Data Exchange (ETDEWEB)
Sauget, M
2007-12-15
This research is about the application of neural networks in the external radiotherapy domain. The goal is to elaborate a new system for evaluating radiation dose distributions in heterogeneous environments. The final objective of this work is to build a complete tool kit to evaluate the optimal treatment planning. My first research point is the design of an incremental learning algorithm. The interest of my work is to combine different optimizations specialized in function interpolation and to propose a new algorithm allowing the neural network architecture to change during the learning phase. This algorithm allows the final size of the neural network to be minimised while keeping good accuracy. The second part of my research is to parallelize the previous incremental learning algorithm. The goal of that work is to increase the speed of the learning step as well as the size of the learned dataset needed in a clinical case. For that, our incremental learning algorithm presents an original data decomposition with overlapping, together with a fault tolerance mechanism. My last research point is a fast and accurate algorithm for computing the radiation dose deposit in any heterogeneous environment. At the present time, the existing solutions are not optimal: the fast solutions are not accurate and do not give an optimal treatment planning, while the accurate solutions are far too slow to be used in a clinical context. Our algorithm answers this problem by bringing together rapidity and accuracy. The concept is to use an adequately trained neural network together with a mechanism taking into account environment changes. The advantage of this algorithm is to avoid the use of a complex physical code while keeping good accuracy and reasonable computation times. (author)
Klegeris, Andis; Hurren, Heather
2011-12-01
Problem-based learning (PBL) can be described as a learning environment where the problem drives the learning. This technique usually involves learning in small groups, which are supervised by tutors. It is becoming evident that PBL in a small-group setting has a robust positive effect on student learning and skills, including better problem-solving skills and an increase in overall motivation. However, very little research has been done on the educational benefits of PBL in a large classroom setting. Here, we describe a PBL approach (using tutorless groups) that was introduced as a supplement to standard didactic lectures in University of British Columbia Okanagan undergraduate biochemistry classes consisting of 45-85 students. PBL was chosen as an effective method to assist students in learning biochemical and physiological processes. By monitoring student attendance and using informal and formal surveys, we demonstrated that PBL has a significant positive impact on student motivation to attend and participate in the course work. Student responses indicated that PBL is superior to traditional lecture format with regard to the understanding of course content and retention of information. We also demonstrated that student problem-solving skills are significantly improved, but additional controlled studies are needed to determine how much PBL exercises contribute to this improvement. These preliminary data indicated several positive outcomes of using PBL in a large classroom setting, although further studies aimed at assessing student learning are needed to further justify implementation of this technique in courses delivered to large undergraduate classes.
Feletto, Eleonora; Schonfeld, Sara J; Kovalevskiy, Evgeny V; Bukhtiyarov, Igor V; Kashanskiy, Sergey V; Moissonnier, Monika; Straif, Kurt; Kromhout, Hans
2017-01-01
INTRODUCTION: Historic dust concentrations are available in a large-scale cohort study of workers in a chrysotile mine and processing factories in Asbest, Russian Federation. Parallel dust (gravimetric) and fibre (phase-contrast optical microscopy) concentrations collected in 1995, 2007 and 2013/14
Wicked Problems in Large Organizations: Why Pilot Retention Continues to Challenge the Air Force
2017-05-25
solving complex problems even more challenging. This idea of complexity extends to another theoretical concept, the complex adaptive system, which... concept in order to avoid the pitfalls and dangers in group problem-solving. His ideas to mitigate potential groupthink place responsibility...
Matpar: Parallel Extensions for MATLAB
Springer, P. L.
1998-01-01
Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.
International Nuclear Information System (INIS)
Azadeh, A.; Maleki Shoja, B.; Ghanei, S.; Sheikhalishahi, M.
2015-01-01
This research investigates a redundancy-scheduling optimization problem for a multi-state series-parallel system. The system is a flow shop manufacturing system with multi-state machines. Each manufacturing machine may have different performance rates, including perfect performance, decreased performance and complete failure. Moreover, warm standby redundancy is considered for the redundancy allocation problem. Three objectives are considered for the problem: (1) minimizing system purchasing cost, (2) minimizing makespan, and (3) maximizing system reliability. The universal generating function is employed to evaluate system performance and overall reliability of the system. Since the problem is in the NP-hard class of combinatorial problems, a genetic algorithm (GA) is used to find optimal/near-optimal solutions. Different test problems are generated to evaluate the effectiveness and efficiency of the proposed approach, and results are compared with a simulated annealing optimization method. The results show the proposed approach is capable of finding optimal/near-optimal solutions within a very reasonable time. - Highlights: • A redundancy-scheduling optimization problem for a multi-state series parallel system. • A flow shop with multi-state machines and warm standby redundancy. • Objectives are to optimize system purchasing cost, makespan and reliability. • Different test problems are generated and evaluated by a unique genetic algorithm. • It locates optimal/near optimal solution within a very reasonable time
Akl, Selim G
1985-01-01
Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problem. The text also presents twenty different algorithms for architectures such as linear arrays, mesh-connected computers, and cube-connected computers. Another example where the algorithms can be applied is the shared-memory SIMD (single instruction stream, multiple data stream) computer, in which the whole sequence to be sorted can fit in the
Solving Large-Scale Computational Problems Using Insights from Statistical Physics
Energy Technology Data Exchange (ETDEWEB)
Selman, Bart [Cornell University
2012-02-29
Many challenging problems in computer science and related fields can be formulated as constraint satisfaction problems. Such problems consist of a set of discrete variables and a set of constraints between those variables, and represent a general class of so-called NP-complete problems. The goal is to find a value assignment to the variables that satisfies all constraints, generally requiring a search through an exponentially large space of variable-value assignments. Models for disordered systems, as studied in statistical physics, can provide important new insights into the nature of constraint satisfaction problems. Recently, work in this area has resulted in the discovery of a new method for solving such problems, called the survey propagation (SP) method. With SP, we can solve problems with millions of variables and constraints, an improvement of two orders of magnitude over previous methods.
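To make the problem class concrete, the toy Python sketch below encodes a tiny satisfiability-style constraint satisfaction instance (the clauses are hypothetical) and looks for a satisfying assignment with a naive random local search. It only illustrates the problem statement; it is not the survey propagation method.

```python
import random

# A clause is a list of literals; positive k means "variable k is True",
# negative k means "variable k is False".  This small instance is made up.
clauses = [[1, -2, 3], [-1, 2], [2, -3], [-2, -3, 1]]
n_vars = 3

def unsatisfied(assign, clauses):
    """Return the clauses violated by a {variable: bool} assignment."""
    bad = []
    for c in clauses:
        if not any((assign[abs(l)] if l > 0 else not assign[abs(l)]) for l in c):
            bad.append(c)
    return bad

def local_search(clauses, n_vars, max_flips=1000, seed=0):
    """Naive WalkSAT-style search: flip a variable from a violated clause."""
    rng = random.Random(seed)
    assign = {v: rng.choice([True, False]) for v in range(1, n_vars + 1)}
    for _ in range(max_flips):
        bad = unsatisfied(assign, clauses)
        if not bad:
            return assign                     # all constraints satisfied
        v = abs(rng.choice(rng.choice(bad)))  # pick a variable in a bad clause
        assign[v] = not assign[v]
    return None

print(local_search(clauses, n_vars))
```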
The Cauchy problem for a model of immiscible gas flow with large data
Energy Technology Data Exchange (ETDEWEB)
Sande, Hilde
2008-12-15
The thesis consists of an introduction and two papers; 1. The solution of the Cauchy problem with large data for a model of a mixture of gases. 2. Front tracking for a model of immiscible gas flow with large data. (AG) refs, figs
Kupfer, Tom; Lehmann, Joerg; Butler, Duncan J; Ramanathan, Ganesan; Bailey, Tracy E; Franich, Rick D
2017-11-01
This study investigates a large-area plane-parallel ionization chamber (LAC) for measurements of dose-area product in water (DAP_w) in megavoltage (MV) photon fields. Uniformity of electrode separation of the LAC (PTW34070 Bragg Peak Chamber, sensitive volume diameter: 8.16 cm) was measured using high-resolution microCT. Signal dependence on angle α of beam incidence for square 6 MV fields of side length s = 20 cm and 1 cm was measured in air. Polarity and recombination effects were characterized in 6, 10, and 18 MV photon fields. To assess the lateral setup tolerance, scanned LAC profiles of a 1 × 1 cm² field were acquired. A 6 MV calibration coefficient, N_D,w,LAC, was determined in a field collimated by a 5 cm diameter stereotactic cone with known DAP_w. Additional calibrations in 10 × 10 cm² fields at 6, 10, and 18 MV were performed. Electrode separation is uniform and agrees with specifications. Volume averaging leads to a signal increase proportional to ~1/cos(α) in small fields. Correction factors for polarity and recombination range from 0.9986 to 0.9996 and from 1.0007 to 1.0024, respectively. Off-axis displacement by up to 0.5 cm did not change the measured signal in a 1 × 1 cm² field. N_D,w,LAC was 163.7 mGy cm⁻² nC⁻¹ and differs by +3.0% from the coefficient derived in the 10 × 10 cm² 6 MV field. Response in 10 and 18 MV fields increased by 1.0% and 2.7% compared to 6 MV. The LAC requires only small correction factors for DAP_w measurements and shows little energy dependence. Lateral setup errors of 0.5 cm are tolerated in 1 × 1 cm² fields, but beam incidence must be kept as close to normal as possible. Calibration in 10 × 10 cm² fields is not recommended because of the LAC's over-response. The accuracy of relative point-dose measurements in the field's periphery is an important limiting factor for the accuracy of DAP_w measurements. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on
Streaming for Functional Data-Parallel Languages
DEFF Research Database (Denmark)
Madsen, Frederik Meisner
In this thesis, we investigate streaming as a general solution to the space inefficiency commonly found in functional data-parallel programming languages. The data-parallel paradigm maps well to parallel SIMD-style hardware. However, the traditional fully materializing execution strategy ... flattening necessitates all sub-computations to materialize at the same time. For example, naive n by n matrix multiplication requires n^3 space in NESL because the algorithm contains n^3 independent scalar multiplications. For large values of n, this is completely unacceptable. We address the problem by extending two existing data-parallel languages: NESL and Accelerate. In the extensions we map bulk operations to data-parallel streams that can evaluate fully sequential, fully parallel or anything in between. By a dataflow, piecewise parallel execution strategy, the runtime system can adjust to any target ...
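The space argument can be illustrated with a hedged NumPy sketch: the first routine below materializes all n^3 elementwise products before reducing, while the second streams over row blocks so that only a bounded slice is live at any time. This shows the general streaming idea only, not the NESL/Accelerate extensions developed in the thesis.

```python
import numpy as np

def matmul_materializing(A, B):
    """Form all n*n*n elementwise products before reducing (O(n^3) memory)."""
    # products[i, j, k] = A[i, k] * B[k, j]
    products = A[:, None, :] * B.T[None, :, :]
    return products.sum(axis=2)

def matmul_streaming(A, B, block=64):
    """Process a block of rows at a time, so only O(block * n^2) intermediate
    values are ever live -- the streaming idea in miniature."""
    n = A.shape[0]
    C = np.empty((n, B.shape[1]))
    for start in range(0, n, block):
        stop = min(start + block, n)
        C[start:stop] = A[start:stop] @ B
    return C

A = np.random.rand(128, 128)
B = np.random.rand(128, 128)
assert np.allclose(matmul_materializing(A, B), matmul_streaming(A, B))
```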
DEFF Research Database (Denmark)
Knecht, Stefan; Jensen, Hans Jørgen Aagaard; Fleig, Timo
2008-01-01
We present a parallel implementation of a string-driven general active space configuration interaction program for nonrelativistic and scalar-relativistic electronic-structure calculations. The code has been modularly incorporated in the DIRAC quantum chemistry program package. The implementation...
Parallel programming practical aspects, models and current limitations
Tarkov, Mikhail S
2014-01-01
Parallel programming is designed for the use of parallel computer systems for solving time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: 1. Processing large data arrays (including processing images and signals in real time); 2. Simulation of complex physical processes and chemical reactions. For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. Particles-in-cell method and cellular automata are very useful for simulation. Problems of scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...
Parallel thermal radiation transport in two dimensions
International Nuclear Information System (INIS)
Smedley-Stevenson, R.P.; Ball, S.R.
2003-01-01
This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)
International Nuclear Information System (INIS)
Richstein, J.
1986-01-01
This work describes measurements of the drift of electrons in gases, using the TPC90, the prototype of the ALEPH Time Projection Chamber. Tracks created by UV-laser ionization have been drifted over distances of up to 1.3 m in parallel electric and magnetic fields. Electron drift properties have been systematically measured as a function of these fields, in several gas mixtures. (orig./HSI)
International Nuclear Information System (INIS)
Yang Lei; Gong Xueyu; Wang Ling
2013-01-01
Combined with a standard mathematical model for evaluating the quality of deployment results, a new high-performance parallel algorithm for source pencil deployment was obtained by using a parallel plant growth simulation algorithm, completely parallelized with the CUDA execution model so that the corresponding code can run on a GPU. Based on this work, several instances of various scales were used to test the new version of the algorithm. The results show that, building on the advantages of the old versions, the performance of the new one is improved by a factor of more than 500 compared with the CPU version, and by a factor of 30 compared with the CPU-plus-GPU hybrid version. The computation time of the new version is less than ten minutes for an irradiator whose activity is less than 111 PBq. For a single GTX275 GPU, the new version can handle at most 167 PBq with a computation time of no more than 25 minutes, and with multiple GPUs this capacity can be increased further. Overall, the new version of the algorithm running on a GPU can satisfy the requirements of source pencil deployment for any domestic irradiator, and it is highly competitive. (authors)
FDTD method for laser absorption in metals for large scale problems.
Deng, Chun; Ki, Hyungson
2013-10-21
The FDTD method has been successfully used for many electromagnetic problems, but its application to laser material processing has been limited because even a several-millimeter domain requires a prohibitively large number of grid points. In this article, we present a novel FDTD method for simulating large-scale laser beam absorption problems, especially for metals, by enlarging the laser wavelength while maintaining the material's reflection characteristics. For validation purposes, the proposed method has been tested with in-house FDTD codes to simulate p-, s-, and circularly polarized 1.06 μm irradiation on Fe and Sn targets, and the simulation results are in good agreement with theoretical predictions.
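For background, a plain 1-D FDTD leapfrog update (normalized units, no absorbing boundaries) looks as in the Python sketch below; it illustrates the standard method rather than the wavelength-enlarging technique proposed in the article, and all grid and source parameters are arbitrary.

```python
import numpy as np

def fdtd_1d(n_cells=200, n_steps=500, source_cell=100):
    """Plain 1-D FDTD in free space with normalized units (Courant number 0.5).

    Ez and Hy live on a staggered (Yee) grid and are advanced in leapfrog
    fashion; a soft sinusoidal source drives one cell.  Illustrative only.
    """
    ez = np.zeros(n_cells)
    hy = np.zeros(n_cells)
    courant = 0.5
    for t in range(n_steps):
        hy[:-1] += courant * (ez[1:] - ez[:-1])          # update H from curl E
        ez[1:] += courant * (hy[1:] - hy[:-1])           # update E from curl H
        ez[source_cell] += np.sin(2 * np.pi * 0.02 * t)  # soft source
    return ez

print(fdtd_1d()[:10])
```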
Solving a large-scale precedence constrained scheduling problem with elastic jobs using tabu search
DEFF Research Database (Denmark)
Pedersen, C.R.; Rasmussen, R.V.; Andersen, Kim Allan
2007-01-01
This paper presents a solution method for minimizing makespan of a practical large-scale scheduling problem with elastic jobs. The jobs are processed on three servers and restricted by precedence constraints, time windows and capacity limitations. We derive a new method for approximating the server exploitation of the elastic jobs and solve the problem using a tabu search procedure. Finding an initial feasible solution is in general NP-complete, but the tabu search procedure includes a specialized heuristic for solving this problem. The solution method has proven to be very efficient and leads to a significant decrease in makespan compared to the strategy currently implemented.
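As a pointer to how such a tabu search is organized, the Python sketch below runs a bare-bones tabu search on a toy single-machine sequencing objective: swap neighborhoods, a tabu list with fixed tenure, and tracking of the best solution found. The objective and all parameters are hypothetical stand-ins, not the three-server elastic-job model of the paper.

```python
import itertools
import random

def total_cost(order, durations):
    """Toy objective: sum of job completion times on a single machine
    (minimized by shortest-processing-time order)."""
    t, cost = 0, 0
    for j in order:
        t += durations[j]
        cost += t
    return cost

def tabu_search(durations, iters=200, tenure=7, seed=0):
    rng = random.Random(seed)
    order = list(range(len(durations)))
    rng.shuffle(order)
    best, best_cost = order[:], total_cost(order, durations)
    tabu = {}                                    # move -> iteration at which it expires
    for it in range(iters):
        candidate, cand_cost, cand_move = None, float("inf"), None
        for i, j in itertools.combinations(range(len(order)), 2):
            if tabu.get((i, j), -1) > it:        # move is still tabu, skip it
                continue
            trial = order[:]
            trial[i], trial[j] = trial[j], trial[i]
            c = total_cost(trial, durations)
            if c < cand_cost:
                candidate, cand_cost, cand_move = trial, c, (i, j)
        if candidate is None:
            break
        order = candidate
        tabu[cand_move] = it + tenure            # forbid re-using this swap for a while
        if cand_cost < best_cost:
            best, best_cost = candidate, cand_cost
    return best, best_cost

print(tabu_search([4, 2, 7, 1, 5, 3]))
```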
Minimization of Linear Functionals Defined on Solutions of Large-Scale Discrete Ill-Posed Problems
DEFF Research Database (Denmark)
Elden, Lars; Hansen, Per Christian; Rojas, Marielba
2003-01-01
The minimization of linear functionals defined on the solutions of discrete ill-posed problems arises, e.g., in the computation of confidence intervals for these solutions. In 1990, Elden proposed an algorithm for this minimization problem based on a parametric-programming reformulation involving the solution of a sequence of trust-region problems, and using matrix factorizations. In this paper, we describe MLFIP, a large-scale version of this algorithm where a limited-memory trust-region solver is used on the subproblems. We illustrate the use of our algorithm in connection with an inverse heat...
International Nuclear Information System (INIS)
Gene Golub; Kwok Ko
2009-01-01
The solutions of sparse eigenvalue problems and linear systems constitute one of the key computational kernels in the discretization of partial differential equations for the modeling of linear accelerators. The computational challenges faced by existing techniques for solving those sparse eigenvalue problems and linear systems call for continuing research to improve the algorithms so that the ever-increasing problem sizes required by the physics applications can be tackled. Under the support of this award, the filter algorithm for solving large sparse eigenvalue problems was developed at Stanford to address the computational difficulties of the previous methods, with the goal of enabling accelerator simulations on what was then the world's largest unclassified supercomputer at NERSC for this class of problems. Specifically, a new method, the Hermitian/skew-Hermitian splitting method, was proposed and researched as an improved method for solving linear systems with non-Hermitian positive definite and semidefinite matrices.
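The splitting iteration named in the abstract can be sketched in a few lines of dense NumPy (a teaching illustration with a made-up test matrix, not the production solver developed under the award): A is split into its Hermitian part H and skew-Hermitian part S, and each sweep solves two shifted systems.

```python
import numpy as np

def hss_iteration(A, b, alpha=1.0, tol=1e-10, max_iter=500):
    """Hermitian/skew-Hermitian splitting iteration for A x = b.

    A = H + S with H = (A + A^H)/2 (Hermitian) and S = (A - A^H)/2
    (skew-Hermitian).  Each sweep solves two shifted systems:
        (alpha*I + H) x_half = (alpha*I - S) x + b
        (alpha*I + S) x_new  = (alpha*I - H) x_half + b
    Dense and unpreconditioned -- a sketch only.
    """
    n = A.shape[0]
    H = (A + A.conj().T) / 2
    S = (A - A.conj().T) / 2
    I = np.eye(n)
    x = np.zeros(n, dtype=A.dtype)
    for _ in range(max_iter):
        x_half = np.linalg.solve(alpha * I + H, (alpha * I - S) @ x + b)
        x = np.linalg.solve(alpha * I + S, (alpha * I - H) @ x_half + b)
        if np.linalg.norm(A @ x - b) <= tol * np.linalg.norm(b):
            break
    return x

# Small non-Hermitian positive definite test matrix (hypothetical data).
A = np.array([[4.0, 1.0, 0.0],
              [-1.0, 4.0, 1.0],
              [0.0, -1.0, 4.0]])
b = np.ones(3)
x = hss_iteration(A, b)
print(x, np.linalg.norm(A @ x - b))
```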
International Nuclear Information System (INIS)
Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.
1997-01-01
Work in the field of parallel processing has developed through research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code, development or improvement work was done on parts simulating low-energy physical phenomena like radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program for neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits, in a single pass, the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refer to simulations for nuclear medicine applications such as the development of biological probes, the evaluation and characterization of gamma cameras (collimators, crystal thickness), as well as methods for dosimetric calculations. In particular, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to simulation of electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of the environment
Energy Technology Data Exchange (ETDEWEB)
Hasenkamp, Daren; Sim, Alexander; Wehner, Michael; Wu, Kesheng
2010-09-30
Extensive computing power has been used to tackle issues such as climate change, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently run on only a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows the user to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of the cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we are able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup that is comparable to running the same analysis task using MPI. However, compared to MPI-based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring the user to rewrite their programs. The cloud-based approach is also more resilient to failure; as long as a single VM is running, it can make progress, whereas as soon as one MPI node fails, the whole analysis job fails. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.
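The VM-based workflow itself cannot be reproduced in a few lines, but the pattern it parallelizes is the familiar fan-out of independent per-file analysis tasks. The hedged Python sketch below runs such tasks with multiprocessing on a single node over synthetic data; the data, threshold and task counts are invented for illustration and the cloud-specific coordination layer is not shown.

```python
from multiprocessing import Pool
import random

def analyze_chunk(seed):
    """Stand-in for one per-file analysis task: count synthetic wind-speed
    samples above hurricane strength (33 m/s).  The real input data and
    analysis program are hypothetical here."""
    rng = random.Random(seed)
    samples = [rng.gauss(20.0, 10.0) for _ in range(100_000)]
    return seed, sum(1 for s in samples if s > 33.0)

if __name__ == "__main__":
    # One task per (hypothetical) input file; in the paper each task ran in
    # its own VM, with coordination and data movement provided by cloud services.
    with Pool(processes=4) as pool:
        for task_id, count in pool.map(analyze_chunk, range(10)):
            print(f"task {task_id}: {count} strong-wind samples")
```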
Energy Technology Data Exchange (ETDEWEB)
McWilliams, T.; Widdoes, Jr., L. C.; Wood, L.
1976-09-30
The design of an extremely high performance programmable digital filter of novel architecture, the LLL Programmable Digital Filter, is described. The digital filter is a high-performance multiprocessor having general purpose applicability and high programmability; it is extremely cost effective either in a uniprocessor or a multiprocessor configuration. The architecture and instruction set of the individual processor was optimized with regard to the multiple processor configuration. The optimal structure of a parallel processing system was determined for addressing the specific Navy application centering on the advanced digital filtering of passive acoustic ASW data of the type obtained from the SOSUS net. 148 figures. (RWR)
A note on solving large-scale zero-one programming problems
Adema, Jos J.
1988-01-01
A heuristic for solving large-scale zero-one programming problems is provided. The heuristic is based on the modifications made by H. Crowder et al. (1983) to the standard branch-and-bound strategy. First, the initialization is modified. The modification is only useful if the objective function
Towards a Versatile Problem Diagnosis Infrastructure for LargeWireless Sensor Networks
Iwanicki, Konrad; Steen, van Maarten
2007-01-01
In this position paper, we address the issue of durable maintenance of a wireless sensor network, which will be crucial if the vision of large, long-lived sensornets is to become reality. Durable maintenance requires tools for diagnosing and fixing occurring problems, which can range from
Large scale inverse problems computational methods and applications in the earth sciences
Scheichl, Robert; Freitag, Melina A; Kindermann, Stefan
2013-01-01
This book is the second volume of a three-volume series recording the "Radon Special Semester 2011 on Multiscale Simulation & Analysis in Energy and the Environment" that took place in Linz, Austria, October 3-7, 2011. The volume addresses the common ground in the mathematical and computational procedures required for large-scale inverse problems and data assimilation in forefront applications.
An efficient method to handle the 'large p, small n' problem for ...
Indian Academy of Sciences (India)
The so-called 'large p, small n' or 'short-fat data' problem can occur if the number of ... (SNPs) in GWAS based on the Haseman–Elston (H–E) regression (DeFries 2010). ... For instance, if a population is admixed or genetically heterogeneous, the ...
The coupled dynamical problem of thermoelasticity in case of large temperature differences
International Nuclear Information System (INIS)
Szekeres, A.
1981-01-01
In problems of thermoelasticity in general, and also in dynamical problems, it is common to suppose small temperature differences. The equations used in the scientific literature refer to these. This raises the question of what influence taking large temperature changes into account has on dynamical problems. To investigate this, we first present the general equation of heat conduction in the case of small temperature differences according to Nowacki and Biot. On this basis we introduce the general equation of heat conduction with large temperature changes. Some remarks show the connection between the two cases. Using the latter in the equations of thermoelasticity, we write down the expressions of the problem for the thermal shock of a long bar. Finally we show the results of the numerical example and the experimental opportunity to measure some of the constants. (orig.)
The interaction problems between large and small business in modern conditions
Directory of Open Access Journals (Sweden)
Belyaev Mikhail
2017-01-01
Full Text Available The development of market relations and changes in the business environment encourage enterprises to look for new management methods and to improve forms of interaction. In this regard, the identification of the interaction of large and small businesses, as well as the evaluation of the development of their relations, seems an important and urgent problem in modern conditions. The purpose of the survey is to study the interaction of large and small businesses and to evaluate the relations between them. The study was conducted on the basis of a comprehensive and systematic approach, using methods of comparative, retrospective, statistical and mathematical analysis. In accordance with this purpose, the prerequisites for the development of large and small businesses are identified, together with the features and problems of their functioning and the links between them. The most common forms of interaction between large and small enterprises were identified: outsourcing, franchising, leasing, subcontracting, venture financing, and the establishment of regional cooperation forms of large and small businesses. However, the cooperative processes between large and small business in Russia are not sufficiently developed today. The authors identified factors that impede the growth of Russian production, offered recommendations for the development of large and small businesses, and justified the state's role in this process. In addition, they described the mechanism of state support for small business, including organizational, financial, information and consulting components.
He, Qiang; Hu, Xiangtao; Ren, Hong; Zhang, Hongqi
2015-11-01
A novel artificial fish swarm algorithm (NAFSA) is proposed for solving the large-scale reliability-redundancy allocation problem (RAP). In NAFSA, the social behaviors of the fish swarm are classified in three ways: foraging behavior, reproductive behavior, and random behavior. The foraging behavior uses two position-updating strategies, and the selection and crossover operators are applied to define the reproductive ability of an artificial fish. For the random behavior, which is essentially a mutation strategy, the basic cloud generator is used as the mutation operator. Finally, numerical results for four benchmark problems and a large-scale RAP are reported and compared. NAFSA shows good performance in terms of computational accuracy and computational efficiency for the large-scale RAP. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Parallel-In-Time For Moving Meshes
Energy Technology Data Exchange (ETDEWEB)
Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Southworth, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-02-04
With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
Averkin, Sergey N.; Gatsonis, Nikolaos A.
2018-06-01
An unstructured electrostatic Particle-In-Cell (EUPIC) method is developed on arbitrary tetrahedral grids for simulation of plasmas bounded by arbitrary geometries. The electric potential in EUPIC is obtained on cell vertices from a finite volume Multi-Point Flux Approximation of Gauss' law using the indirect dual cell with Dirichlet, Neumann and external circuit boundary conditions. The resulting matrix equation for the nodal potential is solved with a restarted generalized minimal residual method (GMRES) and an ILU(0) preconditioner algorithm, parallelized using a combination of node coloring and level scheduling approaches. The electric field on vertices is obtained using the gradient theorem applied to the indirect dual cell. The algorithms for injection, particle loading, particle motion, and particle tracking are parallelized for unstructured tetrahedral grids. The algorithms for the potential solver, electric field evaluation, loading, scatter-gather algorithms are verified using analytic solutions for test cases subject to Laplace and Poisson equations. Grid sensitivity analysis examines the L2 and L∞ norms of the relative error in potential, field, and charge density as a function of edge-averaged and volume-averaged cell size. Analysis shows second order of convergence for the potential and first order of convergence for the electric field and charge density. Temporal sensitivity analysis is performed and the momentum and energy conservation properties of the particle integrators in EUPIC are examined. The effects of cell size and timestep on heating, slowing-down and the deflection times are quantified. The heating, slowing-down and the deflection times are found to be almost linearly dependent on number of particles per cell. EUPIC simulations of current collection by cylindrical Langmuir probes in collisionless plasmas show good comparison with previous experimentally validated numerical results. These simulations were also used in a parallelization
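The paper's node-coloring and level-scheduling parallel ILU is tied to its unstructured grids, but the basic solver combination it accelerates, restarted GMRES with an incomplete-LU preconditioner, can be sketched with SciPy as below; the sparse test matrix and solver settings are assumptions made for illustration only.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Small sparse, nonsymmetric test system standing in for the discrete
# potential equation (the real matrix would come from the MPFA discretization).
n = 200
main = 4.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, 0.9 * off], offsets=[-1, 0, 1], format="csr")
b = np.ones(n)

# Incomplete LU factorization (SciPy's spilu) standing in for the ILU(0)
# preconditioner; wrapped as a LinearOperator so GMRES can apply it.
ilu = spla.spilu(A.tocsc())
M = spla.LinearOperator((n, n), matvec=ilu.solve)

# Restarted GMRES, as in the paper's solver stack.
x, info = spla.gmres(A, b, M=M, restart=30)
print("converged" if info == 0 else f"gmres info={info}",
      "residual:", np.linalg.norm(A @ x - b))
```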
Energy Technology Data Exchange (ETDEWEB)
Birkner, P.; Foerg, R. [Lech-Elektrizitaetswerke AG, Augsburg (Germany)
1998-06-29
Under practical conditions, currents can be found in underground electrical conductors such as cable sheaths earthed on both ends. Such currents may, for example, be due to the alternating current system of the railroad, or to the alternating current of a Petersen coil seeking a minimum-resistance path from the transformer station to the location of the earth fault. Currents like these create a series voltage in the cable by inductive coupling. The voltage depends on the type and the length of the cable. The series voltages of all three phases form a zero sequence system. When two cable systems run in parallel, under certain circumstances a circulating zero sequence current can arise. Additionally, there is a shift voltage between the neutral point and earth in the case of an earth fault at another place in the grid. The combination of these two factors can cause a malfunction of the earth fault detecting relays assigned to the parallel cable system. (orig.)
Solution of large nonlinear time-dependent problems using reduced coordinates
International Nuclear Information System (INIS)
Mish, K.D.
1987-01-01
This research is concerned with the idea of reducing a large time-dependent problem, such as one obtained from a finite-element discretization, down to a more manageable size while preserving the most-important physical behavior of the solution. This reduction process is motivated by the concept of a projection operator on a Hilbert Space, and leads to the Lanczos Algorithm for generation of approximate eigenvectors of a large symmetric matrix. The Lanczos Algorithm is then used to develop a reduced form of the spatial component of a time-dependent problem. The solution of the remaining temporal part of the problem is considered from the standpoint of numerical-integration schemes in the time domain. All of these theoretical results are combined to motivate the proposed reduced coordinate algorithm. This algorithm is then developed, discussed, and compared to related methods from the mechanics literature. The proposed reduced coordinate method is then applied to the solution of some representative problems in mechanics. The results of these problems are discussed, conclusions are drawn, and suggestions are made for related future research
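A bare-bones version of the Lanczos process referred to above is sketched in NumPy below: starting from a random vector it builds an orthonormal basis V and a small tridiagonal matrix T with V^T A V ≈ T, whose eigenpairs approximate the extreme eigenpairs of the large symmetric matrix and define the reduced coordinates. No reorthogonalization or problem-specific details are included, and the test matrix is illustrative.

```python
import numpy as np

def lanczos(A, m, seed=0):
    """m steps of the symmetric Lanczos process (no reorthogonalization).

    Returns V (n x m, orthonormal columns) and tridiagonal T (m x m) with
    V.T @ A @ V ~= T; eigenpairs of T approximate extreme eigenpairs of A.
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    V = np.zeros((n, m))
    alphas, betas = np.zeros(m), np.zeros(m - 1)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    v_prev = np.zeros(n)
    beta = 0.0
    for j in range(m):
        V[:, j] = v
        w = A @ v - beta * v_prev              # three-term recurrence
        alphas[j] = v @ w
        w -= alphas[j] * v
        if j < m - 1:
            beta = np.linalg.norm(w)
            betas[j] = beta
            v_prev, v = v, w / beta
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return V, T

# Reduced eigenvalues of a small stiffness-like matrix (illustrative data).
n = 100
A = np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
V, T = lanczos(A, m=20)
print(np.sort(np.linalg.eigvalsh(T))[-3:], np.sort(np.linalg.eigvalsh(A))[-3:])
```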
A hybrid adaptive large neighborhood search algorithm applied to a lot-sizing problem
DEFF Research Database (Denmark)
Muller, Laurent Flindt; Spoorendonk, Simon
This paper presents a hybrid of a general heuristic framework that has been successfully applied to vehicle routing problems and a general purpose MIP solver. The framework uses local search and an adaptive procedure which chooses between a set of large neighborhoods to be searched. A mixed integer ... of a solution and to investigate the feasibility of elements in such a neighborhood. The hybrid heuristic framework is applied to the multi-item capacitated lot sizing problem with dynamic lot sizes, where experiments have been conducted on a series of instances from the literature. On average the heuristic ...
Structural problems of public participation in large-scale projects with environmental impact
International Nuclear Information System (INIS)
Bechmann, G.
1989-01-01
Four items are discussed, showing that the problems involved in public participation in large-scale projects with environmental impact cannot be solved satisfactorily without suitable modification of the existing legal framework. The problematic items are: the status of the electric utilities as a quasi-public enterprise; informal preliminary negotiations; the penetration of scientific argumentation into administrative decisions; and the procedural concept. The paper discusses the fundamental issue of a problem-adequate design of the procedure and develops suggestions for a cooperative participation design. (orig./HSCH)
DEFF Research Database (Denmark)
Wøhlk, Sanne; Laporte, Gilbert
2017-01-01
The aim of this paper is to computationally compare several algorithms for the Minimum Cost Perfect Matching Problem on an undirected complete graph. Our work is motivated by the need to solve large instances of the Capacitated Arc Routing Problem (CARP) arising in the optimization of garbage collection in Denmark. Common heuristics for the CARP involve the optimal matching of the odd-degree nodes of a graph. The algorithms used in the comparison include the CPLEX solution of an exact formulation, the LEDA matching algorithm, a recent implementation of the Blossom algorithm, as well as six...
Fox, Geoffrey C; Messina, Guiseppe C
2014-01-01
A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop
Improving the computation efficiency of COBRA-TF for LWR safety analysis of large problems
International Nuclear Information System (INIS)
Cuervo, D.; Avramova, M. N.; Ivanov, K. N.
2004-01-01
A matrix solver is implemented in COBRA-TF in order to improve the computation efficiency of both numerical solution methods existing in the code, the Gauss elimination and the Gauss-Seidel iterative technique. Both methods are used to solve the system of pressure linear equations and rely on the solution of large sparse matrices. The introduced solver accelerates the solution of these matrices in cases with a large number of cells. The execution time is reduced by half compared to the execution time without the matrix solver for the cases with large matrices. The achieved improvement and the planned future work in this direction are important for performing efficient LWR safety analyses of large problems. (authors)
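For reference, the two baseline techniques named in the abstract, direct elimination and Gauss-Seidel iteration, look roughly as follows on a small pressure-like diagonally dominant system (Python sketch with invented data; the actual COBRA-TF matrix solver is not shown).

```python
import numpy as np

def gauss_seidel(A, b, tol=1e-10, max_sweeps=10_000):
    """Basic Gauss-Seidel iteration for A x = b (diagonally dominant A)."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_sweeps):
        for i in range(n):
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - sigma) / A[i, i]
        if np.linalg.norm(A @ x - b) <= tol * np.linalg.norm(b):
            break
    return x

# Small diagonally dominant system standing in for the pressure equations.
n = 50
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x_iter = gauss_seidel(A, b)
x_direct = np.linalg.solve(A, b)        # "Gauss elimination" baseline
print(np.max(np.abs(x_iter - x_direct)))
```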
Numerical solution of large nonlinear boundary value problems by quadratic minimization techniques
International Nuclear Information System (INIS)
Glowinski, R.; Le Tallec, P.
1984-01-01
The objective of this paper is to describe the numerical treatment of large highly nonlinear two or three dimensional boundary value problems by quadratic minimization techniques. In all the different situations where these techniques were applied, the methodology remains the same and is organized as follows: 1) derive a variational formulation of the original boundary value problem, and approximate it by Galerkin methods; 2) transform this variational formulation into a quadratic minimization problem (least squares methods) or into a sequence of quadratic minimization problems (augmented lagrangian decomposition); 3) solve each quadratic minimization problem by a conjugate gradient method with preconditioning, the preconditioning matrix being sparse, positive definite, and fixed once for all in the iterative process. This paper will illustrate the methodology above on two different examples: the description of least squares solution methods and their application to the solution of the unsteady Navier-Stokes equations for incompressible viscous fluids; the description of augmented lagrangian decomposition techniques and their application to the solution of equilibrium problems in finite elasticity
Domain decomposition methods and parallel computing
International Nuclear Information System (INIS)
Meurant, G.
1991-01-01
In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from the discretization of scientific computing problems described by systems of partial differential equations. We show how to obtain a discrete finite-dimensional system from the continuous problem, and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems encountered in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion floating point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset
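A minimal concrete instance of the combination described, conjugate gradient preconditioned by independent subdomain solves, is sketched below in Python with a non-overlapping block-Jacobi preconditioner on a 1-D model problem; it illustrates the idea only, with invented sizes, and is not the author's implementation.

```python
import numpy as np

def block_jacobi_preconditioner(A, n_blocks):
    """Return r -> z where each subdomain block of A is solved independently
    (the independent solves are what would be distributed across processors)."""
    n = A.shape[0]
    bounds = np.linspace(0, n, n_blocks + 1, dtype=int)
    inverses = [np.linalg.inv(A[s:e, s:e]) for s, e in zip(bounds[:-1], bounds[1:])]
    def apply(r):
        z = np.empty_like(r)
        for (s, e), Ainv in zip(zip(bounds[:-1], bounds[1:]), inverses):
            z[s:e] = Ainv @ r[s:e]
        return z
    return apply

def pcg(A, b, precond, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# 1-D Laplacian split into 4 "subdomains" (illustrative only).
n = 200
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b, block_jacobi_preconditioner(A, n_blocks=4))
print(np.linalg.norm(A @ x - b))
```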
Parallel Computing Strategies for Irregular Algorithms
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
DEFF Research Database (Denmark)
Muller, Laurent Flindt
2009-01-01
We present an application of an Adaptive Large Neighborhood Search (ALNS) algorithm to the Resource-constrained Project Scheduling Problem (RCPSP). The ALNS framework was first proposed by Pisinger and Røpke [19] and can be described as a large neighborhood search algorithm with an adaptive layer, where a set of destroy/repair neighborhoods compete to modify the current solution in each iteration of the algorithm. Experiments are performed on the well-known J30, J60 and J120 benchmark instances, which show that the proposed algorithm is competitive and confirms the strength of the ALNS framework...
Large radiative corrections to the effective potential and the gauge hierarchy problem
International Nuclear Information System (INIS)
Sachrajda, C.T.C.
1982-01-01
We study the higher order corrections to the effective potential in a simple toy model and in the SU(5) grand unified theory, with a view to seeing what their effects are on the stability equations, and hence on the gauge hierarchy problem for these theories. These corrections contain powers of log (v 2 /h 2 ), where v and h are the large and small vacuum expectation values respectively, and hence cannot a priori be neglected. Nevertheless, after summing these large logarithms we find that the stability equations always contain two equations for v (i.e. these equations are independent of h) and hence can only be satisfied by a special (and hence unnatural) choice of parameters. This we claim is the precise statement of the gauge hierarchy problem. (orig.)
A Large Scale Problem Based Learning inter-European Student Satellite Construction Project
DEFF Research Database (Denmark)
Nielsen, Jens Frederik Dalsgaard; Alminde, Lars; Bisgaard, Morten
2006-01-01
This paper describes the pedagogical outcome of a large scale PBL experiment. ESA (European Space Agency) Education Office launched in January 2004 an ambitious project: let students from all over Europe build... The satellite was successfully launched on October 27th 2005 (http://www.express.space.aau.dk). The project was a student driven project with student project responsibility, adding a lot of international experiences and project management skills to the outcome of more traditional one semester, single group... One lesson was that electronic communication technology was vital within the project. Additionally the SSETI EXPRESS project implied the following problems: it didn't fit a standard semester - 18 months for the satellite project compared to 5/6 months for a "normal" semester project; difficulties in integrating the tasks...
Solving Large Scale Nonlinear Eigenvalue Problem in Next-Generation Accelerator Design
Energy Technology Data Exchange (ETDEWEB)
Liao, Ben-Shan; Bai, Zhaojun; /UC, Davis; Lee, Lie-Quan; Ko, Kwok; /SLAC
2006-09-28
A number of numerical methods, including inverse iteration, the method of successive linear problems and the nonlinear Arnoldi algorithm, are studied in this paper to solve a large-scale nonlinear eigenvalue problem arising from finite element analysis of resonant frequencies and external Q_e values of a waveguide-loaded cavity in the next-generation accelerator design. They present a nonlinear Rayleigh-Ritz iterative projection algorithm, NRRIT in short, and demonstrate that it is the most promising approach for a model-scale cavity design. The NRRIT algorithm is an extension of the nonlinear Arnoldi algorithm due to Voss. Computational challenges of solving such a nonlinear eigenvalue problem for a full-scale cavity design are outlined.
The holographic dual of a Riemann problem in a large number of dimensions
Energy Technology Data Exchange (ETDEWEB)
Herzog, Christopher P.; Spillane, Michael [C.N. Yang Institute for Theoretical Physics, Department of Physics and Astronomy,Stony Brook University, Stony Brook, NY 11794 (United States); Yarom, Amos [Department of Physics, Technion,Haifa 32000 (Israel)
2016-08-22
We study properties of a non equilibrium steady state generated when two heat baths are initially in contact with one another. The dynamics of the system we study are governed by holographic duality in a large number of dimensions. We discuss the “phase diagram” associated with the steady state, the dual, dynamical, black hole description of this problem, and its relation to the fluid/gravity correspondence.
McCallum, Ethan
2011-01-01
It's tough to argue with R as a high-quality, cross-platform, open source statistical software product-unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.
Improvement of Monte Carlo code A3MCNP for large-scale shielding problems
International Nuclear Information System (INIS)
Miyake, Y.; Ohmura, M.; Hasegawa, T.; Ueki, K.; Sato, O.; Haghighat, A.; Sjoden, G.E.
2004-01-01
A3MCNP (Automatic Adjoint Accelerated MCNP) is a revised version of the MCNP Monte Carlo code that automatically prepares variance reduction parameters for the CADIS (Consistent Adjoint Driven Importance Sampling) methodology. Using a deterministic 'importance' (or adjoint) function, CADIS performs source and transport biasing within the weight-window technique. The current version of A3MCNP uses the 3-D Sn transport TORT code to determine a 3-D importance function distribution. Based on simulation of several real-life problems, it is demonstrated that A3MCNP provides precise calculation results with a remarkably short computation time by using the proper and objective variance reduction parameters. However, since the first version of A3MCNP provided only a point source configuration option for large-scale shielding problems, such as spent-fuel transport casks, a large amount of memory may be necessary to store enough points to properly represent the source. Hence, we have developed an improved version of A3MCNP (referred to as A3MCNPV) which has a volumetric source configuration option. This paper describes the successful use of A3MCNPV for a concrete cask streaming problem and a PWR dosimetry problem. (author)
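The abstract does not spell out the CADIS relations; as a rough sketch, the adjoint-driven source biasing and consistent weight assignment are usually written as follows in the variance-reduction literature (the notation is illustrative and not taken from this record):

```latex
% Biased source density built from the adjoint (importance) function \phi^\dagger:
\hat{q}(\vec{r},E) = \frac{\phi^\dagger(\vec{r},E)\, q(\vec{r},E)}
                          {\int \phi^\dagger(\vec{r},E)\, q(\vec{r},E)\,\mathrm{d}V\,\mathrm{d}E}
% Consistent statistical weights (weight-window centers) for the sampled particles:
w(\vec{r},E) = \frac{q(\vec{r},E)}{\hat{q}(\vec{r},E)}
             = \frac{\int \phi^\dagger q \,\mathrm{d}V\,\mathrm{d}E}{\phi^\dagger(\vec{r},E)}
```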
Directory of Open Access Journals (Sweden)
C. Barthe
2012-01-01
Full Text Available The paper describes the fully parallelized electrical scheme CELLS which is suitable to simulate explicitly electrified storm systems on parallel computers. Our motivation here is to show that a cloud electricity scheme can be developed for use on large grids with complex terrain. Large computational domains are needed to perform real case meteorological simulations with many independent convective cells.
The scheme computes the bulk electric charge attached to each cloud particle and hydrometeor. Positive and negative ions are also taken into account. Several parametrizations of the dominant non-inductive charging process are included, and an inductive charging process as well. The electric field is obtained by inverting the Gauss equation with an extension to terrain-following coordinates. The new feature concerns the lightning flash scheme, which is a simplified version of an older detailed sequential scheme. Flashes are composed of a bidirectional leader phase (vertical extension from the triggering point) and a phase obeying a fractal law (with horizontal extension on electrically charged zones). The originality of the scheme lies in the way the branching phase is treated to get a parallel code.
The complete electrification scheme is tested for the 10 July 1996 STERAO case and for the 21 July 1998 EULINOX case. Flash characteristics are analysed in detail and additional sensitivity experiments are performed for the STERAO case. Although the simulations were run for flat terrain conditions, they show that the model behaves well on multiprocessor computers. This opens a wide area of application for this electrical scheme, with the next objective of running real meteorological cases on large domains.
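For reference, the electrostatic problem the scheme inverts can be written in a flat-coordinate form (the terrain-following extension mentioned above changes the metric terms, not the structure); this is the textbook relation, not a formula copied from the paper:

```latex
\nabla\cdot\vec{E} = \frac{\rho_{\mathrm{tot}}}{\varepsilon_0},
\qquad \vec{E} = -\nabla V
\;\;\Rightarrow\;\;
\nabla^2 V = -\frac{\rho_{\mathrm{tot}}}{\varepsilon_0},
```

where rho_tot is the net space charge carried by ions and hydrometeors and V is the electrostatic potential.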
Sensitivity Analysis of the Proximal-Based Parallel Decomposition Methods
Directory of Open Access Journals (Sweden)
Feng Ma
2014-01-01
Full Text Available The proximal-based parallel decomposition methods were recently proposed to solve structured convex optimization problems. These algorithms are eligible for parallel computation and can be used efficiently for solving large-scale separable problems. In this paper, compared with the previous theoretical results, we show that the range of the involved parameters can be enlarged while the convergence can be still established. Preliminary numerical tests on stable principal component pursuit problem testify to the advantages of the enlargement.
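As a concrete reminder of the building block such methods parallelize, the proximal operator of a convex function g, and its closed form for the l1-norm (soft-thresholding), are standard and stated here for illustration only:

```latex
\operatorname{prox}_{\lambda g}(v) = \arg\min_{x}\; g(x) + \frac{1}{2\lambda}\,\|x - v\|_2^2,
\qquad
\big(\operatorname{prox}_{\lambda\|\cdot\|_1}(v)\big)_i = \operatorname{sign}(v_i)\,\max\!\big(|v_i| - \lambda,\, 0\big).
```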
Compiler Technology for Parallel Scientific Computation
Directory of Open Access Journals (Sweden)
Can Özturan
1994-01-01
Full Text Available There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with equational programming language (EPL. Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.
International Nuclear Information System (INIS)
Nahas, Nabil; Nourelfath, Mustapha; Ait-Kadi, Daoud
2007-01-01
The redundancy allocation problem (RAP) is a well known NP-hard problem which involves the selection of elements and redundancy levels to maximize system reliability given various system-level constraints. As telecommunications and internet protocol networks, manufacturing and power systems are becoming more and more complex, while requiring short development schedules and very high reliability, it is becoming increasingly important to develop efficient solutions to the RAP. This paper presents an efficient algorithm to solve this reliability optimization problem. The design of the heuristic approach is inspired by the ant colony meta-heuristic optimization method and the degraded ceiling local search technique. Our hybridization of the ant colony meta-heuristic with the degraded ceiling performs well and is competitive with the best-known heuristics for redundancy allocation. Numerical results for the 33 test problems from previous research are reported and compared. The solutions found by our approach are all better than or on par with the well-known best solutions.
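A minimal sketch of the degraded ceiling acceptance rule is given below, in its usual minimization form (for a reliability maximization problem the inequalities and the ceiling update flip direction). The hybridization with the ant colony construction step used in the paper is not reproduced here; all names and constants are illustrative.

```python
import random

def degraded_ceiling_search(initial, neighbor, cost, iterations=20000, decay=0.005):
    """Minimal degraded-ceiling local search (minimization form).

    Accepts improving moves, and worsening moves whose cost still lies
    below a ceiling that is lowered by `decay` after every iteration.
    """
    current = best = initial
    ceiling = cost(initial)              # start the ceiling at the initial cost
    for _ in range(iterations):
        candidate = neighbor(current)
        c = cost(candidate)
        if c <= cost(current) or c <= ceiling:
            current = candidate
            if c < cost(best):
                best = candidate
        ceiling -= decay                 # gradually tighten the acceptance rule
    return best

# Toy usage: minimize a 1-D quadratic by random perturbations.
if __name__ == "__main__":
    result = degraded_ceiling_search(
        initial=10.0,
        neighbor=lambda x: x + random.uniform(-1.0, 1.0),
        cost=lambda x: (x - 3.0) ** 2,
    )
    print(round(result, 2))              # close to 3.0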
Parallel Algorithms and Patterns
Energy Technology Data Exchange (ETDEWEB)
Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2016-06-16
This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
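A minimal sketch of one of the patterns named above, a reduction, parallelized over processes; the chunking, worker count, and function names are illustrative only:

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker reduces its own chunk independently.
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Split the data into roughly equal chunks, one per worker.
    chunks = [data[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(partial_sum, chunks)
    # Final (small) reduction over the per-worker partial results.
    return sum(partials)

if __name__ == "__main__":
    print(parallel_sum(list(range(1_000_000))))  # 499999500000
```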
Poulet, Thomas; Paesold, Martin; Veveakis, Manolis
2017-03-01
Faults play a major role in many economically and environmentally important geological systems, ranging from impermeable seals in petroleum reservoirs to fluid pathways in ore-forming hydrothermal systems. Their behavior is therefore widely studied and fault mechanics is particularly focused on the mechanisms explaining their transient evolution. Single faults can change in time from seals to open channels as they become seismically active and various models have recently been presented to explain the driving forces responsible for such transitions. A model of particular interest is the multi-physics oscillator of Alevizos et al. (J Geophys Res Solid Earth 119(6), 4558-4582, 2014) which extends the traditional rate and state friction approach to rate and temperature-dependent ductile rocks, and has been successfully applied to explain spatial features of exposed thrusts as well as temporal evolutions of current subduction zones. In this contribution we implement that model in REDBACK, a parallel open-source multi-physics simulator developed to solve such geological instabilities in three dimensions. The resolution of the underlying system of equations in a tightly coupled manner allows REDBACK to capture appropriately the various theoretical regimes of the system, including the periodic and non-periodic instabilities. REDBACK can then be used to simulate the drastic permeability evolution in time of such systems, where nominally impermeable faults can sporadically become fluid pathways, with permeability increases of several orders of magnitude.
Cao, Yang; Liu, Chun; Huang, Yuehui; Wang, Tieqiang; Sun, Chenjun; Yuan, Yue; Zhang, Xinsong; Wu, Shuyun
2017-02-01
With the development of roof photovoltaic power (PV) generation technology and the increasingly urgent need to improve supply reliability levels in remote areas, islanded microgrid with photovoltaic and energy storage systems (IMPE) is developing rapidly. The high costs of photovoltaic panel material and energy storage battery material have become the primary factors that hinder the development of IMPE. The advantages and disadvantages of different types of photovoltaic panel materials and energy storage battery materials are analyzed in this paper, and guidance is provided on material selection for IMPE planners. The time sequential simulation method is applied to optimize material demands of the IMPE. The model is solved by parallel algorithms that are provided by a commercial solver named CPLEX. Finally, to verify the model, an actual IMPE is selected as a case system. Simulation results on the case system indicate that the optimization model and corresponding algorithm is feasible. Guidance for material selection and quantity demand for IMPEs in remote areas is provided by this method.
Energy Technology Data Exchange (ETDEWEB)
Almeida, Adino Americo Heimlich
2009-07-01
Graphics Processing Units (GPU) are high performance co-processors intended, originally, to improve the use and quality of computer graphics applications. Since researchers and practitioners realized the potential of using GPU for general purposes, their application has been extended to other fields outside the computer graphics scope. The main objective of this work is to evaluate the impact of using GPU in two typical problems of the nuclear area: neutron transport simulation using the Monte Carlo method, and solving the heat equation in a two-dimensional domain by the finite difference method. To achieve this, we developed parallel algorithms for GPU and CPU for the two problems described above. The comparison showed that the GPU-based approach is faster than the CPU-based one on a computer with two quad-core processors, without loss of precision. (author)
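A minimal CPU-side sketch of the explicit finite-difference update for the 2-D heat equation of the kind the work ports to the GPU; the grid size, boundary treatment, and constants below are illustrative and not taken from the record:

```python
import numpy as np

def heat_step(u, alpha, dx, dt):
    """One explicit finite-difference step of du/dt = alpha * (u_xx + u_yy).

    Interior points are updated with the 5-point Laplacian stencil;
    boundary values are kept fixed (Dirichlet conditions).
    """
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] +
                       u[1:-1, 2:] + u[1:-1, :-2] -
                       4.0 * u[1:-1, 1:-1]) / dx**2
    return u + alpha * dt * lap

if __name__ == "__main__":
    n, dx, alpha = 64, 1.0, 1.0
    dt = 0.2 * dx**2 / alpha          # within the explicit stability limit
    u = np.zeros((n, n))
    u[n // 2, n // 2] = 100.0         # hot spot in the centre
    for _ in range(500):
        u = heat_step(u, alpha, dx, dt)
    print(u.max())
```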
Directory of Open Access Journals (Sweden)
Bailing Liu
2015-01-01
Full Text Available Facility location, inventory control, and vehicle route scheduling are three key issues to be settled in the design of logistics systems for e-commerce. Due to the online shopping features of e-commerce, customer returns are becoming much more common than in traditional commerce. This paper studies a three-phase supply chain distribution system consisting of one supplier, a set of retailers, and a single type of product with a continuous review (Q, r) inventory policy. We formulate a stochastic location-inventory-routing problem (LIRP) model with no-quality-defect returns. To solve the NP-hard problem, a pseudo-parallel genetic algorithm integrating simulated annealing (PPGASA) is proposed. The computational results show that PPGASA outperforms GA on optimal solution, computing time, and computing stability.
Demir, I.; Krajewski, W. F.
2013-12-01
As geoscientists are confronted with increasingly massive datasets from environmental observations to simulations, one of the biggest challenges is having the right tools to gain scientific insight from the data and communicate the understanding to stakeholders. Recent developments in web technologies make it easy to manage, visualize and share large data sets with the general public. Novel visualization techniques and dynamic user interfaces allow users to interact with data, and modify the parameters to create custom views of the data to gain insight from simulations and environmental observations. This requires developing new data models and intelligent knowledge discovery techniques to explore and extract information from complex computational simulations or large data repositories. Scientific visualization will be an increasingly important component in building comprehensive environmental information platforms. This presentation provides an overview of the trends and challenges in the field of scientific visualization, and demonstrates information visualization and communication tools developed in light of these challenges.
Scaling Up Coordinate Descent Algorithms for Large ℓ1 Regularization Problems
Energy Technology Data Exchange (ETDEWEB)
Scherrer, Chad; Halappanavar, Mahantesh; Tewari, Ambuj; Haglin, David J.
2012-07-03
We present a generic framework for parallel coordinate descent (CD) algorithms that has as special cases the original sequential algorithms of Cyclic CD and Stochastic CD, as well as the recent parallel Shotgun algorithm of Bradley et al. We introduce two novel parallel algorithms that are also special cases---Thread-Greedy CD and Coloring-Based CD---and give performance measurements for an OpenMP implementation of these.
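A minimal serial sketch of the coordinate-descent update for the l1-regularized least-squares problem that such frameworks parallelize; the Shotgun/Thread-Greedy scheduling itself is not reproduced here, and all names and test data are illustrative:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, sweeps=100):
    """Cyclic coordinate descent for min_w 0.5*||y - Xw||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)          # precomputed ||X_j||^2
    r = y - X @ w                          # current residual
    for _ in range(sweeps):
        for j in range(d):
            # Partial residual that excludes coordinate j's own contribution.
            rho = X[:, j] @ r + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (w[j] - w_new)  # update the residual incrementally
            w[j] = w_new
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    true_w = np.zeros(10); true_w[:3] = [2.0, -1.5, 0.5]
    y = X @ true_w + 0.01 * rng.standard_normal(200)
    print(np.round(lasso_cd(X, y, lam=1.0), 2))   # roughly recovers true_w
```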
Large Scale Earth's Bow Shock with Northern IMF as Simulated by PIC Code in Parallel with MHD Model
Baraka, Suleiman
2016-06-01
In this paper, we propose a 3D kinetic model (particle-in-cell, PIC) for the description of the large scale Earth's bow shock. The proposed version is stable and does not require huge or extensive computer resources. Because PIC simulations work with scaled plasma and field parameters, we also propose to validate our code by comparing its results with the available MHD simulations under the same scaled solar wind (SW) and interplanetary magnetic field (IMF) conditions. We report new results from the two models. In both codes the Earth's bow shock position is found to be ≈14.8 R_E along the Sun-Earth line, and ≈29 R_E on the dusk side. Those findings are consistent with past in situ observations. Both simulations reproduce the theoretical jump conditions at the shock. However, the PIC code density and temperature distributions are inflated and slightly shifted sunward when compared to the MHD results. Kinetic electron motions and reflected ions upstream may cause this sunward shift. Species distributions in the foreshock region are depicted within the transition of the shock (measured ≈2 c/ω_pi for Θ_Bn = 90° and M_MS = 4.7) and in the downstream. The size of the foot jump in the magnetic field at the shock is measured to be 1.7 c/ω_pi. In the foreshock region, the thermal velocity is found to be 213 km/s at 15 R_E, and 63 km/s at 12 R_E (magnetosheath region). Despite the large cell size of the current version of the PIC code, it is powerful enough to retain the macrostructure of planetary magnetospheres in a very short time, and thus it can be used for pedagogical test purposes. It is also likely complementary with MHD in deepening our understanding of the large scale magnetosphere.
Sparse deconvolution for the large-scale ill-posed inverse problem of impact force reconstruction
Qiao, Baijie; Zhang, Xingwu; Gao, Jiawei; Liu, Ruonan; Chen, Xuefeng
2017-01-01
Most previous regularization methods for solving the inverse problem of force reconstruction minimize the l2-norm of the desired force. However, these traditional regularization methods, such as Tikhonov regularization and truncated singular value decomposition, commonly fail to solve the large-scale ill-posed inverse problem at moderate computational cost. In this paper, taking into account the sparse characteristic of impact force, the idea of sparse deconvolution is first introduced to the field of impact force reconstruction and a general sparse deconvolution model of impact force is constructed. Second, a novel impact force reconstruction method based on the primal-dual interior point method (PDIPM) is proposed to solve such a large-scale sparse deconvolution model, where minimizing the l2-norm is replaced by minimizing the l1-norm. Meanwhile, the preconditioned conjugate gradient algorithm is used to compute the search direction of PDIPM with high computational efficiency. Finally, two experiments, including small- to medium-scale single impact force reconstruction and relatively large-scale consecutive impact force reconstruction, are conducted on a composite wind turbine blade and a shell structure to illustrate the advantage of PDIPM. Compared with Tikhonov regularization, PDIPM is more efficient, accurate and robust, whether in single impact force reconstruction or in consecutive impact force reconstruction.
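In a standard (illustrative) formulation, the shift from an l2 to an l1 penalty described above corresponds to replacing Tikhonov regularization by a sparse deconvolution problem; the symbols below are generic rather than the authors' notation:

```latex
\min_{f}\;\tfrac{1}{2}\,\|H f - y\|_2^2 + \lambda\,\|f\|_2^2
\quad\longrightarrow\quad
\min_{f}\;\tfrac{1}{2}\,\|H f - y\|_2^2 + \lambda\,\|f\|_1 ,
```

where H is the transfer (convolution) matrix of the structure, y the measured response, f the impact force history, and lambda the regularization parameter.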
Energy Technology Data Exchange (ETDEWEB)
Sato, S; Sugiyama, S; Nagata, K; Nanba, K; Shimono, M [Sumitomo Light Metal Industries Ltd., Tokyo (Japan)
1977-06-01
The corrosion resistance of titanium in sea water is extremely good, but titanium tubes are expensive, and copper alloy tubes resistant to polluted sea water had been developed, so titanium tubes were not used in practice. In 1970, ammonia attack was found on the copper alloy tubes in the air-cooled portion of condensers, and titanium tubes have been used as the countermeasure. As a result of this use, galvanic attack on copper alloy tube plates with titanium tubes acting as cathode, and hydrogen absorption at titanium tube ends owing to excess electrolytic protection, were observed, but the corrosion resistance of the titanium tubes themselves was perfect. These problems can be controlled by the application of proper electrolytic protection. The all-titanium-tube condensers adopted recently in the USA are intended to realize completely leak-free condensers as a countermeasure against corrosion in the steam generators of PWR plants. For today's large condensers, three problems are pointed out, namely the vibration of condenser tubes, the method of joining tubes and tube plates, and ensuring tubes with no coolant leakage. These three problems in the case of titanium tubes were studied, and the problem of tube fouling was also examined. The intervals between supporting plates for titanium tubes should be narrowed. The joining of titanium tubes and titanium tube plates by welding is feasible and promising. Cleaning with sponge balls is effective in controlling fouling.
Large neighborhood search for the double traveling salesman problem with multiple stacks
Energy Technology Data Exchange (ETDEWEB)
Bent, Russell W [Los Alamos National Laboratory; Van Hentenryck, Pascal [BROWN UNIV
2009-01-01
This paper considers a complex real-life short-haul/long-haul pickup and delivery application. The problem can be modeled as a double traveling salesman problem (TSP) in which the pickups and the deliveries happen in the first and second TSPs respectively. Moreover, the application features multiple stacks in which the items must be stored, and the pickups and deliveries must take place in reverse (LIFO) order for each stack. The goal is to minimize the total travel time while satisfying these constraints. This paper presents a large neighborhood search (LNS) algorithm which improves the best-known results on 65% of the available instances and is always within 2% of the best-known solutions.
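A minimal, generic sketch of the large neighborhood search loop (destroy part of the incumbent, repair it, accept if improved); the problem-specific stack and LIFO constraints of the paper are not modeled here, and the destroy/repair operators and toy cost function are illustrative only:

```python
import random

def lns(initial, destroy, repair, cost, iterations=1000):
    """Generic large neighborhood search skeleton (improve-only acceptance)."""
    best = current = initial
    for _ in range(iterations):
        partial = destroy(current)            # remove part of the solution
        candidate = repair(partial)           # rebuild it (e.g. greedy insertion)
        if cost(candidate) < cost(current):
            current = candidate
            if cost(candidate) < cost(best):
                best = candidate
    return best

# Toy usage on a permutation "tour" whose cost is the sum of adjacent gaps.
if __name__ == "__main__":
    def cost(tour):
        return sum(abs(a - b) for a, b in zip(tour, tour[1:]))

    def destroy(tour):
        tour = tour[:]
        for _ in range(3):                    # drop three random cities
            tour.pop(random.randrange(len(tour)))
        return tour

    def repair(tour):
        missing = [c for c in range(20) if c not in tour]
        for c in missing:                     # cheapest-position reinsertion
            best_pos = min(range(len(tour) + 1),
                           key=lambda p: cost(tour[:p] + [c] + tour[p:]))
            tour = tour[:best_pos] + [c] + tour[best_pos:]
        return tour

    start = list(range(20))
    random.shuffle(start)
    print(cost(lns(start, destroy, repair, cost)))
```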
International Nuclear Information System (INIS)
Chang, Jonghwa
2014-01-01
Today, we can use a computer cluster consisting of a few hundred CPUs on a reasonable budget. Such a computer system enables us to do detailed modeling of a reactor core. The detailed modeling will improve the safety and the economics of a nuclear reactor by eliminating unnecessary conservatism or missing considerations. To take advantage of such a cluster computer, efficient parallel algorithms must be developed. The mechanical structure analysis community has studied the domain decomposition method to solve the stress-strain equation using finite element methods. One of the most successful domain decomposition methods in terms of robustness is FETI-DP. We modified the original FETI-DP to solve the eigenvalue problem for the multi-group diffusion problem in a previous study. In this study, we report the results of recent modifications to handle three-dimensional subdomain partitioning and the sub-domain multi-group problem. The modified FETI-DP algorithm has been successfully applied to the eigenvalue problem of the multi-group neutron diffusion equation. The overall CPU time decreases as the number of sub-domains (partitions) increases. However, there may be a limit to this decrease, because increasing the number of primal points increases the CPU time spent on the solution of the global equation. Even distribution of computational load (criterion a) is important to achieve fast computation. The subdomain partition can be performed effectively using a suitable graph partitioning package such as MeTIS.
Energy Technology Data Exchange (ETDEWEB)
Chang, Jonghwa [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)
2014-10-15
Today, we can use a computer cluster consisting of a few hundred CPUs on a reasonable budget. Such a computer system enables us to do detailed modeling of a reactor core. The detailed modeling will improve the safety and the economics of a nuclear reactor by eliminating unnecessary conservatism or missing considerations. To take advantage of such a cluster computer, efficient parallel algorithms must be developed. The mechanical structure analysis community has studied the domain decomposition method to solve the stress-strain equation using finite element methods. One of the most successful domain decomposition methods in terms of robustness is FETI-DP. We modified the original FETI-DP to solve the eigenvalue problem for the multi-group diffusion problem in a previous study. In this study, we report the results of recent modifications to handle three-dimensional subdomain partitioning and the sub-domain multi-group problem. The modified FETI-DP algorithm has been successfully applied to the eigenvalue problem of the multi-group neutron diffusion equation. The overall CPU time decreases as the number of sub-domains (partitions) increases. However, there may be a limit to this decrease, because increasing the number of primal points increases the CPU time spent on the solution of the global equation. Even distribution of computational load (criterion a) is important to achieve fast computation. The subdomain partition can be performed effectively using a suitable graph partitioning package such as MeTIS.
Energy Technology Data Exchange (ETDEWEB)
Verma, Prakash; Morales, Jorge A., E-mail: jorge.morales@ttu.edu [Department of Chemistry and Biochemistry, Texas Tech University, P.O. Box 41061, Lubbock, Texas 79409-1061 (United States); Perera, Ajith [Department of Chemistry and Biochemistry, Texas Tech University, P.O. Box 41061, Lubbock, Texas 79409-1061 (United States); Department of Chemistry, Quantum Theory Project, University of Florida, Gainesville, Florida 32611 (United States)
2013-11-07
Coupled cluster (CC) methods provide highly accurate predictions of molecular properties, but their high computational cost has precluded their routine application to large systems. Fortunately, recent computational developments in the ACES III program by the Bartlett group [the OED/ERD atomic integral package, the super instruction processor, and the super instruction architecture language] permit overcoming that limitation by providing a framework for massively parallel CC implementations. In that scheme, we are further extending those parallel CC efforts to systematically predict the three main electron spin resonance (ESR) tensors (A-, g-, and D-tensors) to be reported in a series of papers. In this paper inaugurating that series, we report our new ACES III parallel capabilities that calculate isotropic hyperfine coupling constants in 38 neutral, cationic, and anionic radicals that include the {sup 11}B, {sup 17}O, {sup 9}Be, {sup 19}F, {sup 1}H, {sup 13}C, {sup 35}Cl, {sup 33}S,{sup 14}N, {sup 31}P, and {sup 67}Zn nuclei. Present parallel calculations are conducted at the Hartree-Fock (HF), second-order many-body perturbation theory [MBPT(2)], CC singles and doubles (CCSD), and CCSD with perturbative triples [CCSD(T)] levels using Roos augmented double- and triple-zeta atomic natural orbitals basis sets. HF results consistently overestimate isotropic hyperfine coupling constants. However, inclusion of electron correlation effects in the simplest way via MBPT(2) provides significant improvements in the predictions, but not without occasional failures. In contrast, CCSD results are consistently in very good agreement with experimental results. Inclusion of perturbative triples to CCSD via CCSD(T) leads to small improvements in the predictions, which might not compensate for the extra computational effort at a non-iterative N{sup 7}-scaling in CCSD(T). The importance of these accurate computations of isotropic hyperfine coupling constants to elucidate
Pseudoinverse preconditioners and iterative methods for large dense linear least-squares problems
Directory of Open Access Journals (Sweden)
Oskar Cahueñas
2013-05-01
Full Text Available We address the issue of approximating the pseudoinverse of the coefficient matrix for dynamically building preconditioning strategies for the numerical solution of large dense linear least-squares problems. The new preconditioning strategies are embedded into simple and well-known iterative schemes that avoid the use of the usually ill-conditioned normal equations. We analyze a scheme to approximate the pseudoinverse, based on the Schulz iterative method, and also different iterative schemes, based on extensions of Richardson's method and the conjugate gradient method, that are suitable for preconditioning strategies. We present preliminary numerical results to illustrate the advantages of the proposed schemes.
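A minimal sketch of the Schulz (Newton-Schulz) iteration for approximating the pseudoinverse, of the kind the paper embeds in its preconditioning strategies; the initialization and iteration count below are textbook choices, not necessarily the authors':

```python
import numpy as np

def schulz_pseudoinverse(A, iterations=50):
    """Newton-Schulz iteration X_{k+1} = X_k (2I - A X_k) -> pinv(A).

    The initialization X_0 = A^T / (||A||_1 ||A||_inf) guarantees convergence
    for full-rank A (a standard choice, used here purely for illustration).
    """
    m, _ = A.shape
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(m)
    for _ in range(iterations):
        X = X @ (2.0 * I - A @ X)
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((8, 5))           # tall, full column rank
    X = schulz_pseudoinverse(A)
    print(np.allclose(X, np.linalg.pinv(A), atol=1e-8))   # True
```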
Lagardère, Louis; Jolly, Luc-Henri; Lipparini, Filippo; Aviat, Félix; Stamm, Benjamin; Jing, Zhifeng F; Harger, Matthew; Torabifard, Hedieh; Cisneros, G Andrés; Schnieders, Michael J; Gresh, Nohad; Maday, Yvon; Ren, Pengyu Y; Ponder, Jay W; Piquemal, Jean-Philip
2018-01-28
We present Tinker-HP, a massively MPI parallel package dedicated to classical molecular dynamics (MD) and to multiscale simulations, using advanced polarizable force fields (PFF) encompassing distributed multipole electrostatics. Tinker-HP is an evolution of the popular Tinker package code that conserves its simplicity of use and its reference double precision implementation for CPUs. Grounded on interdisciplinary efforts with applied mathematics, Tinker-HP allows for long polarizable MD simulations on large systems up to millions of atoms. We detail in the paper the newly developed extension of massively parallel 3D spatial decomposition to point dipole polarizable models as well as their coupling to efficient Krylov iterative and non-iterative polarization solvers. The design of the code allows the use of various computer systems ranging from laboratory workstations to modern petascale supercomputers with thousands of cores. Tinker-HP therefore provides the first high-performance scalable CPU computing environment for the development of next generation point dipole PFFs and for production simulations. Strategies linking Tinker-HP to Quantum Mechanics (QM) in the framework of multiscale polarizable self-consistent QM/MD simulations are also provided. The possibilities, performance and scalability of the software are demonstrated via benchmark calculations using the polarizable AMOEBA force field on systems ranging from large water boxes of increasing size and ionic liquids to (very) large biosystems encompassing several proteins as well as the complete satellite tobacco mosaic virus and ribosome structures. For small systems, Tinker-HP appears to be competitive with the Tinker-OpenMM GPU implementation of Tinker. As the system size grows, Tinker-HP remains operational thanks to its access to distributed memory and takes advantage of its new algorithms enabling stable long-timescale polarizable simulations. Overall, a several thousand-fold acceleration over
Parallel discrete event simulation
Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.
1991-01-01
In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation
EvArnoldi: A New Algorithm for Large-Scale Eigenvalue Problems.
Tal-Ezer, Hillel
2016-05-19
Eigenvalues and eigenvectors are an essential theme in numerical linear algebra. Their study is mainly motivated by their high importance in a wide range of applications. Knowledge of eigenvalues is essential in quantum molecular science. Solutions of the Schrödinger equation for the electrons composing the molecule are the basis of electronic structure theory. Electronic eigenvalues compose the potential energy surfaces for nuclear motion. The eigenvectors allow calculation of dipole transition matrix elements, the core of spectroscopy. The vibrational dynamics of the molecule also requires knowledge of the eigenvalues of the vibrational Hamiltonian. Typically in these problems, the dimension of the Hilbert space is huge. Practically, only a small subset of eigenvalues is required. In this paper, we present a highly efficient algorithm, named EvArnoldi, for solving the large-scale eigenvalue problem. The algorithm, in its basic formulation, is mathematically equivalent to ARPACK (Sorensen, D. C. Implicitly Restarted Arnoldi/Lanczos Methods for Large Scale Eigenvalue Calculations; Springer, 1997; Lehoucq, R. B.; Sorensen, D. C. SIAM Journal on Matrix Analysis and Applications 1996, 17, 789; Calvetti, D.; Reichel, L.; Sorensen, D. C. Electronic Transactions on Numerical Analysis 1994, 2, 21) (or Eigs of Matlab) but significantly simpler.
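EvArnoldi itself is not assumed to be publicly packaged here; as an illustration of the kind of computation it targets (a huge sparse operator, only a few eigenpairs wanted), the ARPACK route it is compared against is accessible through SciPy. The toy "Hamiltonian" below is a discretized harmonic oscillator chosen purely for demonstration:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Toy symmetric "Hamiltonian": 1-D discrete Laplacian plus a harmonic potential.
n = 2000
x = np.linspace(-10, 10, n)
dx = x[1] - x[0]
kinetic = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) * (-0.5 / dx**2)
potential = diags(0.5 * x**2)
H = (kinetic + potential).tocsc()

# Ask ARPACK (via SciPy) for only the six eigenvalues nearest the shift sigma=0,
# i.e. the smallest ones -- the typical regime described in the abstract.
vals, vecs = eigsh(H, k=6, sigma=0.0)
print(np.round(np.sort(vals), 3))   # close to 0.5, 1.5, 2.5, ... for the oscillator
```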
Wardley, C Sonia; Applegate, E Brooks; Almaleki, A Deyab; Van Rhee, James A
2016-03-01
A 6-year longitudinal study was conducted to compare the perceived stress experienced during a 2-year master's physician assistant program by 5 cohorts of students enrolled in either problem-based learning (PBL) or lecture-based learning (LBL) curricular tracks. The association of perceived stress with academic achievement was also assessed. Students rated their stress levels on visual analog scales in relation to family obligations, financial concerns, schoolwork, and relocation and overall on 6 occasions throughout the program. A mixed model analysis of variance examined the students' perceived level of stress by curriculum and over time. Regression analysis further examined school work-related stress after controlling for other stressors and possible lag effect of stress from the previous time point. Students reported that overall stress increased throughout the didactic year followed by a decline in the clinical year with statistically significant curricular (PBL versus LBL) and time differences. PBL students also reported significantly more stress resulting from school work than LBL students at some time points. Moreover, when the other measured stressors and possible lag effects were controlled, significant differences between PBL and LBL students' perceived stress related to school work persisted at the 8- and 12-month measurement points. Increased stress in both curricula was associated with higher achievement in overall and individual organ system examination scores. Physician assistant programs that embrace a PBL pedagogy to prepare students to think clinically may need to provide students with additional support through the didactic curriculum.
Haidar, Azzam
2011-01-01
This paper introduces a novel implementation in reducing a symmetric dense matrix to tridiagonal form, which is the preprocessing step toward solving symmetric eigenvalue problems. Based on tile algorithms, the reduction follows a two-stage approach, where the tile matrix is first reduced to symmetric band form prior to the final condensed structure. The challenging trade-off between algorithmic performance and task granularity has been tackled through a grouping technique, which consists of aggregating fine-grained and memory-aware computational tasks during both stages, while sustaining the application's overall high performance. A dynamic runtime environment system then schedules the different tasks in an out-of-order fashion. The performance for the tridiagonal reduction reported in this paper is unprecedented. Our implementation results in up to 50-fold and 12-fold improvement (130 Gflop/s) compared to the equivalent routines from LAPACK V3.2 and Intel MKL V10.3, respectively, on an eight socket hexa-core AMD Opteron multicore shared-memory system with a matrix size of 24000×24000. Copyright 2011 ACM.
Wu, Xingfu; Taylor, Valerie
2011-01-01
The NAS Parallel Benchmarks (NPB) are well-known applications with the fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node and MPI can be used with the communication between nodes. In this paper, we use SP and BT benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore supercomputers. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76%, and the hybrid BT outperforms the MPI BT by up to 8.58% on up to 10,000 cores on BlueGene/P at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. We also use performance tools and MPI trace libraries available on these supercomputers to further investigate the performance characteristics of the hybrid SP and BT.
Wu, X.; Taylor, V.
2011-01-01
The NAS Parallel Benchmarks (NPB) are well-known applications with fixed algorithms for evaluating parallel systems and tools. Multicore clusters provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node, and MPI can be used with the communication between nodes. In this paper, we use Scalar Pentadiagonal (SP) and Block Tridiagonal (BT) benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore clusters, Intrepid (BlueGene/P) at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76 %, and the hybrid BT outperforms the MPI BT by up to 8.58 % on up to 10 000 cores on Intrepid and Jaguar. We also use performance tools and MPI trace libraries available on these clusters to further investigate the performance characteristics of the hybrid SP and BT. © 2011 The Author. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.
Penders, Bart; Vos, Rein; Horstman, Klasien
2009-11-01
Solving complex problems in large-scale research programmes requires cooperation and division of labour. Simultaneously, large-scale problem solving also gives rise to unintended side effects. Based upon 5 years of researching two large-scale nutrigenomic research programmes, we argue that problems are fragmented in order to be solved. These sub-problems are given priority for practical reasons and in the process of solving them, various changes are introduced in each sub-problem. Combined with additional diversity as a result of interdisciplinarity, this makes reassembling the original and overall goal of the research programme less likely. In the case of nutrigenomics and health, this produces a diversification of health. As a result, the public health goal of contemporary nutrition science is not reached in the large-scale research programmes we studied. Large-scale research programmes are very successful in producing scientific publications and new knowledge; however, in reaching their political goals they often are less successful.
Leon, Stéphane; Bergond, Gilles; Vallenari, Antonella
1999-04-01
We present the tidal tail distributions of a sample of candidate binary clusters located in the bar of the Large Magellanic Cloud (LMC). One isolated cluster, SL 268, is presented in order to study the effect of the LMC tidal field. All the candidate binary clusters show tidal tails, confirming that the pairs are formed by physically linked objects. The stellar mass in the tails covers a large range, from 1.8x10^3 to 3x10^4 M_sun. We derive a total mass estimate for SL 268 and SL 356. At large radii, the projected density profiles of SL 268 and SL 356 fall off as r^(-gamma), with gamma = 2.27 and gamma = 3.44, respectively. Out of 4 pairs or multiple systems, 2 are older than the theoretical survival time of binary clusters (going from a few 10^6 years to 10^8 years). A pair shows too large an age difference between the components to be consistent with classical theoretical models of binary cluster formation (Fujimoto & Kumai 1997). We refer to this as the "overmerging" problem. A different scenario is proposed: the formation proceeds in large molecular complexes giving birth to groups of clusters over a few 10^7 years. In these groups the expected cluster encounter rate is larger, and tidal capture has a higher probability. Cluster pairs are not born together through the splitting of the parent cloud, but are formed later by tidal capture. For 3 pairs, we tentatively identify the star cluster group (SCG) memberships. SCG formation, through the recent cluster starburst triggered by the LMC-SMC encounter, in contrast with the quiescent open cluster formation in the Milky Way, can be an explanation for the paucity of binary clusters observed in our Galaxy. Based on observations collected at the European Southern Observatory, La Silla, Chile.
International Nuclear Information System (INIS)
Alikar, Najmeh; Mousavi, Seyed Mohsen; Raja Ghazilla, Raja Ariffin; Tavana, Madjid; Olugu, Ezutah Udoncy
2017-01-01
In this paper, we formulate a mixed-integer binary non-linear programming model to study a series-parallel multi-component multi-periodic inventory-redundancy allocation problem (IRAP). This IRAP is a novel redundancy allocation problem (RAP) because components (products) are purchased under an all unit discount (AUD) policy and then installed on a series-parallel system. The total budget available for purchasing the components, the storage space, the vehicle capacities, and the total weight of the system are limited. Moreover, a penalty function is used to penalize infeasible solutions, generated randomly. The overall goal is to find the optimal number of components purchased for each subsystem so that the total costs, including ordering cost, holding cost, and purchasing cost, are minimized while the system reliability is maximized, simultaneously. A non-dominated sorting genetic algorithm-II (NSGA-II), a multi-objective particle swarm optimization (MOPSO), and a multi-objective harmony search (MOHS) algorithm are applied to obtain the optimal Pareto solutions. While no benchmark is available in the literature, some numerical examples are generated randomly to evaluate the results of NSGA-II on the proposed IRAP. The results are in favor of NSGA-II. - Highlights: • An inventory control system employing an all-unit discount policy is considered in the proposed model. • The proposed model considers a limited total budget, storage space, transportation capacity, and total weight; moreover, a penalty function is used to penalize infeasible solutions. • The overall goal is to find the optimal number of components purchased for each subsystem so that the total costs, including ordering, holding, and purchasing costs, are minimized and the system reliability is maximized, simultaneously. • An NSGA-II algorithm is derived, and a multi-objective particle swarm optimization and a multi-objective harmony search algorithm are used to evaluate the NSGA-II results.
DEFF Research Database (Denmark)
Sitchinava, Nodar; Zeh, Norbert
2012-01-01
We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal O(sort_P(N) + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and sort_P(N) is the parallel I/O complexity of sorting N elements using P processors....
Parallel External Memory Graph Algorithms
DEFF Research Database (Denmark)
Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari
2010-01-01
In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....
Performance of Air Pollution Models on Massively Parallel Computers
DEFF Research Database (Denmark)
Brown, John; Hansen, Per Christian; Wasniewski, Jerzy
1996-01-01
To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight about the performance of the three computers when used to solve large-scale scientific problems...
Distributed Parallel Architecture for "Big Data"
Directory of Open Access Journals (Sweden)
Catalin BOJA
2012-01-01
Full Text Available This paper is an extension to the "Distributed Parallel Architecture for Storing and Processing Large Datasets" paper presented at the WSEAS SEPADS’12 conference in Cambridge. In its original version the paper went over the benefits of using a distributed parallel architecture to store and process large datasets. This paper analyzes the problem of storing, processing and retrieving meaningful insight from petabytes of data. It provides a survey of current distributed and parallel data processing technologies and, based on them, proposes an architecture that can be used to solve the analyzed problem. In this version there is more emphasis put on distributed file systems and the ETL processes involved in a distributed environment.
Parallelism in matrix computations
Gallopoulos, Efstratios; Sameh, Ahmed H
2016-01-01
This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...
Decomposition based parallel processing technique for efficient collaborative optimization
International Nuclear Information System (INIS)
Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon
2000-01-01
In practical design studies, most designers solve multidisciplinary problems with complex design structures. These multidisciplinary problems have hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize total cost and time. This is accomplished by decomposing the large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems to raise design efficiency by using a genetic algorithm, and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology.
No firewalls or information problem for black holes entangled with large systems
Stoltenberg, Henry; Albrecht, Andreas
2015-01-01
We discuss how under certain conditions the black hole information puzzle and the (related) arguments that firewalls are a typical feature of black holes can break down. We first review the arguments of Almheiri, Marolf, Polchinski and Sully favoring firewalls, focusing on entanglements in a simple toy model for a black hole and the Hawking radiation. By introducing a large and inaccessible system entangled with the black hole (representing perhaps a de Sitter stretched horizon or inaccessible part of a landscape), we show complementarity can be restored and firewalls can be avoided throughout the black hole's evolution. Under these conditions black holes do not have an "information problem." We point out flaws in some of our earlier arguments that such entanglement might be generically present in some cosmological scenarios and call out certain ways our picture may still be realized.
Donohue, Mary J
2003-06-01
Oceanic circulation patterns deposit significant amounts of marine pollution, including derelict fishing gear from North Pacific Ocean fisheries, in the Hawaiian Archipelago [Mar. Pollut. Bull. 42(12) (2001) 1301]. Management responsibility for these islands and their associated natural resources is shared by several government authorities. Non-governmental organizations (NGOs) and private industry also have interests in the archipelago. Since the marine debris problem in this region is too large for any single agency to manage, a multiagency marine debris working group (group) was established in 1998 to improve marine debris mitigation in Hawaii. To date, 16 federal, state, and local agencies, working with industry and NGOs, have removed 195 tons of derelict fishing gear from the Northwestern Hawaiian Islands. This review details the evolution of the partnership, notes its challenges and rewards, and advocates its continued use as an effective resource management tool.
Application of spectral Lanczos decomposition method to large scale problems arising geophysics
Energy Technology Data Exchange (ETDEWEB)
Tamarchenko, T. [Western Atlas Logging Services, Houston, TX (United States)
1996-12-31
This paper presents an application of the Spectral Lanczos Decomposition Method (SLDM) to numerical modeling of electromagnetic diffusion and elastic wave propagation in inhomogeneous media. SLDM approximates the action of a matrix function as a linear combination of basis vectors in a Krylov subspace. I applied the method to model electromagnetic fields in three dimensions and elastic waves in two dimensions. The finite-difference approximation of the spatial part of the differential operator reduces the initial boundary-value problem to a system of ordinary differential equations with respect to time. The solution to this system requires calculating exponential and sine/cosine functions of the stiffness matrices. Large-scale numerical examples are in good agreement with the theoretical error bounds and stability estimates given by Druskin and Knizhnerman (1987).
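The core approximation behind SLDM, as it is usually written for a symmetric stiffness matrix A with Lanczos basis V_m and tridiagonal projection T_m (illustrative notation, not copied from the paper), is:

```latex
f(A)\,b \;\approx\; \|b\|_2\, V_m\, f(T_m)\, e_1,
\qquad T_m = V_m^{\mathsf{T}} A V_m ,
```

where f is, for example, exp(-tA) for the electromagnetic diffusion problem or cos(t sqrt(A)) for elastic wave propagation, and e_1 is the first canonical basis vector of the Krylov subspace.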
Xyce parallel electronic simulator release notes.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance, and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
A Very Large Area Network (VLAN) knowledge-base applied to space communication problems
Zander, Carol S.
1988-01-01
This paper first describes a hierarchical model for very large area networks (VLAN). Space communication problems whose solution could profit by the model are discussed and then an enhanced version of this model incorporating the knowledge needed for the missile detection-destruction problem is presented. A satellite network or VLAN is a network which includes at least one satellite. Due to the complexity, a compromise between fully centralized and fully distributed network management has been adopted. Network nodes are assigned to a physically localized group, called a partition. Partitions consist of groups of cell nodes with one cell node acting as the organizer or master, called the Group Master (GM). Coordinating the group masters is a Partition Master (PM). Knowledge is also distributed hierarchically existing in at least two nodes. Each satellite node has a back-up earth node. Knowledge must be distributed in such a way so as to minimize information loss when a node fails. Thus the model is hierarchical both physically and informationally.
International Nuclear Information System (INIS)
Kimura, Toshiya.
1997-03-01
A two-dimensional explicit Euler solver has been implemented for five MIMD parallel computers of different machine architectures at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. These parallel computers are the Fujitsu VPP300, NEC SX-4, CRAY T94, IBM SP2, and Hitachi SR2201. The code was parallelized by several parallelization methods, and a typical compressible flow problem has been calculated for different grid sizes, changing the number of processors. Their effective performance for parallel calculations, such as calculation speed, speed-up ratio and parallel efficiency, has been investigated and evaluated. The communication time among processors has also been measured and evaluated. As a result, the differences in performance and characteristics between vector-parallel and scalar-parallel computers can be pointed out, and the results provide basic data for the efficient use of parallel computers and for large scale CFD simulations on parallel computers. (author)
Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.
2017-07-01
Calculation of matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up the calculation process, which usually takes a long time. Graph partitioning techniques that have been discussed in previous studies cannot be used to parallelize the calculation of matrix-vector multiplication with matrices of arbitrary size, because graph partitioning techniques assume square, symmetric matrices. Hypergraph partitioning techniques overcome this shortcoming of the graph partitioning technique. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and is implemented on the GPU (graphics processing unit).
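A minimal serial sketch of the sparse matrix-vector kernel whose rows (and the vector entries each row touches) a partitioner distributes; it only illustrates the data access pattern being partitioned, not the paper's CUDA implementation, and the small example matrix is made up:

```python
import numpy as np

def csr_matvec(indptr, indices, data, x):
    """y = A @ x for A stored in CSR (compressed sparse row) form.

    Each row i is independent, which is what makes row-wise partitioning
    (and a hypergraph model of the x-entries each row touches) natural.
    """
    y = np.zeros(len(indptr) - 1)
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

if __name__ == "__main__":
    # 3x3 example:  [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
    indptr  = np.array([0, 2, 3, 5])
    indices = np.array([0, 2, 1, 0, 2])
    data    = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    x = np.array([1.0, 1.0, 1.0])
    print(csr_matvec(indptr, indices, data, x))   # [3. 3. 9.]
```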
Parallel Symmetric Eigenvalue Problem Solvers
2015-05-01
...magnitude eigenvalues of a given matrix pencil (A, B) along with their associated eigenvectors. Computing the smallest eigenvalues is more difficult
A Parallel Particle Swarm Optimizer
National Research Council Canada - National Science Library
Schutte, J. F; Fregly, B .J; Haftka, R. T; George, A. D
2003-01-01
.... Motivated by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based global optimizer, the Particle Swarm...
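The abstract is truncated, but the serial update rule that a parallel PSO distributes across workers is standard; a minimal sketch follows, with the usual textbook coefficients rather than the authors' settings, and the fitness evaluations marked where a parallel version would farm them out:

```python
import numpy as np

def pso(cost, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer (serial reference version)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))     # positions
    v = np.zeros_like(x)                           # velocities
    pbest = x.copy()
    pbest_val = np.apply_along_axis(cost, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.apply_along_axis(cost, 1, x)     # independent -> parallelizable
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

if __name__ == "__main__":
    sphere = lambda z: float(np.sum(z ** 2))
    best, val = pso(sphere, dim=5)
    print(np.round(best, 3), round(val, 6))        # near the origin, near zero
```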
Solving Man-Induced Large-Scale Conservation Problems: The Spanish Imperial Eagle and Power Lines
López-López, Pascual; Ferrer, Miguel; Madero, Agustín; Casado, Eva; McGrady, Michael
2011-01-01
Background Man-induced mortality of birds caused by electrocution with poorly designed pylons and power lines has been reported to be an important mortality factor that could become a major cause of population decline of one of the world's rarest raptors, the Spanish imperial eagle (Aquila adalberti). This has resulted in increasing awareness of the problem amongst land managers and the public at large, as well as increased research into the distribution of electrocution events and likely mitigation measures. Methodology/Principal Findings We provide information on how mitigation measures implemented at a regional level under the conservation program for the Spanish imperial eagle have resulted in a positive shift of demographic trends in Spain. A 35-year temporal data set (1974–2009) on mortality of the Spanish imperial eagle was recorded, including population censuses and data on electrocution and non-electrocution of birds. Additional information was obtained from 32 radio-tracked young eagles and specific field surveys. Data were divided into two periods, before and after the approval in 1990 of a regional regulation of power line design which established mandatory rules aimed at minimizing or eliminating the negative impacts of power line facilities on avian populations. Our results show how population size and the average annual percentage of population change have increased between the two periods, whereas the number of electrocuted birds has been reduced in spite of the continued growth of the wiring network. Conclusions Our results demonstrate that solving bird electrocution is an affordable problem if political interest is shown and financial investment is made. The combination of adequate spatial planning with a sustainable development of human infrastructures will contribute positively to the conservation of the Spanish imperial eagle and may underpin population growth and range expansion, with positive side effects on other endangered
International Nuclear Information System (INIS)
Hu Longfei; Fang Yang; Wang Yishun; Wang Long
2014-01-01
In the prospecting and mining of a broken, thick, large uranium ore body, uncertain prospecting and the shallow-hole shrinkage mining method resulted in a large dilution rate and wasted resources. To address these problems, improvement schemes were applied that intensified 'drilling prospecting instead of pit prospecting' and employed the filling method for stoping the ore body, and the results of these improvements were analyzed. Experience was accumulated and evidence was provided for later prospecting and stoping work. (authors)
1982-01-01
Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn…
Parallelism, fractal geometry and other aspects of computational mathematics
International Nuclear Information System (INIS)
Churchhouse, R.F.
1991-01-01
In some fields such as meteorology, theoretical physics, quantum chemistry and hydrodynamics there are problems which involve so much computation that computers a thousand times as powerful as a Cray 2 could be fully utilised if they were available. Since it is unlikely that uniprocessors of such power will become available, such large-scale problems could be solved by using systems of computers running in parallel. This approach, of course, requires finding appropriate algorithms for the solution of such problems which can efficiently make use of a large number of computers working in parallel. 11 refs, 10 figs, 1 tab
International Nuclear Information System (INIS)
Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li
2010-01-01
A large-scale parallel electromagnetic field simulation program, JEMS-FDTD (J Electromagnetic Solver-Finite Difference Time Domain), has been designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate the propagation, radiation and coupling of electromagnetic fields by solving Maxwell's equations on a structured mesh explicitly with the FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large-scale parallel computation. A parallel performance test indicates that a high parallel efficiency is obtained. (authors)
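As a reminder of the explicit leapfrog structure that makes FDTD amenable to structured-mesh domain decomposition, here is a minimal one-dimensional free-space Yee update (illustrative only; JEMS-FDTD itself is a full 3D parallel code):

```python
# Minimal 1D free-space Yee/FDTD update.  The explicit stencil touches only
# neighbouring cells, which is why structured-mesh domain decomposition with
# thin halo exchanges parallelizes well.
import numpy as np

nz, nsteps = 400, 600
c0 = 299792458.0
dz = 1e-3
dt = 0.5 * dz / c0                     # Courant factor 0.5 for stability
ez = np.zeros(nz)
hy = np.zeros(nz - 1)
eps0, mu0 = 8.8541878128e-12, 4e-7 * np.pi

for n in range(nsteps):
    hy += dt / (mu0 * dz) * (ez[1:] - ez[:-1])          # H update (half step)
    ez[1:-1] += dt / (eps0 * dz) * (hy[1:] - hy[:-1])   # E update
    ez[nz // 2] += np.exp(-((n - 60) / 20.0) ** 2)      # soft Gaussian source

print("peak |Ez| at final step:", np.abs(ez).max())
```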
Adapting algorithms to massively parallel hardware
Sioulas, Panagiotis
2016-01-01
In the recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming in which applications are programmed to exploit the power provided by multi-cores. Usually there is gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in the GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.
Well-posedness of the Cauchy problem for models of large amplitude internal waves
International Nuclear Information System (INIS)
Guyenne, Philippe; Lannes, David; Saut, Jean-Claude
2010-01-01
We consider in this paper the 'shallow-water/shallow-water' asymptotic model obtained in Choi and Camassa (1999 J. Fluid Mech. 396 1–36), Craig et al (2005 Commun. Pure. Appl. Math. 58 1587–641) (one-dimensional interface) and Bona et al (2008 J. Math. Pures Appl. 89 538–66) (two-dimensional interface) from the two-layer system with rigid lid, for the description of large amplitude internal waves at the interface of two layers of immiscible fluids of different densities. For one-dimensional interfaces, this system is of hyperbolic type and its local well-posedness does not raise serious difficulties, although other issues (blow-up, loss of hyperbolicity, etc) turn out to be delicate. For two-dimensional interfaces, the system is nonlocal. Nevertheless, we prove that it conserves some properties of 'hyperbolic type' and show that the associated Cauchy problem is locally well posed in suitable Sobolev classes provided some natural restrictions are imposed on the data. These results are illustrated by numerical simulations with emphasis on the formation of shock waves
A Framing Link Based Tabu Search Algorithm for Large-Scale Multidepot Vehicle Routing Problems
Directory of Open Access Journals (Sweden)
Xuhao Zhang
2014-01-01
A framing link (FL) based tabu search algorithm is proposed in this paper for a large-scale multidepot vehicle routing problem (LSMDVRP). Framing links are generated during continuous optimization of current solutions and then taken as skeletons so as to improve optimal-seeking ability, speed up the process of optimization, and obtain better results. Based on the comparison between pre- and post-mutation routes in the current solution, different parts are extracted. In the current optimization period, links involved in the optimal solution are regarded as candidates for the FL base. Multiple optimization periods exist in the whole algorithm, and there are several potential FLs in each period. If the update condition is satisfied, the FL base is updated, new FLs are added into the current route, and the next period starts. Through adjusting the borderline of the multidepot sharing area with dynamic parameters, the authors define candidate selection principles for three kinds of customer connections, respectively. Link split and the roulette approach are employed to choose FLs. Eighteen LSMDVRP instances in three groups are studied and new optimal solution values for nine of them are obtained, with higher computation speed and reliability.
Stocks, Jennifer Dugan; Taneja, Baldeo K; Baroldi, Paolo; Findling, Robert L
2012-04-01
To evaluate safety and tolerability of four doses of immediate-release molindone hydrochloride in children with attention-deficit/hyperactivity disorder (ADHD) and serious conduct problems. This open-label, parallel-group, dose-ranging, multicenter trial randomized children, aged 6-12 years, with ADHD and persistent, serious conduct problems to receive oral molindone thrice daily for 9-12 weeks in four treatment groups: Group 1-10 mg (5 mg if weight conduct problems. Secondary outcome measures included change in Nisonger Child Behavior Rating Form-Typical Intelligence Quotient (NCBRF-TIQ) Conduct Problem subscale scores, change in Clinical Global Impressions-Severity (CGI-S) and -Improvement (CGI-I) subscale scores from baseline to end point, and Swanson, Nolan, and Pelham rating scale-revised (SNAP-IV) ADHD-related subscale scores. The study randomized 78 children; 55 completed the study. Treatment with molindone was generally well tolerated, with no clinically meaningful changes in laboratory or physical examination findings. The most common treatment-related adverse events (AEs) included somnolence (n=9), weight increase (n=8), akathisia (n=4), sedation (n=4), and abdominal pain (n=4). Mean weight increased by 0.54 kg, and mean body mass index by 0.24 kg/m(2). The incidence of AEs and treatment-related AEs increased with increasing dose. NCBRF-TIQ subscale scores improved in all four treatment groups, with 34%, 34%, 32%, and 55% decreases from baseline in groups 1, 2, 3, and 4, respectively. CGI-S and SNAP-IV scores improved over time in all treatment groups, and CGI-I scores improved to the greatest degree in group 4. Molindone at doses of 5-20 mg/day (children weighing <30 kg) and 20-40 mg (≥ 30 kg) was well tolerated, and preliminary efficacy results suggest that molindone produces dose-related behavioral improvements over 9-12 weeks. Additional double-blind, placebo-controlled trials are needed to further investigate molindone in this pediatric population.
International Nuclear Information System (INIS)
Khalili-Damghani, Kaveh; Amiri, Maghsoud
2012-01-01
In this paper, a procedure based on an efficient epsilon-constraint method and data envelopment analysis (DEA) is proposed for solving the binary-state multi-objective reliability redundancy allocation series-parallel problem (MORAP). In the first module, a set of qualified non-dominated solutions on the Pareto front of the binary-state MORAP is generated using an efficient epsilon-constraint method. In order to test the quality of the generated non-dominated solutions, a multi-start partial bound enumeration algorithm is also proposed for MORAP. The performance of both procedures is compared using different metrics on a well-known benchmark instance. The statistical analysis shows that the proposed efficient epsilon-constraint method not only outperforms the multi-start partial bound enumeration algorithm but also improves the known upper bound of the benchmark instance. Then, in the second module, a DEA model is applied to prune the non-dominated solutions generated by the efficient epsilon-constraint method. This reduces the number of non-dominated solutions in a systematic manner and eases the decision-making process for practical implementations. - Highlights: ► A procedure based on an efficient epsilon-constraint method and DEA was proposed for solving MORAP. ► The performance of the proposed procedure was compared with a multi-start PBEA. ► Methods were statistically compared using multi-objective metrics.
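For readers unfamiliar with the scalarization, the sketch below shows the textbook epsilon-constraint idea on a toy, fully enumerated bi-objective redundancy allocation (maximize reliability, minimize cost); the numbers are made up, and the paper's efficient variant and its DEA pruning are not reproduced:

```python
# Toy epsilon-constraint sweep over an enumerated bi-objective design space.
import itertools

def evaluate(design):
    # hypothetical series-parallel system: design[i] = redundancy of stage i,
    # each stage built from 0.8-reliable units costing 2.0 each
    reliability = 1.0
    for n in design:
        reliability *= 1.0 - (1.0 - 0.8) ** n
    cost = sum(2.0 * n for n in design)
    return reliability, cost

designs = list(itertools.product(range(1, 4), repeat=3))
frontier = []
for eps in [6.0, 9.0, 12.0, 15.0, 18.0]:          # sweep the cost bound
    feasible = [d for d in designs if evaluate(d)[1] <= eps]
    if feasible:
        best = max(feasible, key=lambda d: evaluate(d)[0])
        frontier.append((best, *evaluate(best)))

for d, r, c in frontier:
    print(d, round(r, 4), c)
```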
Software support for irregular and loosely synchronous problems
Choudhary, A.; Fox, G.; Hiranandani, S.; Kennedy, K.; Koelbel, C.; Ranka, S.; Saltz, J.
1992-01-01
A large class of scientific and engineering applications may be classified as irregular and loosely synchronous from the perspective of parallel processing. We present a partial classification of such problems. This classification has motivated us to enhance FORTRAN D to provide language support for irregular, loosely synchronous problems. We present techniques for parallelization of such problems in the context of FORTRAN D.
Fatima, Yaqoot; Doi, Suhail A R; O'Callaghan, Michael; Williams, Gail; Najman, Jake M; Mamun, Abdullah Al
2016-09-01
To compare parent and adolescent reports in exploring adolescent sleep problems and to identify the factors associated with adolescent sleep problem disclosures. Parent (n = 5185) and adolescent (n = 5171, age = 13.9 ± 0.3 years) reports from a birth cohort were used to explore adolescent sleep problems. Kappa coefficients were used to assess agreement, whereas conditional agreement and disagreement ratios were used to identify the optimal informant. Logistic regression analysis was used to determine the factors affecting adolescent sleep problem disclosure. Parental reports identified only about one-third of the sleep problems reported by adolescents, whereas adolescent reports identified up to two-thirds of the sleep problems reported by parents. Combined parent and adolescent reports did not differ considerably from the adolescent reports alone. Adolescent and parent health, maternal depression, and family communication were significantly associated with adolescents' sleep problem disclosures. Adolescent reports could be used as the preferred source to explore adolescent sleep problems. Parental reports should be used when parents as observers are more reliable reporters, or where adolescents are cognitively unable to report sleep problems. Additionally, the impact of poor health, maternal depression and family communication on sleep problem disclosure should be considered for adolescent sleep problem diagnosis. ©2016 Foundation Acta Paediatrica. Published by John Wiley & Sons Ltd.
Applied Parallel Computing Industrial Computation and Optimization
DEFF Research Database (Denmark)
Madsen, Kaj; Olesen, Dorte
Proceedings of the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96).
Parallel evolutionary computation in bioinformatics applications.
Pinho, Jorge; Sobral, João Luis; Rocha, Miguel
2013-05-01
A large number of optimization problems within the field of Bioinformatics require methods able to handle their inherent complexity (e.g. NP-hard problems) and also demand increased computational effort. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java-based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of their efficient execution on a wide range of parallel architectures. The proposed approach focuses on ease of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation to several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism-related modules allows the user to easily configure the environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
1991-10-23
An account of the Caltech Concurrent Computation Program (C³P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
Casanova, Henri; Robert, Yves
2008-01-01
""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi
Klegeris, Andis; Bahniwal, Manpreet; Hurren, Heather
2013-01-01
Problem-based learning (PBL) was originally introduced in medical education programs as a form of small-group learning, but its use has now spread to large undergraduate classrooms in various other disciplines. Introduction of new teaching techniques, including PBL-based methods, needs to be justified by demonstrating the benefits of such techniques over classical teaching styles. Previously, we demonstrated that introduction of tutor-less PBL in a large third-year biochemistry undergraduate class increased student satisfaction and attendance. The current study assessed the generic problem-solving abilities of students from the same class at the beginning and end of the term, and compared student scores with similar data obtained in three classes not using PBL. Two generic problem-solving tests of equal difficulty were administered such that students took different tests at the beginning and the end of the term. Blinded marking showed a statistically significant 13% increase in the test scores of the biochemistry students exposed to PBL, while no trend toward significant change in scores was observed in any of the control groups not using PBL. Our study is among the first to demonstrate that use of tutor-less PBL in a large classroom leads to statistically significant improvement in generic problem-solving skills of students. PMID:23463230
Enhancement of a model for Large-scale Airline Network Planning Problems
Kölker, K.; Lopes dos Santos, Bruno F.; Lütjens, K.
2016-01-01
The main focus of this study is to solve the network planning problem based on passenger decision criteria including the preferred departure time and travel time for a real-sized airline network. For this purpose, a model of the integrated network planning problem is formulated including scheduling
On the problem of earthquake correlation in space and time over large distances
Georgoulas, G.; Konstantaras, A.; Maravelakis, E.; Katsifarakis, E.; Stylios, C. D.
2012-04-01
… such as the well-known k-means, that tend to form "well-shaped" clusters may not suffice for the problem at hand, and other families of unsupervised pattern recognition methods might be a better choice. One such algorithm is DBSCAN, which is based on the notion of density. In the proposed version, the density is not estimated solely from the number of seismic events occurring in a specific spatio-temporal area, but also takes into account the size of each seismic event. A second method proposes the use of a modified measure of proximity that also accounts for the size of the earthquake, along with traditional clustering schemes such as k-means and agglomerative clustering (k-means is seeded with a quite large number for k and the results are fed to the hierarchical algorithm, in order to alleviate the memory requirements on one hand and to allow for irregular shapes on the other). Preliminary results of seismic cluster formation using these algorithms appear promising, as they are in agreement with geophysical observations on distinct seismic regions, such as those of the neighbouring regions in the Ionian Sea and that of the southern Hellenic seismic arc, as well as with the location and orientation of the mapped network of underlying natural hazards beneath each cluster's vicinity.
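A rough sketch of the weighted-density idea, assuming scikit-learn's DBSCAN and entirely synthetic catalogue data (this approximates, but does not reproduce, the authors' modified density and proximity measures):

```python
# Density-based clustering of a synthetic catalogue where event size feeds
# the density estimate through per-sample weights.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
events = np.column_stack([
    rng.uniform(19, 29, 500),      # longitude (deg), synthetic
    rng.uniform(34, 40, 500),      # latitude (deg), synthetic
    rng.uniform(0, 3650, 500),     # occurrence time (days), synthetic
    rng.uniform(2.0, 6.5, 500),    # magnitude, synthetic
])
X = StandardScaler().fit_transform(events[:, :3])       # spatio-temporal part
weights = 1.0 + events[:, 3] - events[:, 3].min()       # larger events count more
labels = DBSCAN(eps=0.3, min_samples=20).fit(X, sample_weight=weights).labels_
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
```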
Double inflation: A possible resolution of the large-scale structure problem
International Nuclear Information System (INIS)
Turner, M.S.; Villumsen, J.V.; Vittorio, N.; Silk, J.; Juszkiewicz, R.
1986-11-01
A model is presented for the large-scale structure of the universe in which two successive inflationary phases resulted in large small-scale and small large-scale density fluctuations. This bimodal density fluctuation spectrum in an Ω = 1 universe dominated by hot dark matter leads to large-scale structure of the galaxy distribution that is consistent with recent observational results. In particular, large, nearly empty voids and significant large-scale peculiar velocity fields are produced over scales of ∼100 Mpc, while the small-scale structure over ≤ 10 Mpc resembles that in a low density universe, as observed. Detailed analytical calculations and numerical simulations are given of the spatial and velocity correlations. 38 refs., 6 figs
Directory of Open Access Journals (Sweden)
Aliasghar Baziar
2015-03-01
In order to handle large-scale problems, this study uses the shuffled frog leaping algorithm. This algorithm is an optimization method based on natural memetics, to which a new two-phase modification is applied to achieve a better search of the problem space. The suggested algorithm is evaluated by comparison with some well-known algorithms on several benchmark optimization problems. The simulation results clearly show the superiority of this algorithm over other well-known methods in the area.
Numerical methods for the design of large-scale nonlinear discrete ill-posed inverse problems
International Nuclear Information System (INIS)
Haber, E; Horesh, L; Tenorio, L
2010-01-01
Design of experiments for discrete ill-posed problems is a relatively new area of research. While there has been some limited work concerning the linear case, little has been done to study design criteria and numerical methods for ill-posed nonlinear problems. We present an algorithmic framework for nonlinear experimental design with an efficient numerical implementation. The data are modeled as indirect, noisy observations of the model collected via a set of plausible experiments. An inversion estimate based on these data is obtained by a weighted Tikhonov regularization whose weights control the contribution of the different experiments to the data misfit term. These weights are selected by minimization of an empirical estimate of the Bayes risk that is penalized to promote sparsity. This formulation entails a bilevel optimization problem that is solved using a simple descent method. We demonstrate the viability of our design with a problem in electromagnetic imaging based on direct current resistivity and magnetotelluric data
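A small linear-Gaussian illustration of the weighted Tikhonov estimate described above may help; here the experiment weights are simply given, whereas the paper's design problem is nonlinear and additionally optimizes the weights under a sparsity penalty:

```python
# Weighted Tikhonov estimate over several candidate experiments:
#   m_w = argmin_m  sum_k w_k ||A_k m - d_k||^2 + alpha ||m||^2,
# where each (A_k, d_k) is one experiment and w_k its (here fixed) weight.
import numpy as np

def weighted_tikhonov(As, ds, w, alpha):
    n = As[0].shape[1]
    lhs = alpha * np.eye(n)
    rhs = np.zeros(n)
    for A, d, wk in zip(As, ds, w):
        lhs += wk * A.T @ A
        rhs += wk * A.T @ d
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(0)
m_true = np.array([1.0, -2.0, 0.5])
As = [rng.standard_normal((20, 3)) for _ in range(3)]           # three experiments
ds = [A @ m_true + 0.05 * rng.standard_normal(20) for A in As]  # noisy data
print(weighted_tikhonov(As, ds, w=[1.0, 0.2, 0.0], alpha=1e-2))
```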
A Large-Scale Study of Fingerprint Matching Systems for Sensor Interoperability Problem
Directory of Open Access Journals (Sweden)
Helala AlShehri
2018-03-01
The fingerprint is a commonly used biometric modality that is widely employed for authentication by law enforcement agencies and commercial applications. The designs of existing fingerprint matching methods are based on the hypothesis that the same sensor is used to capture fingerprints during enrollment and verification. Advances in fingerprint sensor technology have raised the question about the usability of current methods when different sensors are employed for enrollment and verification; this is the fingerprint sensor interoperability problem. To provide insight into this problem and assess the status of state-of-the-art matching methods to tackle this problem, we first analyze the characteristics of fingerprints captured with different sensors, which makes cross-sensor matching a challenging problem. We demonstrate the importance of fingerprint enhancement methods for cross-sensor matching. Finally, we conduct a comparative study of state-of-the-art fingerprint recognition methods and provide insight into their abilities to address this problem. We performed experiments using a public database (FingerPass) that contains nine datasets captured with different sensors. We analyzed the effects of different sensors and found that cross-sensor matching performance deteriorates when different sensors are used for enrollment and verification. In view of our analysis, we propose future research directions for this problem.
A Large-Scale Study of Fingerprint Matching Systems for Sensor Interoperability Problem.
AlShehri, Helala; Hussain, Muhammad; AboAlSamh, Hatim; AlZuair, Mansour
2018-03-28
The fingerprint is a commonly used biometric modality that is widely employed for authentication by law enforcement agencies and commercial applications. The designs of existing fingerprint matching methods are based on the hypothesis that the same sensor is used to capture fingerprints during enrollment and verification. Advances in fingerprint sensor technology have raised the question about the usability of current methods when different sensors are employed for enrollment and verification; this is a fingerprint sensor interoperability problem. To provide insight into this problem and assess the status of state-of-the-art matching methods to tackle this problem, we first analyze the characteristics of fingerprints captured with different sensors, which makes cross-sensor matching a challenging problem. We demonstrate the importance of fingerprint enhancement methods for cross-sensor matching. Finally, we conduct a comparative study of state-of-the-art fingerprint recognition methods and provide insight into their abilities to address this problem. We performed experiments using a public database (FingerPass) that contains nine datasets captured with different sensors. We analyzed the effects of different sensors and found that cross-sensor matching performance deteriorates when different sensors are used for enrollment and verification. In view of our analysis, we propose future research directions for this problem.
Parallel computing in enterprise modeling.
Energy Technology Data Exchange (ETDEWEB)
Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.
2008-08-01
This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent-based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.
Parallel scalability of Hartree-Fock calculations
Chow, Edmond; Liu, Xing; Smelyanskiy, Mikhail; Hammond, Jeff R.
2015-03-01
Quantum chemistry is increasingly performed using large cluster computers consisting of multiple interconnected nodes. For a fixed molecular problem, the efficiency of a calculation usually decreases as more nodes are used, due to the cost of communication between the nodes. This paper empirically investigates the parallel scalability of Hartree-Fock calculations. The construction of the Fock matrix and the density matrix calculation are analyzed separately. For the former, we use a parallelization of Fock matrix construction based on a static partitioning of work followed by a work stealing phase. For the latter, we use density matrix purification from the linear scaling methods literature, but without using sparsity. When using large numbers of nodes for moderately sized problems, density matrix computations are network-bandwidth bound, making purification methods potentially faster than eigendecomposition methods.
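One common purification scheme from the linear-scaling literature is McWeeny purification; the sketch below is illustrative only and not necessarily the exact variant used in the paper:

```python
# Minimal McWeeny purification sketch: D_{k+1} = 3 D_k^2 - 2 D_k^3 drives a
# suitably scaled matrix towards the idempotent density matrix using only
# dense matrix products (no eigendecomposition in the iteration itself).
import numpy as np

def mcweeny_purify(F, n_occ, iters=50):
    n = F.shape[0]
    evals = np.linalg.eigvalsh(F)                    # demo only: real linear-
    mu = 0.5 * (evals[n_occ - 1] + evals[n_occ])     # scaling codes estimate mu
    beta = 0.5 / max(evals[-1] - mu, mu - evals[0])  # and bounds without this
    D = 0.5 * np.eye(n) - beta * (F - mu * np.eye(n))   # eigenvalues in [0, 1]
    for _ in range(iters):
        D2 = D @ D
        D = 3.0 * D2 - 2.0 * D2 @ D
    return D

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
F = 0.5 * (A + A.T)                                  # symmetric "Fock" matrix
D = mcweeny_purify(F, n_occ=3)
print(np.trace(D), np.linalg.norm(D @ D - D))        # ~3 and ~0 at convergence
```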
DEFF Research Database (Denmark)
Christensen, Jonas Mark; Røpke, Stefan
… that serves all the customers. The second stage uses an Adaptive Large Neighborhood Search (ALNS) algorithm to minimise the travel distance; during this phase all of the generated routes are considered by solving a set cover problem. The ALNS algorithm uses 4 destroy operators, 2 repair operators …
Directory of Open Access Journals (Sweden)
Yingni Zhai
2014-10-01
Purpose: A decomposition heuristic based on multi-bottleneck machines for large-scale job shop scheduling problems (JSP) is proposed. Design/methodology/approach: In the algorithm, a number of sub-problems are constructed by iteratively decomposing the large-scale JSP according to the process route of each job. The solution of the large-scale JSP can then be obtained by iteratively solving the sub-problems. In order to improve the solving efficiency of the sub-problems and the solution quality, a detection method for multi-bottleneck machines based on the critical path is proposed, with which the unscheduled operations can be divided into bottleneck operations and non-bottleneck operations. According to the principle that "the bottleneck leads the performance of the whole manufacturing system" in TOC (Theory Of Constraints), the bottleneck operations are scheduled by a genetic algorithm for high solution quality, and the non-bottleneck operations are scheduled by dispatching rules to improve solving efficiency. Findings: In the process of constructing the sub-problems, some operations in the previously scheduled sub-problem are moved into the successive sub-problem for re-optimization; this strategy improves the solution quality of the algorithm. In the process of solving the sub-problems, evaluating a chromosome's fitness by predicting the global scheduling objective value also improves the solution quality. Research limitations/implications: In this research, there are some assumptions which reduce the complexity of the large-scale scheduling problem. They are as follows: the processing route of each job is predetermined, and the processing time of each operation is fixed; there is no machine breakdown, and no preemption of the operations is allowed. These assumptions should be considered if the algorithm is used in an actual job shop. Originality/value: The research provides an efficient scheduling method for the
Xu, Jiuping
2014-01-01
This paper presents an extension of the multimode resource-constrained project scheduling problem for a large scale construction project where multiple parallel projects and a fuzzy random environment are considered. By taking into account the most typical goals in project management, a cost/weighted makespan/quality trade-off optimization model is constructed. To deal with the uncertainties, a hybrid crisp approach is used to transform the fuzzy random parameters into fuzzy variables that are subsequently defuzzified using an expected value operator with an optimistic-pessimistic index. Then a combinatorial-priority-based hybrid particle swarm optimization algorithm is developed to solve the proposed model, where the combinatorial particle swarm optimization and priority-based particle swarm optimization are designed to assign modes to activities and to schedule activities, respectively. Finally, the results and analysis of a practical example at a large scale hydropower construction project are presented to demonstrate the practicality and efficiency of the proposed model and optimization method. PMID:24550708
Xu, Jiuping; Feng, Cuiying
2014-01-01
This paper presents an extension of the multimode resource-constrained project scheduling problem for a large scale construction project where multiple parallel projects and a fuzzy random environment are considered. By taking into account the most typical goals in project management, a cost/weighted makespan/quality trade-off optimization model is constructed. To deal with the uncertainties, a hybrid crisp approach is used to transform the fuzzy random parameters into fuzzy variables that are subsequently defuzzified using an expected value operator with an optimistic-pessimistic index. Then a combinatorial-priority-based hybrid particle swarm optimization algorithm is developed to solve the proposed model, where the combinatorial particle swarm optimization and priority-based particle swarm optimization are designed to assign modes to activities and to schedule activities, respectively. Finally, the results and analysis of a practical example at a large scale hydropower construction project are presented to demonstrate the practicality and efficiency of the proposed model and optimization method.
The Solution of Large Time-Dependent Problems Using Reduced Coordinates.
1987-06-01
… numerical integration schemes for dynamic problems, the algorithm known as Newmark's Method. The behavior of the Newmark scheme, as well as the basic … The horizontal displacements at the mid-height and the bottom of the building are shown in figure 4.13. The solution history illustrated is for a …
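For context, a minimal sketch of the Newmark time-integration scheme for M a + C v + K u = f(t) with the unconditionally stable average-acceleration parameters (beta = 1/4, gamma = 1/2); the two-degree-of-freedom shear-building model at the end is made up, not the report's structure:

```python
# Newmark-beta time integration for  M a + C v + K u = f(t).
import numpy as np

def newmark(M, C, K, f, u0, v0, dt, nsteps, beta=0.25, gamma=0.5):
    u, v = u0.copy(), v0.copy()
    a = np.linalg.solve(M, f(0.0) - C @ v - K @ u)       # consistent initial a
    Keff = K + gamma / (beta * dt) * C + 1.0 / (beta * dt**2) * M
    hist = [u.copy()]
    for n in range(1, nsteps + 1):
        t = n * dt
        rhs = (f(t)
               + M @ (u / (beta * dt**2) + v / (beta * dt) + (0.5 / beta - 1.0) * a)
               + C @ (gamma / (beta * dt) * u + (gamma / beta - 1.0) * v
                      + dt * (gamma / (2.0 * beta) - 1.0) * a))
        u_new = np.linalg.solve(Keff, rhs)               # implicit solve
        a_new = (u_new - u) / (beta * dt**2) - v / (beta * dt) - (0.5 / beta - 1.0) * a
        v_new = v + dt * ((1.0 - gamma) * a + gamma * a_new)
        u, v, a = u_new, v_new, a_new
        hist.append(u.copy())
    return np.array(hist)

# toy two-DOF shear-building model with a sinusoidal top-floor load
M = np.diag([2.0, 1.0]); K = np.array([[600.0, -300.0], [-300.0, 300.0]])
C = 0.02 * M + 0.002 * K
hist = newmark(M, C, K, lambda t: np.array([0.0, 10.0 * np.sin(5.0 * t)]),
               np.zeros(2), np.zeros(2), dt=0.01, nsteps=500)
print(hist[-1])
```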
Asymptotic eigenvalue estimates for a Robin problem with a large parameter
Czech Academy of Sciences Publication Activity Database
Exner, Pavel; Minakov, A.; Parnovski, L.
2014-01-01
Roč. 71, č. 2 (2014), s. 141-156 ISSN 0032-5155 R&D Projects: GA ČR(CZ) GA14-06818S Institutional support: RVO:61389005 Keywords : Laplacian * Robin problem * eigenvalue asymptotics Subject RIV: BE - Theoretical Physics Impact factor: 0.250, year: 2014
Czech Academy of Sciences Publication Activity Database
Lukšan, Ladislav; Vlček, Jan
1998-01-01
Roč. 5, č. 3 (1998), s. 219-247 ISSN 1070-5325 R&D Projects: GA ČR GA201/96/0918 Keywords : nonlinear programming * sparse problems * equality constraints * truncated Newton method * augmented Lagrangian function * indefinite systems * indefinite preconditioners * conjugate gradient method * residual smoothing Subject RIV: BA - General Mathematics Impact factor: 0.741, year: 1998
Study on MPI/OpenMP hybrid parallelism for Monte Carlo neutron transport code
International Nuclear Information System (INIS)
Liang Jingang; Xu Qi; Wang Kan; Liu Shiwen
2013-01-01
Parallel programming with a mixed mode of message passing and shared memory has several advantages when used in Monte Carlo neutron transport codes, such as fitting the hardware of distributed shared-memory clusters, economizing the memory demand of Monte Carlo transport, and improving parallel performance. MPI/OpenMP hybrid parallelism was implemented based on a one-dimensional Monte Carlo neutron transport code. Some critical factors affecting the parallel performance were analyzed and solutions were proposed for several problems such as contention access, lock contention and false sharing. After optimization, the code was tested. It is shown that the hybrid parallel code can reach performance comparable to a pure MPI parallel program, while saving a large amount of memory at the same time. Therefore, hybrid parallelism is efficient for achieving large-scale parallelization of Monte Carlo neutron transport. (authors)
About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems
Directory of Open Access Journals (Sweden)
Loredana MOCEAN
2009-01-01
In recent years, efforts have been made to delineate a stable and unitary framework in which the problems of logical parallel processing can find solutions, at least at the level of imperative languages. The results obtained so far are not commensurate with these efforts. This paper aims to make a small contribution to these efforts. We propose an overview of parallel programming, parallel execution and collaborative systems.
Parallel computational in nuclear group constant calculation
International Nuclear Information System (INIS)
Su'ud, Zaki; Rustandi, Yaddi K.; Kurniadi, Rizal
2002-01-01
In this paper, a parallel computational method for nuclear group constant calculation using the collision probability method is discussed. The main focus is on the calculation of the collision probability matrix, which requires a large amount of computational time. The geometry treated here is concentric cylinders. The calculation of the collision probability matrix is carried out using a semi-analytic method based on the Bickley-Naylor function. To accelerate the computation, several computers are used in parallel to solve the problem. On Linux we used PVM-based parallelization with C or Fortran, while on Windows we used socket programming with Delphi or C Builder. The calculation results show the importance of assigning an optimal weight to each processor when processors of different speeds are involved.
High-energy physics software parallelization using database techniques
International Nuclear Information System (INIS)
Argante, E.; Van der Stok, P.D.V.; Willers, I.
1997-01-01
A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradigm, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI. (orig.)
Kalman Filter Tracking on Parallel Architectures
International Nuclear Information System (INIS)
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi
2016-01-01
Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment
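For orientation, a minimal linear Kalman filter predict/update step on a toy one-dimensional track model (illustrative only; production track fitting uses extended Kalman filters with detector-specific propagation, and the work above vectorizes many such updates across candidate tracks):

```python
# One linear Kalman filter predict/update step, applied layer by layer.
import numpy as np

def kf_step(x, P, F, Q, H, R, z):
    # predict state and covariance to the next measurement surface
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update with measurement z
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# toy 1D track: state = (position, slope), measure position at each layer
F = np.array([[1.0, 1.0], [0.0, 1.0]])           # propagate one layer ahead
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)                              # process (scattering) noise
R = np.array([[0.01]])                            # hit resolution
x, P = np.zeros(2), np.eye(2)
for z in [0.1, 0.32, 0.51, 0.73, 0.88]:           # synthetic hit positions
    x, P = kf_step(x, P, F, Q, H, R, np.array([z]))
print(x)
```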
The problem of large samples. An activation analysis study of electronic waste material
International Nuclear Information System (INIS)
Segebade, C.; Goerner, W.; Bode, P.
2007-01-01
Large-volume instrumental photon activation analysis (IPAA) was used for the investigation of shredded electronic waste material. Sample masses from 1 to 150 grams were analyzed to obtain an estimate of the minimum sample size to be taken to achieve a representativeness of the results that is satisfactory for a defined investigation task. Furthermore, the influence of irradiation and measurement parameters on the quality of the analytical results was studied. Finally, the analytical data obtained from IPAA and from instrumental neutron activation analysis (INAA), both carried out in a large-volume mode, were compared. Only part of the values were found to be in satisfactory agreement. (author)
A parametric study of contact problem on a large size flange
International Nuclear Information System (INIS)
Mukherjee, A.B.; Narayanan, T.; Dhondkar, J.K.; Mehra, V.K.
1989-01-01
The continuous change of the contact point on the gasket face with the application of bolt load makes this a non-linear problem. The geometric non-linearity of the structure is therefore simulated, and the stress distribution over the gasket face is presented in this paper. The paper also describes the use of a taper on the gasket face to reduce stress peaking and to optimize gasket face separation
Solving and Interpreting Large-scale Harvest Scheduling Problems by Duality and Decomposition
Berck, Peter; Bible, Thomas
1982-01-01
This paper presents a solution to the forest planning problem that takes advantage of both the duality of linear programming formulations currently being used for harvest scheduling and the characteristics of decomposition inherent in the forest land class-relationship. The subproblems of decomposition, defined as the dual, can be solved in a simple, recursive fashion. In effect, such a technique reduces the computational burden in terms of time and computer storage as compared to the traditi...
Problems and Challenges in Human Resource Management: Case of a Large Organization in Pakistan
Directory of Open Access Journals (Sweden)
Ali Irshad
2008-09-01
This paper critically analyzes the work culture of a mainstream financial organization operating within Pakistan, drawing on a specific example to elucidate certain dilemmas that impede the potential growth of the financial sector and its constituent workforce, besides hampering the performance of organizations. The case study concerns an organization in the financial sector which conducts a Management Trainee Program with the purpose of selecting, training and developing a high-potential pool of talent into future leaders and fore-runners of the organization. This paper critically analyzes several inherent problems that hinder the successful implementation of the trainee program under the frameworks of various theories of organizational management. To solve these problems, the article presents a detailed diagnosis of the management shortcomings in order to improve the firm's corporate culture, work ethics and employee handling strategy and mechanism. Recommendations are also made to minimize the problems and maximize the success of the Management Trainee Program in the case study organization.
An adaptive large neighborhood search heuristic for the Electric Vehicle Scheduling Problem
DEFF Research Database (Denmark)
Wen, M.; Linde, Esben; Røpke, Stefan
2016-01-01
… to minimizing the total deadheading distance. A mixed integer programming formulation as well as an Adaptive Large Neighborhood Search (ALNS) heuristic for the E-VSP are presented. The ALNS is tested on newly generated E-VSP benchmark instances. Results show that the proposed heuristic can provide good solutions …
Parallel computation for solving the tridiagonal linear system of equations
International Nuclear Information System (INIS)
Ishiguro, Misako; Harada, Hiroo; Fujii, Minoru; Fujimura, Toichiro; Nakamura, Yasuhiro; Nanba, Katsumi.
1981-09-01
Recently, applications of parallel computation to scientific calculations have increased, driven by the need for high-speed calculation of large-scale programs. At the JAERI computing center, an array processor, the FACOM 230-75 APU, has been installed to study the applicability of parallel computation to nuclear codes. We performed numerical experiments using the APU on methods for solving tridiagonal linear systems, an important problem in scientific calculations. Referring to recent papers on parallel methods, we investigate eight of them: the Gauss elimination method, the parallel Gauss method, the accelerated parallel Gauss method, the Jacobi method, the recursive doubling method, the cyclic reduction method, the Chebyshev iteration method, and the conjugate gradient method. The computing time and accuracy were compared among the methods on the basis of the numerical experiments. As a result, it is found that the cyclic reduction method is best in both computing time and accuracy, with the Gauss elimination method second. (author)
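For reference, the serial Gauss-elimination baseline for tridiagonal systems (the Thomas algorithm) is sketched below; the parallel variants compared in the paper, such as cyclic reduction, restructure this recurrence to expose concurrency:

```python
# Thomas algorithm: serial Gauss elimination specialized to tridiagonal systems.
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c and right-hand side d (a[0] and c[-1] are unused)."""
    n = len(b)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

n = 6
a = np.full(n, -1.0); b = np.full(n, 2.0); c = np.full(n, -1.0)
d = np.arange(1.0, n + 1.0)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
assert np.allclose(thomas(a, b, c, d), np.linalg.solve(A, d))
```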
DEFF Research Database (Denmark)
Uchoa, Eduardo; Fukasawa, Ricardo; Lysgaard, Jens
This paper presents a robust branch-cut-and-price algorithm for the Capacitated Minimum Spanning Tree Problem (CMST). The variables are associated with q-arbs, a structure that arises from a relaxation of the capacitated prize-collecting arborescence problem in order to make it solvable in pseudo-polynomial time. Traditional inequalities over the arc formulation, like Capacity Cuts, are also used. Moreover, a novel feature is introduced in this kind of algorithm: powerful new cuts expressed over a very large set of variables can be added without increasing the complexity of the pricing subproblem …
International Nuclear Information System (INIS)
Tsuji, Masashi; Chiba, Gou
2000-01-01
A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to accomplish a high parallel efficiency on distributed-memory message-passing parallel computers. Data exchanges between node processors that are repeated during the iteration processes of HDD-BEM are implemented without any intervention of the host processor that was used to supervise parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, the parallel processing can be executed with only cooperative operations of node processors. The communication overhead was the dominant time-consuming part in the conventional P-HDD-BEM, and the parallelization efficiency decreased steeply with the increase in the number of processors. With parallel data communication, the efficiency is affected only by the number of boundary elements assigned to the decomposed subregions, and the communication overhead can be drastically reduced. This feature is particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the problem of deteriorating parallel efficiency and opens a new path to parallel computations of NDEs on distributed-memory message-passing parallel computers. (author)
Parallel processing based decomposition technique for efficient collaborative optimization
International Nuclear Information System (INIS)
Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon
2001-01-01
In practical design studies, most designers solve multidisciplinary problems with large and complex design systems. These multidisciplinary problems have hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize the total computational cost. This is accomplished by decomposing the large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems to raise design efficiency by using a genetic algorithm, and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology.
Problems related to the definition of the shielding of a large fast power reactor
International Nuclear Information System (INIS)
Moreau, J.
Solutions for the shielding of a 1000 MW(e) power plant in the same technological line as Phenix are given. They have been evaluated with a one-dimensional transport code. The choice is based on the comparison of their efficiency against neutrons and on the consequences of their characteristics for the conception of the reactor tank. A few economic considerations give an idea of the influence of the choice of shielding on the cost of the power plant. Finally, the problem of optimization possibilities is approached from the designer's point of view
Bui-Thanh, T.; Girolami, M.
2014-11-01
We consider the Riemann manifold Hamiltonian Monte Carlo (RMHMC) method for solving statistical inverse problems governed by partial differential equations (PDEs). The Bayesian framework is employed to cast the inverse problem into the task of statistical inference whose solution is the posterior distribution in infinite dimensional parameter space conditional upon observation data and Gaussian prior measure. We discretize both the likelihood and the prior using the H1-conforming finite element method together with a matrix transfer technique. The power of the RMHMC method is that it exploits the geometric structure induced by the PDE constraints of the underlying inverse problem. Consequently, each RMHMC posterior sample is almost uncorrelated/independent from the others providing statistically efficient Markov chain simulation. However this statistical efficiency comes at a computational cost. This motivates us to consider computationally more efficient strategies for RMHMC. At the heart of our construction is the fact that for Gaussian error structures the Fisher information matrix coincides with the Gauss-Newton Hessian. We exploit this fact in considering a computationally simplified RMHMC method combining state-of-the-art adjoint techniques and the superiority of the RMHMC method. Specifically, we first form the Gauss-Newton Hessian at the maximum a posteriori point and then use it as a fixed constant metric tensor throughout RMHMC simulation. This eliminates the need for the computationally costly differential geometric Christoffel symbols, which in turn greatly reduces computational effort at a corresponding loss of sampling efficiency. We further reduce the cost of forming the Fisher information matrix by using a low rank approximation via a randomized singular value decomposition technique. This is efficient since a small number of Hessian-vector products are required. The Hessian-vector product in turn requires only two extra PDE solves using the adjoint
Optimising a parallel conjugate gradient solver
Energy Technology Data Exchange (ETDEWEB)
Field, M.R. [O'Reilly Institute, Dublin (Ireland)]
1996-12-31
This work arises from the introduction of a parallel iterative solver to a large structural analysis finite element code. The code is called FEX and it was developed at Hitachi's Mechanical Engineering Laboratory. The FEX package can deal with a large range of structural analysis problems using a large number of finite element techniques. FEX can solve either stress or thermal analysis problems of a range of different types, from plane stress to a full three-dimensional model. These problems can consist of a number of different materials which can be modelled by a range of material models. The structure being modelled can have the load applied at either a point or a surface, or by a pressure, a centrifugal force or just gravity. Alternatively a thermal load can be applied with a given initial temperature. The displacement of the structure can be constrained by having a fixed boundary or by prescribing the displacement at a boundary.
Solving sparse linear least squares problems on some supercomputers by using large dense blocks
DEFF Research Database (Denmark)
Hansen, Per Christian; Ostromsky, T; Sameh, A
1997-01-01
technique is preferable to sparse matrix technique when the matrices are not large, because the high computational speed compensates fully the disadvantages of using more arithmetic operations and more storage. For very large matrices the computations must be organized as a sequence of tasks in each......Efficient subroutines for dense matrix computations have recently been developed and are available on many high-speed computers. On some computers the speed of many dense matrix operations is near to the peak-performance. For sparse matrices storage and operations can be saved by operating only...... and storing only nonzero elements. However, the price is a great degradation of the speed of computations on supercomputers (due to the use of indirect addresses, to the need to insert new nonzeros in the sparse storage scheme, to the lack of data locality, etc.). On many high-speed computers a dense matrix...
Sparse direct solver for large finite element problems based on the minimum degree algorithm
Czech Academy of Sciences Publication Activity Database
Pařík, Petr; Plešek, Jiří
2017-01-01
Roč. 113, November (2017), s. 2-6 ISSN 0965-9978 R&D Projects: GA ČR(CZ) GA15-20666S; GA MŠk(CZ) EF15_003/0000493 Institutional support: RVO:61388998 Keywords : sparse direct solution * finite element method * large sparse Linear systems Subject RIV: JR - Other Machinery OBOR OECD: Mechanical engineering Impact factor: 3.000, year: 2016 https://www.sciencedirect.com/science/article/pii/S0965997817302582
Parallel hierarchical global illumination
Energy Technology Data Exchange (ETDEWEB)
Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)]
1997-10-08
Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.
International Nuclear Information System (INIS)
Whitesides, G.E.; Stephens, M.E.
1986-01-01
A study has been conducted by an Organisation for Economic Co-operation and Development/Committee on the Safety of Nuclear Installations (OECD-CSNI) Working Group that examined computational methods used to compute k-eff for large (greater than or equal to 5×5×5) arrays of fissile material (in which each unit is a substantial fraction of a critical mass). Five fissile materials that might typically be transported were used in the study. The "packages" used for this exercise were simplified to allow studies unperturbed by the variety of structural materials which would exist in an actual package. The only material present other than the fissile material was a variation in the moderator (water) surrounding the fissile material. Consistent results were obtained from calculations using several computational methods. That is, when the bias demonstrated by each method for actual critical experiments was used to "correct" the results obtained for systems for which there were no experimental data, there was good agreement between the methods. Two major areas of concern were raised by this exercise. First, the lack of experimental data for arrays larger than 5×5×5 limits validation for large systems. Second, there is a distinct possibility that the commingling of two shipments of unlike units could result in a reduction of the safety margins. Additional experiments and calculations will be required to satisfactorily resolve the remaining questions regarding the safe transport of large arrays of fissile materials.
Scalability of Parallel Scientific Applications on the Cloud
Directory of Open Access Journals (Sweden)
Satish Narayana Srirama
2011-01-01
Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications, like matrix-vector operations and the NAS parallel benchmarks, and DOUG (Domain decomposition On Unstructured Grids) on the cloud. DOUG is an open source software package for parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit considerably and scale reasonably on the cloud. We could also observe the limitations of the cloud and its comparison with clusters in terms of performance. However, for efficiently running the scientific applications on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit the cloud resources, like the MapReduce framework. Several iterative and embarrassingly parallel algorithms are reduced to the MapReduce model and their performance is measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it suits embarrassingly parallel algorithms well. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper raises the necessity for better frameworks or optimizations for MapReduce.
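A minimal sketch of the contrast drawn above (my own illustration, not DOUG or the paper's benchmarks): an embarrassingly parallel Monte Carlo estimate needs only a single map over independent chunks, whereas an iterative solver would need such a map inside every iteration, which is where MapReduce-style frameworks pay a heavy per-iteration cost.

import numpy as np
from multiprocessing import Pool

def hits_in_circle(args):
    seed, n = args
    rng = np.random.default_rng(seed)
    xy = rng.random((n, 2))
    return int(np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0))

if __name__ == "__main__":
    n_workers, n_per_worker = 4, 1_000_000
    with Pool(n_workers) as pool:                        # one map over independent chunks
        hits = pool.map(hits_in_circle, [(i, n_per_worker) for i in range(n_workers)])
    print(4.0 * sum(hits) / (n_workers * n_per_worker))  # estimate of pi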
Modeling and solving a large-scale generation expansion planning problem under uncertainty
Energy Technology Data Exchange (ETDEWEB)
Jin, Shan; Ryan, Sarah M. [Iowa State University, Department of Industrial and Manufacturing Systems Engineering, Ames (United States); Watson, Jean-Paul [Sandia National Laboratories, Discrete Math and Complex Systems Department, Albuquerque (United States); Woodruff, David L. [University of California Davis, Graduate School of Management, Davis (United States)
2011-11-15
We formulate a generation expansion planning problem to determine the type and quantity of power plants to be constructed over each year of an extended planning horizon, considering uncertainty regarding future demand and fuel prices. Our model is expressed as a two-stage stochastic mixed-integer program, which we use to compute solutions independently minimizing the expected cost and the Conditional Value-at-Risk; i.e., the risk of significantly larger-than-expected operational costs. We introduce stochastic process models to capture demand and fuel price uncertainty, which are in turn used to generate trees that accurately represent the uncertainty space. Using a realistic problem instance based on the Midwest US, we explore two fundamental, unexplored issues that arise when solving any stochastic generation expansion model. First, we introduce and discuss the use of an algorithm for computing confidence intervals on obtained solution costs, to account for the fact that a finite sample of scenarios was used to obtain a particular solution. Second, we analyze the nature of solutions obtained under different parameterizations of this method, to assess whether the recommended solutions themselves are invariant to changes in costs. The issues are critical for decision makers who seek truly robust recommendations for generation expansion planning. (orig.)
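For readers unfamiliar with the risk measure mentioned above, the Conditional Value-at-Risk is simply the expected cost in the worst (1 - alpha) fraction of scenarios. A small numerical sketch with synthetic scenario costs (illustrative only, not the paper's data):

import numpy as np

rng = np.random.default_rng(1)
costs = rng.lognormal(mean=10.0, sigma=0.4, size=10_000)   # synthetic scenario costs
alpha = 0.95

var = np.quantile(costs, alpha)        # Value-at-Risk at level alpha
cvar = costs[costs >= var].mean()      # Conditional Value-at-Risk: mean of the worst 5% tail

print(f"expected cost {costs.mean():.1f}, VaR {var:.1f}, CVaR {cvar:.1f}")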
Matzen, Laura E; Benz, Zachary O; Dixon, Kevin R; Posey, Jamie; Kroger, James K; Speed, Ann E
2010-05-01
Raven's Progressive Matrices is a widely used test for assessing intelligence and reasoning ability (Raven, Court, & Raven, 1998). Since the test is nonverbal, it can be applied to many different populations and has been used all over the world (Court & Raven, 1995). However, relatively few matrices are in the sets developed by Raven, which limits their use in experiments requiring large numbers of stimuli. For the present study, we analyzed the types of relations that appear in Raven's original Standard Progressive Matrices (SPMs) and created a software tool that can combine the same types of relations according to parameters chosen by the experimenter, to produce very large numbers of matrix problems with specific properties. We then conducted a norming study in which the matrices we generated were compared with the actual SPMs. This study showed that the generated matrices both covered and expanded on the range of problem difficulties provided by the SPMs.
PARALLEL IMPORT: REALITY FOR RUSSIA
Directory of Open Access Journals (Sweden)
Т. А. Сухопарова
2014-01-01
The problem of parallel import is an urgent question at present. Legalization of parallel import in Russia is expedient; this conclusion is based on an analysis of opposing expert opinions. At the same time it is necessary to consider the negative consequences of this decision and to apply remedies to minimize them.
Parallel adaptation of a vectorised quantumchemical program system
International Nuclear Information System (INIS)
Van Corler, L.C.H.; Van Lenthe, J.H.
1987-01-01
Supercomputers, like the CRAY 1 or the Cyber 205, have had, and still have, a marked influence on Quantum Chemistry. Vectorization has led to a considerable increase in the performance of Quantum Chemistry programs. However, clock-cycle times more than a factor 10 smaller than those of the present supercomputers are not to be expected. Therefore future supercomputers will have to depend on parallel structures. Recently, the first examples of such supercomputers have been installed. To be prepared for this new generation of (parallel) supercomputers one should consider the concepts one wants to use and the kind of problems one will encounter during implementation of existing vectorized programs on those parallel systems. The authors implemented four important parts of a large quantum chemical program system (ATMOL), i.e. integrals, SCF, 4-index and Direct-CI, in the parallel environment at ECSEC (Rome, Italy). This system offers simulated parallelism on the host computer (IBM 4381) and real parallelism on at most 10 attached processors (FPS-164). Quantum chemical programs usually handle large amounts of data and very large, often sparse matrices. The transfer of so much data can cause problems concerning communication and overhead, in view of which shared memory and shared disks must be considered. The strategy and the tools that were used to parallelise the programs are shown. Also, some examples are presented to illustrate the effectiveness and performance of the system in Rome for this type of calculation.
The solar elemental abundances problem: Large enhancements in photoionization and bound-free opacity
Pradhan, A.; Nahar, S.
2016-05-01
Aimed at solving the outstanding problem of solar opacity and radiation transport, we report substantial photoabsorption in the high-energy regime due to atomic core photo-excitations not heretofore considered. In extensive R-matrix calculations of unprecedented complexity for the important iron ion Fe XVII, with a wave function expansion of 99 Fe XVIII core states from n current opacity models, and ii) demonstrate convergence with respect to successive core excitations. These findings may explain the "higher-than-predicted" monochromatic iron opacity measured recently at the Sandia Z-pinch fusion device at solar interior conditions. The findings will also impact the total atomic photoabsorption and radiation transport in laboratory and astrophysical plasmas, such as UV emission from host stars of extra-solar planets. Support: NSF, DOE, Ohio Supercomputer Center, Columbus, OH.
International Nuclear Information System (INIS)
Sushkevich, T.A.
2011-01-01
This review is to remind scientists of the older generation of some memorable historical pages and of many famous researchers, teachers and colleagues. For younger researchers and foreign colleagues it will be useful to learn about the pioneering advances of Soviet scientists in the field of information and mathematical support for cosmonautics problems on the eve of the space era. Main attention is paid to the scientific experiments conducted on piloted space vehicles and to the research teams who created the information and mathematical tools for the first space projects. The role of Mstislav Vsevolodovich Keldysh, the major theoretician of cosmonautics, is particularly emphasized. He largely determined the basic directions of development of space research and remote sensing of the Earth and planets, which for brevity are referred to as remote sensing.
Use of endochronic plasticity for multi-dimensional small and large strain problems
International Nuclear Information System (INIS)
Hsieh, B.J.
1980-04-01
The endochronic plasticity theory was proposed in its general form by K.C. Valanis. An intrinsic time measure, which is a property of the material, is used in the theory. The explicit forms of the constitutive equation resemble closely those of the classical theory of linear viscoelasticity. Excellent agreement between the predicted and experimental results is obtained for some metallic and non-metallic materials in one-dimensional cases. No reference on the use of endochronic plasticity consistent with the general theory proposed by Valanis is available in the open literature. In this report, explicit constitutive equations are derived that are consistent with the general theory for one-dimensional (simple tension or compression), two-dimensional plane strain or stress, and three-dimensional axisymmetric problems.
International Nuclear Information System (INIS)
Emiel, G.
2008-01-01
This manuscript deals with large-scale non-smooth optimization that may typically arise when performing Lagrangian relaxation of difficult problems. This technique is commonly used to tackle mixed-integer linear programs or large-scale convex problems. For example, a classical approach when dealing with power generation planning problems in a stochastic environment is to perform a Lagrangian relaxation of the coupling constraints of demand. In this approach, a master problem coordinates local subproblems, specific to each generation unit. The master problem deals with a separable non-smooth dual function which can be maximized with, for example, bundle algorithms. In chapter 2, we introduce basic tools of non-smooth analysis and some recent results regarding incremental or inexact instances of non-smooth algorithms. However, in some situations, the dual problem may still be very hard to solve. For instance, when the number of dualized constraints is very large (exponential in the dimension of the primal problem), explicit dualization may no longer be possible or the update of dual variables may fail. In order to reduce the dual dimension, different heuristics were proposed. They involve a separation procedure to dynamically select a restricted set of constraints to be dualized along the iterations. This relax-and-cut type approach has shown its numerical efficiency in many combinatorial problems. In chapter 3, we show primal-dual convergence of such a strategy when using an adapted subgradient method for the dual step and under minimal assumptions on the separation procedure. Another limit of Lagrangian relaxation may appear when the dual function is separable in highly numerous or complex sub-functions. In such a situation, the computational burden of solving all local subproblems may be preponderant in the whole iterative process. A natural strategy here would be to take full advantage of the dual separable structure, performing a dual iteration after having
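A toy illustration of the dual step discussed above (my own sketch, not the thesis' method): Lagrangian relaxation of a single coupling constraint over separable box-constrained subproblems, with the dual function maximized by a projected subgradient method whose subgradient is the constraint violation at the subproblem minimizers.

import numpy as np

a = np.array([3.0, 5.0, 2.0, 4.0])    # targets of the separable quadratic costs
b, u = 8.0, 6.0                        # coupling budget sum(x) <= b, box bound 0 <= x <= u

def subproblem(lam):
    # per-unit minimizer of (x_i - a_i)^2 + lam * x_i over [0, u]
    return np.clip(a - lam / 2.0, 0.0, u)

lam = 0.0
for k in range(200):
    x = subproblem(lam)
    g = x.sum() - b                                 # subgradient of the dual function
    lam = max(0.0, lam + (1.0 / (k + 1)) * g)       # projected subgradient ascent step

print(x, x.sum(), lam)                 # near-feasible primal point and dual multiplier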
Energy Technology Data Exchange (ETDEWEB)
Stuhrmann, D.
1979-01-01
This paper discusses health hazards, such as noise, vibrations and air pollution caused by the operation of large machines and their effect on the human body. Means of reducing the influence of these factors are: improved ear protection against noise, special gymnastics at the end of each shift, warm water swimming and a weekly sauna treatment for the effects of vibrations. Air pollution caused by diesel engines is monitored. Engine exhaust standards are examined in detail and methods are proposed to further reduce air pollution.
Performance of the improved version of Monte Carlo code A³MCNP for large-scale shielding problems
International Nuclear Information System (INIS)
Omura, M.; Miyake, Y.; Hasegawa, T.; Ueki, K.; Sato, O.; Haghighat, A.; Sjoden, G. E.
2005-01-01
A³MCNP (Automatic Adjoint Accelerated MCNP) is a revised version of the MCNP Monte Carlo code, which automatically prepares variance reduction parameters for the CADIS (Consistent Adjoint Driven Importance Sampling) methodology. Using a deterministic 'importance' (or adjoint) function, CADIS performs source and transport biasing within the weight-window technique. The current version of A³MCNP uses the three-dimensional (3-D) SN transport code TORT to determine a 3-D importance function distribution. Based on simulation of several real-life problems, it is demonstrated that A³MCNP provides precise calculation results with a remarkably short computation time by using proper and objective variance reduction parameters. However, since the first version of A³MCNP provided only a point-source configuration option for large-scale shielding problems, such as spent-fuel transport casks, a large amount of memory may be necessary to store enough points to properly represent the source. Hence, we have developed an improved version of A³MCNP (referred to as A³MCNPV) which has a volumetric source configuration option. This paper describes the successful use of A³MCNPV for a concrete cask neutron and gamma-ray shielding problem, and a PWR dosimetry problem. (authors)
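A minimal numerical sketch of the CADIS idea described above (toy 1-D data assumed; this is not A³MCNP itself): the source is biased by the adjoint importance function, and birth weights are set so that weight times biased density reproduces the analog source, which is also how weight-window centers are chosen.

import numpy as np

cells = np.arange(10)
q = np.ones(10) / 10.0                  # analog (unbiased) source pdf per cell
adjoint = np.exp(-0.5 * (9 - cells))    # assumed adjoint importance toward a detector at cell 9

R = np.sum(q * adjoint)                 # estimated detector response
q_biased = q * adjoint / R              # CADIS biased source pdf
w_birth = q / q_biased                  # birth weights, equal to R / adjoint

assert np.allclose(w_birth * q_biased, q)   # weight times biased density recovers the analog source
print(q_biased.round(4), w_birth.round(4))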
Number of deaths due to lung diseases: How large is the problem?
International Nuclear Information System (INIS)
Wagener, D.K.
1990-01-01
The importance of lung disease as an indicator of environmentally induced adverse health effects has been recognized by inclusion among the Health Objectives for the Nation. The 1990 Health Objectives for the Nation (US Department of Health and Human Services, 1986) includes an objective that there should be virtually no new cases among newly exposed workers for four preventable occupational lung diseases: asbestosis, byssinosis, silicosis, and coal workers' pneumoconiosis. This brief communication describes two types of cause-of-death statistics, underlying and multiple cause, and demonstrates the differences between the two statistics using lung disease deaths among adult men. The choice of statistic has a large impact on estimated lung disease mortality rates. The choice of statistic may also have a large effect on the estimated mortality rates due to other chronic diseases thought to be environmentally mediated. Issues of comorbidity and the way causes of death are reported become important in the interpretation of these statistics. The choice of which statistic to use when comparing data from a study population with national statistics may greatly affect the interpretation of the study findings.
Neural nets for massively parallel optimization
Dixon, Laurence C. W.; Mills, David
1992-07-01
To apply massively parallel processing systems to the solution of large scale optimization problems it is desirable to be able to evaluate any function f(z), z ∈ R^n, in a parallel manner. The theorem of Cybenko, Hecht-Nielsen, Hornik, Stinchcombe and White, and Funahashi shows that this can be achieved by a neural network with one hidden layer. In this paper we address the problem of the number of nodes required in the layer to achieve a given accuracy in the function and gradient values at all points within a given n-dimensional interval. The type of activation function needed to obtain nonsingular Hessian matrices is described and a strategy for obtaining accurate minimal networks is presented.
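In the spirit of the approximation theorem cited above, here is a minimal sketch (my example, not the paper's networks): a single hidden layer of fixed random sigmoidal units, with only the output weights fitted by least squares, already approximates a smooth one-dimensional function, and the number of hidden nodes controls the achievable accuracy.

import numpy as np

rng = np.random.default_rng(0)
f = lambda z: np.sin(3 * z) + 0.5 * z ** 2            # function to be approximated

z = np.linspace(-2, 2, 400)[:, None]
n_hidden = 50
W = rng.normal(size=(1, n_hidden)) * 3.0              # fixed random input weights
b = rng.normal(size=n_hidden) * 3.0                   # fixed random biases
H = np.tanh(z @ W + b)                                # one hidden layer of sigmoidal units

c, *_ = np.linalg.lstsq(H, f(z).ravel(), rcond=None)  # fit the output weights only
print(np.max(np.abs(H @ c - f(z).ravel())))           # error shrinks as n_hidden grows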
Analytical and experimental analysis of a parallel leaf spring guidance
Meijaard, Jacob Philippus; Brouwer, Dannis Michel; Jonker, Jan B.; Denier, J.; Finn, M.
2008-01-01
A parallel leaf spring guidance is defined as a benchmark problem for flexible multibody formalisms and codes. The mechanism is loaded by forces and an additional moment or misalignment. Buckling loads, changes in compliance and frequencies, and large-amplitude vibrations are calculated. A
Comparative efficiencies of three parallel algorithms for nonlinear ...
Indian Academy of Sciences (India)
This algorithm is better suited for large-size problems on coarse ... and reliable time integration algorithms for solving the second-order dynamic equilibrium equations that arise due ... Programming models required to take advantage of the parallel and distributed ... In addition, MPI added the concept of a 'virtual topology'.
Parallel Monte Carlo reactor neutronics
International Nuclear Information System (INIS)
Blomquist, R.N.; Brown, F.B.
1994-01-01
The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved
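One of the issues listed above, reproducibility, can be illustrated with a small sketch (an assumption-level example, not the codes discussed in the paper): if each history owns a random stream seeded by its history number, the tallies are identical regardless of how the histories are distributed over processors.

import numpy as np
from multiprocessing import Pool

def run_history(h):
    rng = np.random.default_rng(seed=h)     # random stream tied to the history number
    return rng.exponential()                 # stand-in for one history's tally contribution

def total_tally(histories, n_workers):
    with Pool(n_workers) as pool:
        return sum(pool.map(run_history, histories))   # map preserves order, so sums match

if __name__ == "__main__":
    histories = range(10_000)
    print(total_tally(histories, 2), total_tally(histories, 8))   # identical for any worker count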
Taylor, David McD; Wolfe, Rory; Cameron, Peter A
2002-03-01
Emergency department patient complaints are often justified and may lead to apology, remedial action or compensation. The aim of the present study was to analyse emergency department patient complaints in order to identify procedures or practices that require change and to make recommendations for intervention strategies aimed at decreasing complaint rates. We undertook a retrospective analysis of patient complaints from 36 Victorian emergency departments during a 61 month period. Data were obtained from the Health Complaint Information Program (Health Services Commissioner). In all, 2,419 emergency department patients complained about a total of 3,418 separate issues (15.4% of all issues from all hospital departments). Of these, 1,157 complaints (47.80%) were received by telephone and 829 (34.3%) were received by letter; 1,526 (63.1 %) complaints were made by a person other than the patient. Highest complaint rates were received from patients who were female, born in non-English-speaking countries and very young or very old. One thousand one hundred and forty-one issues (33.4%) related to patient treatment, including inadequate treatment (329 issues) and inadequate diagnosis (249 issues); 1079 (31.6%) issues related to communication, including poor staff attitude, discourtesy and rudeness (444 issues); 407 (11.9%) issues related to delay in treatment. Overall, 2516 issues (73.6%) were resolved satisfactorily, usually by explanation or apology. Only 59 issues (1.7%) resulted in a procedure or policy change. Remedial action was taken in 109 issues (3.2%) and compensation was paid to eight patients. Communication remains a significant factor in emergency department patient dissatisfaction. While patient complaints have resulted in major changes to policy and procedure, research and intervention strategies into communication problems are indicated. In the short term, focused staff training is recommended.
International Nuclear Information System (INIS)
Chiche, A.
2012-01-01
This manuscript deals with large-scale optimization problems, and more specifically with solving the electricity unit commitment problem arising at EDF. First, we focused on the augmented Lagrangian algorithm. The behavior of that algorithm on an infeasible convex quadratic optimization problem is analyzed. It is shown that the algorithm finds a point that satisfies the shifted constraints with the smallest possible shift in the sense of the Euclidean norm and that it minimizes the objective on the corresponding shifted constrained set. The convergence to such a point is realized at a global linear rate, which depends explicitly on the augmentation parameter. This suggests a rule for determining the augmentation parameter to control the speed of convergence of the shifted constraint norm to zero. This rule has the advantage of generating bounded augmentation parameters even when the problem is infeasible. As a by-product, the algorithm computes the smallest translation in the Euclidean norm that makes the constraints feasible. Furthermore, this work provides solution methods for industrial stochastic optimization problems decomposed on a scenario tree, based on the progressive hedging algorithm introduced by [Rockafellar and Wets, 1991]. We also focus on the convergence of that algorithm. On the one hand, we offer a counter-example showing that the algorithm could diverge if its augmentation parameter is iteratively updated. On the other hand, we show how to recover the multipliers associated with the non-dualized constraints defined on the scenario tree from those associated with the corresponding constraints of the scenario subproblems. Their convergence is also analyzed for convex problems. The practical interest of these solution techniques is corroborated by numerical experiments performed on the electric production management problem. We apply the progressive hedging algorithm to a realistic industrial problem. More precisely, we solve the French medium
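To make the augmented Lagrangian mechanics above concrete, here is a minimal sketch of the textbook iteration for an equality-constrained quadratic program (generic, not EDF's solver): each iteration minimizes the augmented Lagrangian in x and then shifts the multipliers by the scaled constraint residual, and the residual norm decays at a linear rate governed by the augmentation parameter.

import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2
Q = 2.0 * np.eye(n)                          # objective: 0.5 x'Qx - q'x
q = rng.standard_normal(n)
A = rng.standard_normal((m, n))              # constraints: Ax = b
b = rng.standard_normal(m)

mu, y = 10.0, np.zeros(m)                    # augmentation parameter and multipliers
for _ in range(30):
    # x-step: minimize the augmented Lagrangian (a single linear solve for a QP)
    x = np.linalg.solve(Q + mu * A.T @ A, q - A.T @ y + mu * A.T @ b)
    r = A @ x - b                            # constraint residual
    y = y + mu * r                           # multiplier (shift) update
print(np.linalg.norm(r))                     # residual norm decays linearly over the iterations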
A parallel orbital-updating based plane-wave basis method for electronic structure calculations
International Nuclear Information System (INIS)
Pan, Yan; Dai, Xiaoying; Gironcoli, Stefano de; Gong, Xin-Gao; Rignanese, Gian-Marco; Zhou, Aihui
2017-01-01
Highlights: • Three parallel orbital-updating based plane-wave basis methods for electronic structure calculations are proposed. • These new methods can avoid generating large-scale eigenvalue problems and thus reduce the computational cost. • These new methods allow for two-level parallelization, which is particularly interesting for large-scale parallelization. • Numerical experiments show that these new methods are reliable and efficient for large-scale calculations on modern supercomputers. - Abstract: Motivated by the recently proposed parallel orbital-updating approach in the real-space method, we propose a parallel orbital-updating based plane-wave basis method for electronic structure calculations, for solving the corresponding eigenvalue problems. In addition, we propose two new modified parallel orbital-updating methods. Compared to the traditional plane-wave methods, our methods allow for two-level parallelization, which is particularly interesting for large-scale parallelization. Numerical experiments show that these new methods are more reliable and efficient for large-scale calculations on modern supercomputers.
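The eigenvalue problems referred to above can be illustrated with a small stand-in (SciPy's LOBPCG on a toy Hamiltonian-like matrix, chosen only for illustration; this is not the authors' plane-wave code): a block iterative solver extracts the few lowest eigenpairs without a full diagonalization, and each column of the block, like each orbital, can in principle be updated in parallel.

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lobpcg

n, k = 500, 4
rng = np.random.default_rng(0)
# 1-D Laplacian plus a random diagonal "potential" as a toy Hamiltonian
diag = 2.0 * np.ones(n) + 0.1 * rng.random(n)
H = sparse.diags([-np.ones(n - 1), diag, -np.ones(n - 1)], [-1, 0, 1], format="csr")

X = rng.standard_normal((n, k))                          # initial block of k trial orbitals
vals, vecs = lobpcg(H, X, largest=False, tol=1e-8, maxiter=300)
print(np.sort(vals))                                     # approximations to the k lowest eigenvalues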
Plasma Physics Calculations on a Parallel Macintosh Cluster
Decyk, Viktor; Dauger, Dean; Kokelaar, Pieter
2000-03-01
We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.
Parallel Framework for Cooperative Processes
Directory of Open Access Journals (Sweden)
Mitică Craus
2005-01-01
This paper describes the work of an object-oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system we are describing is to have a reusable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles and it must be possible to split the work between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework prove to have good performance: approximately linear speedup and low communication cost.
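A stripped-down sketch of the master-slave pattern described above (hypothetical, not the framework's actual API): the master keeps the state between cycles and farms out independent pieces of work, here the evaluation of candidate TSP tours, to worker processes in each cycle.

import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(0)
cities = rng.random((30, 2))                        # random city coordinates

def tour_length(tour):
    pts = cities[np.asarray(tour)]
    return float(np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1).sum())

if __name__ == "__main__":
    best = None
    with Pool(4) as pool:                           # the worker ("slave") processes
        for cycle in range(20):                     # the master's outer cycles
            tours = [rng.permutation(len(cities)) for _ in range(64)]   # candidate tours
            lengths = pool.map(tour_length, tours)  # work split between processing units
            shortest = min(lengths)
            best = shortest if best is None else min(best, shortest)
    print(best)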
Xyce parallel electronic simulator : users' guide.
Energy Technology Data Exchange (ETDEWEB)
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is
Hemmelmayr, Vera C.; Cordeau, Jean-François; Crainic, Teodor Gabriel
2012-01-01
In this paper, we propose an adaptive large neighborhood search heuristic for the Two-Echelon Vehicle Routing Problem (2E-VRP) and the Location Routing Problem (LRP). The 2E-VRP arises in two-level transportation systems such as those encountered in the context of city logistics. In such systems, freight arrives at a major terminal and is shipped through intermediate satellite facilities to the final customers. The LRP can be seen as a special case of the 2E-VRP in which vehicle routing is performed only at the second level. We have developed new neighborhood search operators by exploiting the structure of the two problem classes considered and have also adapted existing operators from the literature. The operators are used in a hierarchical scheme reflecting the multi-level nature of the problem. Computational experiments conducted on several sets of instances from the literature show that our algorithm outperforms existing solution methods for the 2E-VRP and achieves excellent results on the LRP. PMID:23483764
Fuller, Nathaniel J.; Licata, Nicholas A.
2018-05-01
Obtaining a detailed understanding of the physical interactions between a cell and its environment often requires information about the flow of fluid surrounding the cell. Cells must be able to effectively absorb and discard material in order to survive. Strategies for nutrient acquisition and toxin disposal, which have been evolutionarily selected for their efficacy, should reflect knowledge of the physics underlying this mass transport problem. Motivated by these considerations, in this paper we discuss the results from an undergraduate research project on the advection-diffusion equation at small Reynolds number and large Péclet number. In particular, we consider the problem of mass transport for a Stokesian spherical swimmer. We approach the problem numerically and analytically through a rescaling of the concentration boundary layer. A biophysically motivated first-passage problem for the absorption of material by the swimming cell demonstrates quantitative agreement between the numerical and analytical approaches. We conclude by discussing the connections between our results and the design of smart toxin disposal systems.
Hemmelmayr, Vera C; Cordeau, Jean-François; Crainic, Teodor Gabriel
2012-12-01
In this paper, we propose an adaptive large neighborhood search heuristic for the Two-Echelon Vehicle Routing Problem (2E-VRP) and the Location Routing Problem (LRP). The 2E-VRP arises in two-level transportation systems such as those encountered in the context of city logistics. In such systems, freight arrives at a major terminal and is shipped through intermediate satellite facilities to the final customers. The LRP can be seen as a special case of the 2E-VRP in which vehicle routing is performed only at the second level. We have developed new neighborhood search operators by exploiting the structure of the two problem classes considered and have also adapted existing operators from the literature. The operators are used in a hierarchical scheme reflecting the multi-level nature of the problem. Computational experiments conducted on several sets of instances from the literature show that our algorithm outperforms existing solution methods for the 2E-VRP and achieves excellent results on the LRP.
Betchov, R
2012-01-01
Stability of Parallel Flows provides information pertinent to hydrodynamical stability. This book explores the stability problems that occur in various fields, including electronics, mechanics, oceanography, administration, economics, as well as naval and aeronautical engineering. Organized into two parts encompassing 10 chapters, this book starts with an overview of the general equations of a two-dimensional incompressible flow. This text then explores the stability of a laminar boundary layer and presents the equation of the inviscid approximation. Other chapters present the general equation
Directory of Open Access Journals (Sweden)
James G. Worner
2017-05-01
James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship. ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.
Parallel computing solution of Boltzmann neutron transport equation
International Nuclear Information System (INIS)
Ansah-Narh, T.
2010-01-01
The focus of the research was on developing a parallel computing algorithm for solving eigenvalues of the Boltzmann Neutron Transport Equation (BNTE) in slab geometry using a multi-grid approach. In response to the problem of slow execution of serial computing when solving large problems, such as the BNTE, the study focused on the design of parallel computing systems, an evolution of serial computing that uses multiple processing elements simultaneously to solve complex physical and mathematical problems. The finite element method (FEM) was used for the spatial discretization scheme, while angular discretization was accomplished by expanding the angular dependence in terms of Legendre polynomials. The eigenvalues representing the multiplication factors in the BNTE were determined by the power method. MATLAB Compiler Version 4.1 (R2009a) was used to compile the MATLAB codes of the BNTE. The implemented parallel algorithms were enabled with matlabpool, a Parallel Computing Toolbox function. The option UseParallel was set to 'always' (the default value of the option is 'never'); when those conditions held, the solvers computed estimated gradients in parallel. The parallel computing system was used to handle all the bottlenecks in the matrix generated from the finite element scheme and in each domain generated by the power method. The parallel algorithm was implemented on a Symmetric Multi-Processor (SMP) cluster machine with Intel 32-bit quad-core x86 processors. Convergence rates and timings for the algorithm on the SMP cluster machine were obtained. Numerical experiments indicated the designed parallel algorithm could reach perfect speedup and had good stability and scalability. (au)
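The power method referenced above fits in a few lines (a generic sketch with a random nonnegative matrix standing in for the assembled operator, not the thesis' MATLAB code): repeated application of the operator with normalization converges to the dominant eigenvalue, the analogue of the multiplication factor.

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((200, 200))                 # nonnegative surrogate for the assembled operator

x = rng.random(200)
k = 1.0
for _ in range(500):
    y = A @ x                              # apply the operator (the parallelizable step)
    k = np.linalg.norm(y) / np.linalg.norm(x)
    x = y / np.linalg.norm(y)

print(k, np.max(np.abs(np.linalg.eigvals(A))))   # matches the true dominant eigenvalue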
DEFF Research Database (Denmark)
Uchoa, Eduardo; Fukasawa, Ricardo; Lysgaard, Jens
2008-01-01
This paper presents a robust branch-cut-and-price algorithm for the Capacitated Minimum Spanning Tree Problem (CMST). The variables are associated to q-arbs, a structure that arises from a relaxation of the capacitated prize-collecting arborescence problem in order to make it solvable in pseudo-polynomial time. Traditional inequalities over the arc formulation, like Capacity Cuts, are also used. Moreover, a novel feature is introduced in such kind of algorithms: powerful new cuts expressed over a very large set of variables are added, without increasing the complexity of the pricing subproblem or the size of the LPs that are actually solved. Computational results on benchmark instances from the OR-Library show very significant improvements over previous algorithms. Several open instances could be solved to optimality.
Energy Technology Data Exchange (ETDEWEB)
Geng, Hai-qing [State Environmental Protection Administration, Beijing (China). Appraisal Center for Environment and Engineering
2008-05-15
Due to rapid industrialization and urbanization, the number of China's large proposed coal mines has increased very quickly in recent years. 144 environmental impact assessment reports were submitted to SEPA during 2001-2006, so the problem of reconciling coal exploitation with its environmental and social impacts has become urgent in China's sustainable development strategy. Based on an analysis of data on coal mine exploitation, it is pointed out that there are four main problems in the management of the coal sector: the SEA (Strategic Environmental Assessment) lags behind practical needs; the policy is not clear; migration is neglected; and an ecological compensatory mechanism is absent. Four countermeasures are recommended: accelerating the execution of SEA; compartmentalizing typical zones for environmental management; improving the organisation of resettlement in the coal district; and installing an ecological compensatory mechanism in the coal mining industry. 13 refs., 2 figs., 1 tab.
Scheduling Parallel Jobs Using Migration and Consolidation in the Cloud
Directory of Open Access Journals (Sweden)
Xiaocheng Liu
2012-01-01
An increasing number of high performance computing parallel applications leverage the power of the cloud for parallel processing. How to schedule the parallel applications to improve the quality of service is the key to the successful hosting of parallel applications in the cloud. The large scale of the cloud makes parallel job scheduling more complicated, as even the simple parallel job scheduling problem is NP-complete. In this paper, we propose a parallel job scheduling algorithm named MEASY. MEASY adopts migration and consolidation to enhance the most popular EASY scheduling algorithm. Our extensive experiments on well-known workloads show that our algorithm takes very good care of the quality of service. For two common parallel job scheduling objectives, our algorithm produces up to a 41.1% and an average of 23.1% improvement on the average response time, and up to an 82.9% and an average of 69.3% improvement on the average slowdown. Our algorithm is robust in that it tolerates inaccurate CPU usage estimation and high migration cost. Our approach involves only trivial modification of EASY and requires no additional technique; it is practical and effective in the cloud environment.
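For context, the EASY algorithm that MEASY extends has a very small core; the following simplified check is my own sketch (not MEASY): jobs run first-come-first-served, a reservation is computed for the job at the head of the queue, and a later job may backfill only if it fits in the idle nodes without delaying that reservation.

def can_backfill(job, free_nodes, shadow_time, extra_nodes, now):
    """job is a dict with 'nodes' and 'runtime' (requested walltime).
    shadow_time is the reserved start time of the job at the head of the queue;
    extra_nodes are the nodes that stay free even after that job starts."""
    if job["nodes"] > free_nodes:
        return False
    if now + job["runtime"] <= shadow_time:     # finishes before the reservation kicks in
        return True
    return job["nodes"] <= extra_nodes          # or never touches the reserved nodes

# Example: 4 nodes idle now, head job reserved to start at t = 100, 1 node spare afterwards
print(can_backfill({"nodes": 2, "runtime": 40}, 4, 100, 1, now=50))   # True
print(can_backfill({"nodes": 2, "runtime": 80}, 4, 100, 1, now=50))   # False: would delay the head job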
Parallel computing by Monte Carlo codes MVP/GMVP
International Nuclear Information System (INIS)
Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa
2001-01-01
General-purpose Monte Carlo codes MVP/GMVP are well vectorized and thus enable us to perform high-speed Monte Carlo calculations. In order to achieve further speedups, we parallelized the codes on different types of parallel computing platforms or by using the standard parallelization library MPI. The platforms used for benchmark calculations are a distributed-memory vector-parallel computer (Fujitsu VPP500), a distributed-memory massively parallel computer (Intel Paragon) and distributed-memory scalar-parallel computers (Hitachi SR2201, IBM SP2). As is generally the case, linear speedup could be obtained for large-scale problems, but parallelization efficiency decreased as the batch size per processing element (PE) became smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% for a PWR full-core calculation with more than 10 million histories, which took about 1.5 hours with massively parallel computing. (author)
A development framework for parallel CFD applications: TRIOU project
International Nuclear Information System (INIS)
Calvin, Ch.
2003-01-01
We present in this paper the parallel structure of a thermal-hydraulic framework, Trio-U. This development platform has been designed to solve large 3-dimensional structured or unstructured CFD (computational fluid dynamics) problems. The code is intrinsically parallel, and an object-oriented design, expressed in UML, is used. The implementation language chosen is C++. All the parallelism management and the communication routines have been encapsulated. Parallel I/O and communication classes over the standard C++ I/O streams have been defined, which allows the developer easy use of the different modules of the application without dealing with basic parallel process management and communications. Moreover, the encapsulation of the communication routines guarantees the portability of the application and allows an efficient tuning of basic communication methods in order to achieve the best performance of the target architecture. The speed-ups of parallel applications designed using the Trio-U framework are very good, since we obtained, for instance, an efficiency of up to 90% on 20 processors for complex turbulent flow Large Eddy Simulation (LES) simulations. The efficiencies obtained on direct numerical simulations of two-phase flows are similar, since the speed-up is nearly equal to 7.5 for a 3-dimensional simulation using a one-million-element mesh on 8 processors. The purpose of this paper is to focus on the main concepts and their implementation that were the guidelines of the design of the parallel architecture of the code. (author)
Yao, Hui; Zhang, Chao; Li, Zhi-Jian; Nie, Yi-Hang; Niu, Peng-bin
2018-05-01
We theoretically investigate the thermoelectric properties of a tunneling-coupled parallel DQD-AB ring attached to one normal and one superconducting lead. The role of the intrinsic and extrinsic parameters in improving thermoelectric properties is discussed. The peak value of the figure of merit near the gap edges increases as the asymmetry parameter decreases; in particular, when the asymmetry parameter is less than 0.5, the figure of merit near the gap edges rises rapidly. When the interdot coupling strength is less than the superconducting gap, the thermopower spectrum presents a single-platform structure, while when the interdot coupling strength is larger than the gap, a double-platform structure appears in the thermopower spectrum. Outside the gap the peak values of the figure of merit can reach the order of 10². On the basis of optimizing internal parameters, the thermoelectric conversion efficiency of the device can be further improved by appropriately matching the total magnetic flux and the flux difference between the two subrings.
Exploiting Symmetry on Parallel Architectures.
Stiller, Lewis Benjamin
1995-01-01
This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
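A concrete miniature of the group-equivariant idea above (my illustration, not the thesis code): a matrix that commutes with the cyclic-shift group is circulant, so multiplying by it reduces, via the Fourier transform over that group, to elementwise products of spectra.

import numpy as np

rng = np.random.default_rng(0)
n = 256
c = rng.standard_normal(n)                      # first column of a circulant matrix C
x = rng.standard_normal(n)

# Direct product with the explicitly formed circulant (cyclic-shift-equivariant) matrix
C = np.column_stack([np.roll(c, j) for j in range(n)])
y_direct = C @ x

# The same product in O(n log n) via the Fourier transform over the cyclic group
y_fft = np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

print(np.allclose(y_direct, y_fft))             # True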
Energy Technology Data Exchange (ETDEWEB)
Jafarzadeh, Hassan; Moradinasab, Nazanin; Gerami, Ali
2017-07-01
An adjusted discrete Multi-Objective Invasive Weed Optimization (DMOIWO) algorithm, which uses a fuzzy dominance approach for ordering, has been proposed to solve the no-wait two-stage flexible flow shop scheduling problem. Design/methodology/approach: The no-wait two-stage flexible flow shop scheduling problem has been investigated in a multi-objective manner, considering sequence-dependent setup times and probable rework in both stations, different ready times for all jobs, rework times for both stations, and unrelated parallel machines, with regard to the simultaneous minimization of the maximum job completion time and the average latency. In this study, the parameter setting has been carried out using the Taguchi Method based on the quality indicator for better performance of the algorithm. Findings: The results of this algorithm have been compared with those of conventional multi-objective algorithms to show the better performance of the proposed algorithm. The results clearly indicated the greater performance of the proposed algorithm. Originality/value: This study provides an efficient method for solving the multi-objective no-wait two-stage flexible flow shop scheduling problem by considering sequence-dependent setup times, probable rework in both stations, different ready times for all jobs, rework times for both stations and unrelated parallel machines, which are the real constraints.
International Nuclear Information System (INIS)
Jafarzadeh, Hassan; Moradinasab, Nazanin; Gerami, Ali
2017-01-01
An adjusted discrete Multi-Objective Invasive Weed Optimization (DMOIWO) algorithm, which uses a fuzzy dominance approach for ordering, has been proposed to solve the no-wait two-stage flexible flow shop scheduling problem. Design/methodology/approach: The no-wait two-stage flexible flow shop scheduling problem has been investigated in a multi-objective manner, considering sequence-dependent setup times and probable rework in both stations, different ready times for all jobs, rework times for both stations, and unrelated parallel machines, with regard to the simultaneous minimization of the maximum job completion time and the average latency. In this study, the parameter setting has been carried out using the Taguchi Method based on the quality indicator for better performance of the algorithm. Findings: The results of this algorithm have been compared with those of conventional multi-objective algorithms to show the better performance of the proposed algorithm. The results clearly indicated the greater performance of the proposed algorithm. Originality/value: This study provides an efficient method for solving the multi-objective no-wait two-stage flexible flow shop scheduling problem by considering sequence-dependent setup times, probable rework in both stations, different ready times for all jobs, rework times for both stations and unrelated parallel machines, which are the real constraints.
Bravo, Adrian J; Kelley, Michelle L; Hollis, Brittany F
2017-10-01
This study examined how work stressors were associated with sleep quality and alcohol-related problems among U.S. Navy members over the course of deployment. Participants were 101 U.S. Navy members assigned to an Arleigh Burke-class destroyer who experienced an 8-month deployment after Operation Enduring Freedom/Operation Iraqi Freedom. Approximately 6 weeks prior to deployment, 6 weeks after deployment, and at 6 months of reintegration, participants completed measures that assessed work stressors, sleep quality, and alcohol-related problems. A piecewise latent growth model was conducted in which the structural paths assessed whether work stressors influenced sleep quality or its growth over time, and in turn whether sleep quality influenced alcohol-related problem intercepts or growth over time. A significant indirect effect was found such that increases in work stressors from pre- to postdeployment predicted decreases in sleep quality, which in turn were associated with increases in alcohol-related problems from pre- to postdeployment. These effects were maintained from postdeployment through the 6-month reintegration. Findings suggest that work stressors may have important implications for sleep quality and alcohol-related problems. Positive methods of addressing stress and techniques to improve sleep quality are needed as both may be associated with alcohol-related problems among current Navy members. Copyright © 2016 John Wiley & Sons, Ltd.
Parallel computing for event reconstruction in high-energy physics
International Nuclear Information System (INIS)
Wolbers, S.
1993-01-01
Parallel computing has been recognized as a solution to large computing problems. In High Energy Physics offline event reconstruction of detector data is a very large computing problem that has been solved with parallel computing techniques. A review of the parallel programming package CPS (Cooperative Processes Software) developed and used at Fermilab for offline reconstruction of Terabytes of data requiring the delivery of hundreds of Vax-Years per experiment is given. The Fermilab UNIX farms, consisting of 180 Silicon Graphics workstations and 144 IBM RS6000 workstations, are used to provide the computing power for the experiments. Fermilab has had a long history of providing production parallel computing starting with the ACP (Advanced Computer Project) Farms in 1986. The Fermilab UNIX Farms have been in production for over 2 years with 24 hour/day service to experimental user groups. Additional tools for management, control and monitoring these large systems will be described. Possible future directions for parallel computing in High Energy Physics will be given
Parallel hierarchical radiosity rendering
Energy Technology Data Exchange (ETDEWEB)
Carter, Michael [Iowa State Univ., Ames, IA (United States)
1993-07-01
In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
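For reference, the radiosity system discussed above has the classical form B = E + rho*(F B); a few lines of fixed-point iteration on a tiny synthetic form-factor matrix show the linear system whose structure hierarchical methods then exploit (a generic sketch, unrelated to the dissertation's solver).

import numpy as np

rng = np.random.default_rng(0)
n = 50
F = rng.random((n, n))
F /= F.sum(axis=1, keepdims=True)       # synthetic form-factor matrix with unit row sums
rho = 0.6 * np.ones(n)                  # diffuse reflectances below 1 ensure convergence
E = np.zeros(n); E[0] = 10.0            # a single emitting patch

B = E.copy()
for _ in range(200):
    B = E + rho * (F @ B)               # gathering step; a contraction since rho < 1

print(B[:5])                            # converged radiosities of the first few patches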
Directory of Open Access Journals (Sweden)
Ana Cristina Nakamura Tome
2013-02-01
Health safety during trips is based on previous counseling, vaccination and prevention of infections, previous diseases or specific problems related to the destination. Our aim was to assess two aspects, the incidence of health problems related to travel and the traveler's awareness of health safety. To this end we phone-interviewed faculty members of a large public University, randomly selected from humanities, engineering and health schools. Out of 520 attempts, we were able to contact 67 (12.9%) and 46 (68.6%) agreed to participate in the study. There was a large male proportion (37/44, 84.1%), mature adults mostly in their forties and fifties (32/44, 72.7%), all of them with higher education, as you would expect of faculty members. Most described themselves as being sedentary or as taking occasional exercise, with only 15.9% (7/44) taking regular exercise. Preexisting diseases were reported by 15 travelers. Most trips lasted usually one week or less. Duration of the travel was related to the destination, with (12h) or longer trips being taken by 68.2% (30/44) of travelers, and the others taking shorter (3h) domestic trips. Most travelling was made by air (41/44) and only 31.8% (14/44) of the trips were motivated by leisure. Field research trips were not reported. Specific health counseling previous to travel was reported only by two (4.5%). Twenty seven of them (61.4%) reported updated immunization, but 11/30 reported unchecked immunizations. 30% (9/30) reported travel without any health insurance coverage. As a whole group, 6 (13.6%) travelers reported at least one health problem attributed to the trip. All of them were males travelling abroad. Five presented respiratory infections, such as influenza and common cold, one neurological, one orthopedic, one social and one hypertension. There were no gender differences regarding age groups, destination, type of transport, previous health counseling, leisure travel motivation or pre-existing diseases.
Tome, Ana Cristina Nakamura; Canello, Thaís Brandi; Luna, Expedito José de Albuquerque; Andrade Junior, Heitor Franco de
2013-01-01
Health safety during trips is based on previous counseling, vaccination and prevention of infections, previous diseases or specific problems related to the destination. Our aim was to assess two aspects, incidence of health problems related to travel and the traveler's awareness of health safety. To this end we phone-interviewed faculty members of a large public University, randomly selected from humanities, engineering and health schools. Out of 520 attempts, we were able to contact 67 (12.9%) and 46 (68.6%) agreed to participate in the study. There was a large male proportion (37/44, 84.1%), mature adults mostly in their forties and fifties (32/44, 72.7%), all of them with higher education, as you would expect of faculty members. Most described themselves as being sedentary or as taking occasional exercise, with only 15.9% (7/44) taking regular exercise. Preexisting diseases were reported by 15 travelers. Most trips lasted usually one week or less. Duration of the travel was related to the destination, with (12h) or longer trips being taken by 68.2% (30/44) of travelers, and the others taking shorter (3h) domestic trips. Most travelling was made by air (41/44) and only 31.8% (14/44) of the trips were motivated by leisure. Field research trips were not reported. Specific health counseling previous to travel was reported only by two (4.5%). Twenty seven of them (61.4%) reported updated immunization, but 11/30 reported unchecked immunizations. 30% (9/30) reported travel without any health insurance coverage. As a whole group, 6 (13.6%) travelers reported at least one health problem attributed to the trip. All of them were males travelling abroad. Five presented respiratory infections, such as influenza and common cold, one neurological, one orthopedic, one social and one hypertension. There were no gender differences regarding age groups, destination, type of transport, previous health counseling, leisure travel motivation or pre-existing diseases. Interestingly
An Introduction to Parallel Computation R
Indian Academy of Sciences (India)
How are they programmed? This article provides an introduction. A parallel computer is a network of processors built for ... and have been used to solve problems much faster than a single ... in parallel computer design is to select an organization which ... The most ambitious approach to parallel computing is to develop.
Cui, Tiangang; Marzouk, Youssef; Willcox, Karen
2016-06-01
Two major bottlenecks to the solution of large-scale Bayesian inverse problems are the scaling of posterior sampling algorithms to high-dimensional parameter spaces and the computational cost of forward model evaluations. Yet incomplete or noisy data, the state variation and parameter dependence of the forward model, and correlations in the prior collectively provide useful structure that can be exploited for dimension reduction in this setting, both in the parameter space of the inverse problem and in the state space of the forward model. To this end, we show how to jointly construct low-dimensional subspaces of the parameter space and the state space in order to accelerate the Bayesian solution of the inverse problem. As a byproduct of state dimension reduction, we also show how to identify low-dimensional subspaces of the data in problems with high-dimensional observations. These subspaces enable approximation of the posterior as a product of two factors: (i) a projection of the posterior onto a low-dimensional parameter subspace, wherein the original likelihood is replaced by an approximation involving a reduced model; and (ii) the marginal prior distribution on the high-dimensional complement of the parameter subspace. We present and compare several strategies for constructing these subspaces using only a limited number of forward and adjoint model simulations. The resulting posterior approximations can rapidly be characterized using standard sampling techniques, e.g., Markov chain Monte Carlo. Two numerical examples demonstrate the accuracy and efficiency of our approach: inversion of an integral equation in atmospheric remote sensing, where the data dimension is very high; and the inference of a heterogeneous transmissivity field in a groundwater system, which involves a partial differential equation forward model with high-dimensional state and parameters.
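A minimal sketch of the kind of parameter-space reduction described above, assuming a linearized (Gaussian) setting: the dominant singular directions of the noise-whitened, prior-preconditioned Jacobian span the subspace where the data are most informative relative to the prior. The function name, toy matrices and retained rank are hypothetical; the paper's actual construction uses a limited number of forward and adjoint simulations rather than an explicit Jacobian.

    import numpy as np

    def parameter_subspace(G, noise_cov, prior_cov, rank):
        """Return a basis for the data-informed parameter subspace and the
        associated singular values (illustrative sketch, not the paper's code)."""
        L = np.linalg.cholesky(prior_cov)                  # prior square root
        C = np.linalg.cholesky(noise_cov)                  # noise square root
        Gt = np.linalg.solve(C, G @ L)                     # whitened, prior-preconditioned Jacobian
        # Dominant right singular vectors mark directions where the data
        # constrain the parameters most strongly relative to the prior.
        _, s, Vt = np.linalg.svd(Gt, full_matrices=False)
        return L @ Vt[:rank].T, s[:rank]

    # Toy usage with random matrices standing in for a real forward model.
    rng = np.random.default_rng(0)
    G = rng.standard_normal((20, 100))
    basis, spectrum = parameter_subspace(G, np.eye(20), np.eye(100), rank=5)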
Becker, Stephen P; Jarrett, Matthew A; Luebbe, Aaron M; Garner, Annie A; Burns, G Leonard; Kofler, Michael J
2018-04-01
To (1) describe sleep problems in a large, multi-university sample of college students; (2) evaluate sex differences; and (3) examine the unique associations of mental health symptoms (i.e., anxiety, depression, attention-deficit/hyperactivity disorder inattention [ADHD-IN], ADHD hyperactivity-impulsivity [ADHD-HI]) in relation to sleep problems. 7,626 students (70% female; 81% White) ages 18-29 years (M=19.14, SD=1.42) from six universities completed measures assessing mental health symptoms and the Pittsburgh Sleep Quality Index (PSQI). A substantial minority of students endorsed sleep problems across specific sleep components. Specifically, 27% described their sleep quality as poor, 36% reported obtaining less than 7 hours of sleep per night, and 43% reported that it takes >30 minutes to fall asleep at least once per week. 62% of participants met cut-off criteria for poor sleep, though rates differed between females (64%) and males (57%). In structural regression models, both anxiety and depression symptoms were uniquely associated with disruptions in most PSQI sleep component domains. However, anxiety (but not depression) symptoms were uniquely associated with more sleep disturbances and sleep medication use, whereas depression (but not anxiety) symptoms were uniquely associated with increased daytime dysfunction. ADHD-IN symptoms were uniquely associated with poorer sleep quality and increased daytime dysfunction, whereas ADHD-HI symptoms were uniquely associated with more sleep disturbances and less daytime dysfunction. Lastly, ADHD-IN, anxiety, and depression symptoms were each independently associated with poor sleep status. This study documents a high prevalence of poor sleep among college students, some sex differences, and distinct patterns of mental health symptoms in relation to sleep problems. Copyright © 2018. Published by Elsevier Inc.
Huang, Tsung-Ming; Lin, Wen-Wei; Tian, Heng; Chen, Guan-Hua
2018-03-01
Full spectrum of a large sparse ⊤-palindromic quadratic eigenvalue problem (⊤-PQEP) is considered arguably for the first time in this article. Such a problem is posed by calculation of surface Green's functions (SGFs) of mesoscopic transistors with a tremendous non-periodic cross-section. For this problem, general purpose eigensolvers are not efficient, nor is advisable to resort to the decimation method etc. to obtain the Wiener-Hopf factorization. After reviewing some rigorous understanding of SGF calculation from the perspective of ⊤-PQEP and nonlinear matrix equation, we present our new approach to this problem. In a nutshell, the unit disk where the spectrum of interest lies is broken down adaptively into pieces small enough that they each can be locally tackled by the generalized ⊤-skew-Hamiltonian implicitly restarted shift-and-invert Arnoldi (G⊤SHIRA) algorithm with suitable shifts and other parameters, and the eigenvalues missed by this divide-and-conquer strategy can be recovered thanks to the accurate estimation provided by our newly developed scheme. Notably the novel non-equivalence deflation is proposed to avoid as much as possible duplication of nearby known eigenvalues when a new shift of G⊤SHIRA is determined. We demonstrate our new approach by calculating the SGF of a realistic nanowire whose unit cell is described by a matrix of size 4000 × 4000 at the density functional tight binding level, corresponding to a 8 × 8nm2 cross-section. We believe that quantum transport simulation of realistic nano-devices in the mesoscopic regime will greatly benefit from this work.
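For reference, a ⊤-palindromic QEP has the standard form below (generic notation, not taken from the article); the palindromic structure forces the eigenvalues into reciprocal pairs, which is the symmetry that structure-preserving solvers such as G⊤SHIRA are built to respect.

    P(\lambda)\,x = \bigl(\lambda^{2} A^{\top} + \lambda Q + A\bigr)\,x = 0, \qquad Q = Q^{\top},
    \qquad \lambda^{2}\, P(1/\lambda)^{\top} = P(\lambda),

so whenever \lambda is an eigenvalue, 1/\lambda is as well: one member of each pair lies inside the unit circle and the other outside (or both lie on it), which is why the unit disk is the natural region of interest in SGF calculations.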
The numerical parallel computing of photon transport
International Nuclear Information System (INIS)
Huang Qingnan; Liang Xiaoguang; Zhang Lifa
1998-12-01
The parallel computing of photon transport is investigated; the parallel algorithm and the parallelization of programs on parallel computers, both with shared memory and with distributed memory, are discussed. By analyzing the inherent structure of the mathematical and physical model of photon transport in light of the architecture of parallel computers, using a divide-and-conquer strategy, adjusting the algorithmic structure of the program, decoupling data dependencies, identifying components amenable to parallelization and creating large-grain parallel subtasks, the sequential computation of photon transport is efficiently transformed into parallel and vector computation. The program was run on various high-performance parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP), and very good parallel speedup was obtained.
Energy Technology Data Exchange (ETDEWEB)
Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou
1996-03-01
The parallelization of the Electro-Magnetic Cascade Monte Carlo Simulation Code, EGS4, on the distributed-memory scalar parallel computer Intel Paragon XP/S15-256 is described. EGS4 has the feature that the calculation time differs greatly from one incident particle to another because of the dynamic generation of secondary particles and the different behavior of each particle. Granularity for parallel processing, the parallel programming model and the algorithm for parallel random number generation are discussed, and two kinds of methods, which allocate particles either dynamically or statically, are used to realize high-speed parallel processing of this code. Among the four problems chosen for performance evaluation, speedup factors of nearly 100 with 128 processors were attained for three of them. It has been found that when both the calculation time for each incident particle and its dispersion are large, it is preferable to use the dynamic particle allocation method, which can average the load over the processors. It has also been found that when they are small, it is preferable to use the static particle allocation method, which reduces the communication overhead. Moreover, it is pointed out that to obtain accurate results, it is necessary to use double-precision variables in the EGS4 code. Finally, the workflow of program parallelization is analyzed, and tools for program parallelization are discussed in light of the experience gained from the EGS4 parallelization. (author).
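The dynamic versus static allocation trade-off described above can be illustrated with a generic master/worker sketch (plain Python, not EGS4; the history model, worker count and chunk sizes are stand-ins): static allocation hands each worker one large, pre-decided share, while dynamic allocation doles out small chunks as workers become free.

    import random
    from multiprocessing import Pool

    def simulate_history(seed):
        # Stand-in for tracking one incident particle and its secondaries;
        # the work per history varies widely, which is what makes allocation matter.
        rng = random.Random(seed)
        deposit = 0.0
        for _ in range(rng.randint(10, 10_000)):
            deposit += rng.random()
        return deposit

    def run_static(seeds, workers=4):
        # Static allocation: each worker receives one large pre-decided share
        # (low communication overhead, but load imbalance if histories differ).
        with Pool(workers) as pool:
            chunk = max(1, len(seeds) // workers)
            return sum(pool.map(simulate_history, seeds, chunksize=chunk))

    def run_dynamic(seeds, workers=4, chunk=8):
        # Dynamic allocation: small chunks are handed out as workers finish
        # (better load averaging, more scheduling/communication overhead).
        with Pool(workers) as pool:
            return sum(pool.imap_unordered(simulate_history, seeds, chunksize=chunk))

    if __name__ == "__main__":
        seeds = list(range(1_000))
        print(run_static(seeds), run_dynamic(seeds))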
Directory of Open Access Journals (Sweden)
KIM Jong Woon
2017-01-01
In this paper, we describe a brief overview of AETIUS and provide numerical results from both AETIUS and a Monte Carlo code, MCNP5, in a deep penetration problem with small detection volumes. The results demonstrate the effectiveness and efficiency of AETIUS for such calculations.
KIM, Jong Woon; LEE, Young-Ouk
2017-09-01
As computing power gets better and better, computer codes that use a deterministic method seem to be less useful than those using the Monte Carlo method. In addition, users do not like to think about space, angle, and energy discretization for deterministic codes. However, a deterministic method is still powerful in that we can obtain the flux solution throughout the problem domain, particularly when particles can barely penetrate, such as in a deep penetration problem with small detection volumes. Recently, a new state-of-the-art discrete-ordinates code, ATTILA, was developed and has been widely used in several applications. ATTILA provides the capability to solve geometrically complex 3-D transport problems by using an unstructured tetrahedral mesh. Since 2009, we have been developing our own code by benchmarking ATTILA. AETIUS is a discrete-ordinates code that uses an unstructured tetrahedral mesh, like ATTILA. For pre- and post-processing, Gmsh is used to generate an unstructured tetrahedral mesh by importing a CAD file (*.step) and to visualize the calculation results of AETIUS. Using a CAD tool, the geometry can be modeled very easily. In this paper, we describe a brief overview of AETIUS and provide numerical results from both AETIUS and a Monte Carlo code, MCNP5, for a deep penetration problem with small detection volumes. The results demonstrate the effectiveness and efficiency of AETIUS for such calculations.
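The meshing step mentioned above (STEP import plus tetrahedral mesh generation in Gmsh) looks roughly like the following with the Gmsh Python API; the file names and mesh size are placeholders, the option name assumes a recent Gmsh release, and AETIUS's own input conventions are not shown.

    import gmsh

    gmsh.initialize()
    gmsh.model.add("shield")
    # Import the CAD geometry (STEP) and synchronize the OpenCASCADE kernel.
    gmsh.model.occ.importShapes("geometry.step")    # placeholder file name
    gmsh.model.occ.synchronize()
    # Control element size, then generate a 3-D tetrahedral mesh.
    gmsh.option.setNumber("Mesh.MeshSizeMax", 5.0)  # placeholder size, model units
    gmsh.model.mesh.generate(3)
    gmsh.write("geometry.msh")                      # mesh file for the transport solver
    gmsh.finalize()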
Performance studies of the parallel VIM code
International Nuclear Information System (INIS)
Shi, B.; Blomquist, R.N.
1996-01-01
In this paper, the authors evaluate the performance of the parallel version of the VIM Monte Carlo code on the IBM SPx at the High Performance Computing Research Facility at ANL. Three test problems with contrasting computational characteristics were used to assess their effects on performance. A statistical method for estimating the inefficiencies due to load imbalance and communication is also introduced. VIM is a large-scale continuous-energy Monte Carlo radiation transport program and was parallelized using history partitioning, the master/worker approach, and the p4 message-passing library. Dynamic load balancing is accomplished when the master processor assigns chunks of histories to workers that have completed a previously assigned task, accommodating variations in the lengths of histories, processor speeds, and worker loads. At the end of each batch (generation), the fission sites and tallies are sent from each worker to the master process, contributing to the parallel inefficiency. All communications are between master and workers, and are serial. The SPx is a scalable 128-node parallel supercomputer with high-performance Omega switches of 63 microsec latency and 35 MBytes/sec bandwidth. For uniform and reproducible performance, the authors used only the 120 identical regular processors (IBM RS/6000) and excluded the remaining eight planet nodes, which may be loaded by others' jobs.
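The standard bookkeeping behind such inefficiency estimates can be written as follows (generic definitions, not the paper's specific statistical estimator):

    S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}, \qquad
    p\,T_p = \underbrace{\sum_{i=1}^{p} t_i^{\mathrm{work}}}_{\approx\, T_1}
           + \sum_{i=1}^{p} t_i^{\mathrm{comm}}
           + \underbrace{\sum_{i=1}^{p} t_i^{\mathrm{idle}}}_{\text{load imbalance}},

where T_1 is the serial time, T_p the parallel time on p processors, and t_i^work, t_i^comm, t_i^idle are the times processor i spends tracking histories, exchanging tallies and fission sites with the master, and waiting, respectively; the idle term captures load imbalance and the communication term the serial master-worker exchanges.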
Distributed and parallel approach for handle and perform huge datasets
Konopko, Joanna
2015-12-01
Big Data refers to the dynamic, large and disparate volumes of data that come from many different sources (tools, machines, sensors, mobile devices) uncorrelated with each other. It requires new, innovative and scalable technology to collect, host and analytically process this vast amount of data. A proper architecture for a system that processes huge data sets is needed. In this paper, a comparison of distributed and parallel system architectures is presented using the example of the MapReduce (MR) Hadoop platform and a parallel database platform (DBMS). This paper also analyzes the problem of extracting and handling valuable information from petabytes of data. Both paradigms, MapReduce and parallel DBMS, are described and compared. A hybrid architecture approach is also proposed that could be used to solve the analyzed problem of storing and processing Big Data.
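To make the MapReduce side of the comparison concrete, here is a minimal single-process sketch of the programming model (word counting); it imitates the map, shuffle and reduce phases that Hadoop distributes across a cluster, and is illustrative only.

    from collections import defaultdict

    def map_phase(record):
        # Emit (key, value) pairs; here, one pair per word in a line of text.
        for word in record.split():
            yield word.lower(), 1

    def shuffle(pairs):
        # Group values by key, as the framework does between map and reduce.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(key, values):
        # Aggregate all values observed for one key.
        return key, sum(values)

    records = ["big data needs scalable systems", "parallel systems process big data"]
    pairs = (pair for rec in records for pair in map_phase(rec))
    counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
    print(counts)  # e.g. {'big': 2, 'data': 2, 'needs': 1, ...}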
Parallel computers and three-dimensional computational electromagnetics
International Nuclear Information System (INIS)
Madsen, N.K.
1994-01-01
The authors have continued to enhance their ability to use new massively parallel processing computers to solve time-domain electromagnetic problems. New vectorization techniques have improved the performance of their code DSI3D by factors of 5 to 15, depending on the computer used. New radiation boundary conditions and far-field transformations now allow the computation of radar cross-section values for complex objects. A new parallel-data extraction code has been developed that allows the extraction of data subsets from large problems, which have been run on parallel computers, for subsequent post-processing on workstations with enhanced graphics capabilities. A new charged-particle-pushing version of DSI3D is under development. Finally, DSI3D has become a focal point for several new Cooperative Research and Development Agreement activities with industrial companies such as Lockheed Advanced Development Company, Varian, Hughes Electron Dynamics Division, General Atomic, and Cray
Directory of Open Access Journals (Sweden)
Jiuping Xu
2012-01-01
Full Text Available The aim of this study is to deal with a minimum cost network flow problem (MCNFP) in a large-scale construction project using a nonlinear multiobjective bilevel model with birandom variables. The main target of the upper level is to minimize both direct costs and transportation time costs. The target of the lower level is to minimize transportation costs. After an analysis of the birandom variables, an expectation multiobjective bilevel programming model with chance constraints is formulated to incorporate decision makers' preferences. Under the identified special conditions, an equivalent crisp model is proposed, and a multiobjective bilevel particle swarm optimization (MOBLPSO) algorithm is developed to solve the model. The Shuibuya Hydropower Project is used as a real-world example to verify the proposed approach. Results and analysis are presented to highlight the performance of the MOBLPSO, which is shown to be very effective and efficient compared with a genetic algorithm and a simulated annealing algorithm.
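Schematically, a bilevel program of the kind described above nests the follower's (lower-level) problem inside the leader's (upper-level) problem; the notation below is generic and does not reproduce the paper's birandom, multiobjective formulation.

    \min_{x \in X} \; F\bigl(x, y^{*}(x)\bigr) \quad \text{s.t.} \quad G\bigl(x, y^{*}(x)\bigr) \le 0,
    \qquad y^{*}(x) \in \arg\min_{y \in Y} \bigl\{\, f(x, y) : g(x, y) \le 0 \,\bigr\},

where the leader chooses x (e.g., network design and flow decisions minimizing direct and time costs) while anticipating the follower's optimal response y (e.g., flows minimizing transportation cost).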
Energy Technology Data Exchange (ETDEWEB)
Cobb, J.W.
1995-02-01
There is an increasing need for more accurate numerical methods for large-scale nonlinear magneto-fluid turbulence calculations. These methods should not only increase the current state of the art in terms of accuracy, but should also continue to optimize other desired properties such as simplicity, minimized computation, minimized memory requirements, and robust stability. This includes the ability to stably solve stiff problems with long time-steps. This work discusses a general methodology for deriving higher-order numerical methods. It also discusses how various design choices can affect the desired properties. The explicit discussion focuses on third-order Runge-Kutta methods, including general solutions and five examples. The study investigates the linear numerical analysis of these methods, including their accuracy, general stability, and stiff stability. Additional appendices discuss linear multistep methods, discuss directions for further work, and exhibit numerical analysis results for some other commonly used lower-order methods.
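As one concrete member of the family discussed, the classical third-order Runge-Kutta scheme for y' = f(t, y) with step size h reads as follows (a standard textbook example, not necessarily one of the report's five):

    k_1 = f(t_n, y_n), \qquad
    k_2 = f\!\left(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2} k_1\right), \qquad
    k_3 = f\!\left(t_n + h,\; y_n - h k_1 + 2 h k_2\right),
    \qquad
    y_{n+1} = y_n + \tfrac{h}{6}\left(k_1 + 4 k_2 + k_3\right).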
Masrour, Moussa; Lkebir, Noura; Pérez-Lorente, Félix
2017-10-01
The Anza site shows large ichnological surfaces indicating the coexistence in the same area of vertebrate footprints of different makers (dinosaur and pterosaur) and of different types (tridactyl and tetradactyl, semiplantigrade, and rounded without digit marks), together with the footprint variability of long trackways. This area may become a world reference in ichnology because it contains the second undebatable African site with Cretaceous pterosaur footprints - described in part I - and the first African site with Macropodosaurus footprints. In this work, problems related to long trackways are also analyzed, such as their sinuosity, the order or disorder in the variability (long-short) of the pace length, and the difficulty of morphological classification of the theropod footprints due to their morphological variability.
Concurrent computation of attribute filters on shared memory parallel machines
Wilkinson, Michael H.F.; Gao, Hui; Hesselink, Wim H.; Jonker, Jan-Eppo; Meijster, Arnold
2008-01-01
Morphological attribute filters have not previously been parallelized mainly because they are both global and nonseparable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings, and thickenings,
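For orientation, an attribute (area) opening of the kind referred to above can be computed sequentially with an off-the-shelf routine such as the one below; this uses scikit-image's serial implementation purely to illustrate the operation, and is not the authors' parallel algorithm.

    import numpy as np
    from skimage import morphology

    # Toy grayscale image: two bright blobs of different sizes on a dark background.
    image = np.zeros((64, 64), dtype=np.uint8)
    image[5:9, 5:9] = 200      # small 16-pixel blob
    image[20:40, 20:40] = 200  # large 400-pixel blob

    # An area opening removes connected bright components smaller than the
    # threshold while leaving larger components untouched (no edge blurring).
    filtered = morphology.area_opening(image, area_threshold=64)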