large-scale parallel nonlinear: Topics by WorldWideScience.org

Sample records for large-scale parallel nonlinear

Parallel supercomputing: Advanced methods, algorithms, and software for large-scale linear and nonlinear problems

Energy Technology Data Exchange (ETDEWEB)

Carey, G.F.; Young, D.M.

1993-12-31

The program outlined here is directed to research on methods, algorithms, and software for distributed parallel supercomputers. Of particular interest are finite element methods and finite difference methods together with sparse iterative solution schemes for scientific and engineering computations of very large-scale systems. Both linear and nonlinear problems will be investigated. In the nonlinear case, applications with bifurcation to multiple solutions will be considered using continuation strategies. The parallelizable numerical methods of particular interest are a family of partitioning schemes embracing domain decomposition, element-by-element strategies, and multi-level techniques. The methods will be further developed incorporating parallel iterative solution algorithms with associated preconditioners in parallel computer software. The schemes will be implemented on distributed memory parallel architectures such as the CRAY MPP, Intel Paragon, the NCUBE3, and the Connection Machine. We will also consider other new architectures such as the Kendall-Square (KSQ) and proposed machines such as the TERA. The applications will focus on large-scale three-dimensional nonlinear flow and reservoir problems with strong convective transport contributions. These are legitimate grand challenge class computational fluid dynamics (CFD) problems of significant practical interest to DOE. The methods developed and algorithms will, however, be of wider interest.
Robust large-scale parallel nonlinear solvers for simulations.

Energy Technology Data Exchange (ETDEWEB)

Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson (Sandia National Laboratories, Livermore, CA)

2005-11-01

This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any
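The rank-one Jacobian approximation described in this abstract can be illustrated with a minimal dense sketch. It is not the limited-memory implementation from the report; it is the textbook "good Broyden" iteration, which keeps an approximate inverse Jacobian H and corrects it after every step, so the Jacobian of F is never evaluated. The name `broyden_solve`, the identity starting matrix, and the two-equation test system are assumptions chosen purely for illustration.

```python
def broyden_solve(F, x0, tol=1e-10, max_iter=100):
    """Solve F(x) = 0 with Broyden's method using an inverse-Jacobian update.

    H approximates the inverse Jacobian; it starts as the identity (an
    assumption that suits systems whose Jacobian is close to the identity).
    """
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    f = F(x)
    for iteration in range(max_iter):
        if max(abs(v) for v in f) < tol:
            return x, iteration
        # Quasi-Newton step: s = -H f
        s = [-sum(H[i][j] * f[j] for j in range(n)) for i in range(n)]
        x_new = [x[i] + s[i] for i in range(n)]
        f_new = F(x_new)
        y = [f_new[i] - f[i] for i in range(n)]
        # Sherman-Morrison form of the update:
        # H <- H + (s - H y) (s^T H) / (s^T H y)
        Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
        sTH = [sum(s[i] * H[i][j] for i in range(n)) for j in range(n)]
        denom = sum(s[i] * Hy[i] for i in range(n))
        if abs(denom) > 1e-14:  # skip the update if it would be ill-defined
            for i in range(n):
                for j in range(n):
                    H[i][j] += (s[i] - Hy[i]) * sTH[j] / denom
        x, f = x_new, f_new
    return x, max_iter


def example_system(x):
    """Hypothetical mildly nonlinear test system (not from the report)."""
    return [x[0] + 0.1 * x[1] ** 2 - 1.0,
            x[1] + 0.1 * x[0] ** 2 - 2.0]


solution, iterations = broyden_solve(example_system, [0.0, 0.0])
print(solution, iterations)
```

For large systems a dense H is not affordable, which is why the abstract turns to limited-memory variants that store only a few recent update vectors.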
Imprint of non-linear effects on HI intensity mapping on large scales

Energy Technology Data Exchange (ETDEWEB)

Umeh, Obinna, E-mail: umeobinna@gmail.com [Department of Physics and Astronomy, University of the Western Cape, Cape Town 7535 (South Africa)]

2017-06-01

Intensity mapping of the HI brightness temperature provides a unique way of tracing large-scale structures of the Universe up to the largest possible scales. This is achieved by using low angular resolution radio telescopes to detect the emission line from cosmic neutral hydrogen in the post-reionization Universe. We use general relativistic perturbation theory techniques to derive for the first time the full expression for the HI brightness temperature up to third order in perturbation theory without making any plane-parallel approximation. We use this result and the renormalization prescription for biased tracers to study the impact of nonlinear effects on the power spectrum of HI brightness temperature both in real and redshift space. We show how mode coupling at nonlinear order due to nonlinear bias parameters and redshift space distortion terms modulates the power spectrum on large scales. The large-scale modulation may be understood to be due to the effective bias parameter and effective shot noise.
Implicit solvers for large-scale nonlinear problems

International Nuclear Information System (INIS)

Keyes, David E; Reynolds, Daniel R; Woodward, Carol S

2006-01-01

Computational scientists are grappling with increasingly complex, multi-rate applications that couple such physical phenomena as fluid dynamics, electromagnetics, radiation transport, chemical and nuclear reactions, and wave and material propagation in inhomogeneous media. Parallel computers with large storage capacities are paving the way for high-resolution simulations of coupled problems; however, hardware improvements alone will not prove enough to enable simulations based on brute-force algorithmic approaches. To accurately capture nonlinear couplings between dynamically relevant phenomena, often while stepping over rapid adjustments to quasi-equilibria, simulation scientists are increasingly turning to implicit formulations that require a discrete nonlinear system to be solved for each time step or steady-state solution. Recent advances in iterative methods have made fully implicit formulations a viable option for the solution of these large-scale problems. In this paper, we give an overview of one of the most effective iterative methods, Newton-Krylov, for nonlinear systems and point to software packages that implement it. We illustrate the method with an example from magnetically confined plasma fusion and briefly survey other areas in which implicit methods have bestowed important advantages, such as allowing high-order temporal integration and providing a pathway to sensitivity analyses and optimization. Lastly, we give an overview of algorithm extensions under development motivated by current SciDAC applications.
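One reason Newton-Krylov methods scale to large problems is that a Krylov solver only needs Jacobian-vector products, and these can be approximated from two residual evaluations without ever forming the Jacobian. The sketch below shows that finite-difference product only; it is not a full Newton-Krylov solver and is not taken from the software packages the paper points to. The function names, the step-size heuristic, and the two-equation residual are assumptions for illustration.

```python
import math


def fd_jacobian_vector(F, u, v, rel_eps=1.5e-8):
    """Approximate J(u) @ v by a forward difference of the residual F.

    J(u) v ~= (F(u + eps * v) - F(u)) / eps, with eps scaled by the sizes
    of u and v (a common heuristic, assumed here rather than prescribed).
    """
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    base = F(u)
    if norm_v == 0.0:
        return [0.0] * len(base)
    eps = rel_eps * (1.0 + norm_u) / norm_v
    shifted = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(shifted, base)]


def example_residual(u):
    """Hypothetical residual with Jacobian [[2*u0, 1], [1, 3*u1**2]]."""
    return [u[0] ** 2 + u[1] - 3.0,
            u[0] + u[1] ** 3 - 5.0]


# At u = (1, 2) the exact product with v = (1, -1) is (1, -11).
approx = fd_jacobian_vector(example_residual, [1.0, 2.0], [1.0, -1.0])
print(approx)
```

Inside a Newton iteration, a Krylov method such as GMRES would call a routine like this once per inner iteration to solve J(u) du = -F(u) approximately.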
Parallel Quasi Newton Algorithms for Large Scale Non Linear Unconstrained Optimization

International Nuclear Information System (INIS)

Rahman, M. A.; Basarudin, T.

1997-01-01

This paper discusses the quasi-Newton (QN) method for solving nonlinear unconstrained minimization problems. An important aspect of the QN method is the choice of the matrix H_k, which must be positive definite and satisfy the quasi-Newton condition. Our interest here is in parallel QN methods suited to the solution of large-scale optimization problems. QN methods become less attractive for large-scale problems because of their storage and computational requirements. However, it is often the case that the Hessian is a sparse matrix. In this paper we include the mechanism for reducing the Hessian update while preserving the properties of the Hessian. One major motivation for our research is that a QN method may be good at solving certain types of minimization problems, but its efficiency degrades when it is applied to other categories of problems. For this reason, we use an algorithm containing several direction strategies which are processed in parallel. We attempt to parallelize the algorithm by exploring different search directions, which are generated by various QN updates during the minimization process. Different line search strategies will be employed simultaneously in the process of locating the minimum along each direction. The code for the algorithm will be written in the Occam 2 language and run on a transputer machine.
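The idea of generating several search directions from different QN updates can be sketched as follows. The two updates shown, BFGS and DFP in their inverse-Hessian forms, are standard textbook formulas and are my choice for illustration; the abstract does not say which updates the authors used. Both keep H positive definite when y·s > 0 and both satisfy the quasi-Newton condition H_new y = s. Python threads stand in for the transputer processes, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def mat_vec(H, v):
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bfgs_update(H, s, y):
    """Inverse BFGS update (H symmetric): satisfies H_new y = s."""
    rho = 1.0 / dot(y, s)
    Hy = mat_vec(H, y)
    scale = rho * rho * dot(y, Hy) + rho
    n = len(s)
    return [[H[i][j] - rho * (s[i] * Hy[j] + Hy[i] * s[j]) + scale * s[i] * s[j]
             for j in range(n)] for i in range(n)]


def dfp_update(H, s, y):
    """Inverse DFP update (H symmetric): satisfies H_new y = s."""
    Hy = mat_vec(H, y)
    yHy = dot(y, Hy)
    ys = dot(y, s)
    n = len(s)
    return [[H[i][j] - Hy[i] * Hy[j] / yHy + s[i] * s[j] / ys
             for j in range(n)] for i in range(n)]


def candidate_directions(H, s, y, grad):
    """Build one search direction -H_new grad per update, concurrently."""
    updates = {"BFGS": bfgs_update, "DFP": dfp_update}

    def direction(item):
        name, update = item
        H_new = update(H, s, y)
        return name, [-v for v in mat_vec(H_new, grad)]

    with ThreadPoolExecutor(max_workers=len(updates)) as pool:
        return dict(pool.map(direction, updates.items()))


identity = [[1.0, 0.0], [0.0, 1.0]]
step, grad_change, gradient = [1.0, 0.5], [2.0, 0.5], [1.0, 1.0]
print(candidate_directions(identity, step, grad_change, gradient))
```

Each direction would then get its own line search, and the best resulting point would seed the next iteration.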
Parallel Dynamic Analysis of a Large-Scale Water Conveyance Tunnel under Seismic Excitation Using ALE Finite-Element Method

Directory of Open Access Journals (Sweden)

Xiaoqing Wang

2016-01-01

Parallel analyses of the dynamic responses of a large-scale water conveyance tunnel under seismic excitation are presented in this paper. A full three-dimensional numerical model considering the water-tunnel-soil coupling is established and adopted to investigate the tunnel’s dynamic responses. The movement and sloshing of the internal water are simulated using the multi-material Arbitrary Lagrangian Eulerian (ALE) method. Nonlinear fluid–structure interaction (FSI) between the tunnel and the inner water is treated by using the penalty method. Nonlinear soil-structure interaction (SSI) between the soil and the tunnel is dealt with by using the surface-to-surface contact algorithm. To overcome computing power limitations and to deal with such a large-scale calculation, a parallel algorithm based on modified recursive coordinate bisection (MRCB) that considers the balance of SSI and FSI loads is proposed and used. The whole simulation is accomplished on the Dawning 5000A using the proposed MRCB-based parallel algorithm optimized to run on supercomputers. The simulation model and the proposed approaches are validated by comparison with the added mass method. Dynamic responses of the tunnel are analyzed and the parallelism is discussed. In addition, factors affecting the dynamic responses are investigated. Better speedup and parallel efficiency show the scalability of the parallel method, and the analysis results can be used to aid in the design of water conveyance tunnels.
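Recursive coordinate bisection, the partitioner that the MRCB scheme builds on, can be sketched in a few lines. This is the plain weighted algorithm, not the authors' modification: how they balance SSI and FSI loads is not given in the abstract, so the per-point `weights` argument here is only a hypothetical stand-in for extra contact or coupling cost. The function name and the test points are also assumptions.

```python
def recursive_coordinate_bisection(points, weights, n_parts):
    """Split point indices into n_parts groups of roughly equal total weight.

    At each level the points are sorted along the axis of largest extent
    and cut where the accumulated weight reaches the target share.
    """
    dim = len(points[0])

    def split(indices, parts):
        if parts == 1 or len(indices) <= 1:
            return [indices]
        # Choose the coordinate axis along which the point set is widest.
        axis = max(
            range(dim),
            key=lambda d: max(points[i][d] for i in indices)
            - min(points[i][d] for i in indices),
        )
        ordered = sorted(indices, key=lambda i: points[i][axis])
        left_parts = parts // 2
        target = sum(weights[i] for i in ordered) * left_parts / parts
        accumulated, cut = 0.0, 0
        for position, i in enumerate(ordered):
            if accumulated >= target:
                break
            accumulated += weights[i]
            cut = position + 1
        cut = min(max(cut, 1), len(ordered) - 1)  # keep both halves non-empty
        return split(ordered[:cut], left_parts) + split(ordered[cut:], parts - left_parts)

    return split(list(range(len(points))), n_parts)


# Eight equally weighted points on a line, split into four partitions.
line = [(float(i), 0.0) for i in range(8)]
print(recursive_coordinate_bisection(line, [1.0] * 8, 4))
```

Giving larger weights to elements that carry contact or coupling work shifts the cuts so that those elements are spread more evenly across processors.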
An inertia-free filter line-search algorithm for large-scale nonlinear programming

Energy Technology Data Exchange (ETDEWEB)

Chiang, Nai-Yuan; Zavala, Victor M.

2016-02-15

We present a filter line-search algorithm that does not require inertia information of the linear system. This feature enables the use of a wide range of linear algebra strategies and libraries, which is essential to tackle large-scale problems on modern computing architectures. The proposed approach performs curvature tests along the search step to detect negative curvature and to trigger convexification. We prove that the approach is globally convergent and we implement the approach within a parallel interior-point framework to solve large-scale and highly nonlinear problems. Our numerical tests demonstrate that the inertia-free approach is as efficient as inertia detection via symmetric indefinite factorizations. We also demonstrate that the inertia-free approach can lead to reductions in solution time because it reduces the amount of convexification needed.
Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy.

Science.gov (United States)

Penas, David R; González, Patricia; Egea, Jose A; Doallo, Ramón; Banga, Julio R

2017-01-21

The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problem, but the associated computational cost is very large. Moreover, many of these methods need the tuning of a number of adjustable search parameters, requiring a number of initial exploratory runs and therefore further increasing the computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse and fine-grained parallelism, and (iii) self-tuning strategies. The performance and robustness of saCeSS are illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing very significant reduction of computation times with respect to several previous state-of-the-art methods (from days to minutes, in several cases) even when only a small number of processors is used. The new parallel cooperative method presented here allows the solution of medium and large-scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.
Parallel clustering algorithm for large-scale biological data sets.

Science.gov (United States)

Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

2014-01-01

The recent explosion of biological data poses a great challenge for traditional clustering algorithms. As data sets grow, cluster identification requires much more memory and longer runtimes. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck when handling large-scale data sets. Moreover, the similarity matrix, which takes a long time to construct, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Two types of parallel architectures are proposed in this paper to accelerate the construction of the similarity matrix and the affinity propagation algorithm. The shared-memory architecture is used to construct the similarity matrix, and the distributed system is used for the affinity propagation algorithm because of its large memory size and great computing capacity. An appropriate method of data partition and reduction is designed in order to minimize the global communication cost among processes. A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.
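Because each row of the similarity matrix depends only on the input data, its construction splits naturally into independent row blocks. The sketch below shows that decomposition with Python threads. It assumes the negative squared Euclidean distance as the similarity, a common choice for affinity propagation; the abstract does not state which measure the authors used. Function names and the worker count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def similarity_rows(data, rows):
    """Compute the requested rows of S, with S[i][j] = -||x_i - x_j||^2."""
    return [
        [-sum((a - b) ** 2 for a, b in zip(data[i], data[j]))
         for j in range(len(data))]
        for i in rows
    ]


def similarity_matrix(data, n_workers=4):
    """Assemble the full similarity matrix from independent row blocks."""
    n = len(data)
    chunk = max(1, (n + n_workers - 1) // n_workers)
    blocks = [range(start, min(start + chunk, n)) for start in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves block order, so rows come back in sequence.
        parts = list(pool.map(lambda rows: similarity_rows(data, rows), blocks))
    return [row for part in parts for row in part]


samples = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(similarity_matrix(samples, n_workers=2))
```

In CPython, threads illustrate the decomposition but do not speed up pure-Python arithmetic; a real implementation would use processes, OpenMP-style native threads, or vectorized kernels.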
On Modeling Large-Scale Multi-Agent Systems with Parallel, Sequential and Genuinely Asynchronous Cellular Automata

International Nuclear Information System (INIS)

Tosic, P.T.

2011-01-01

We study certain types of Cellular Automata (CA) viewed as an abstraction of large-scale Multi-Agent Systems (MAS). We argue that the classical CA model needs to be modified in several important respects in order to become a relevant and sufficiently general model for large-scale MAS, and so that the thus generalized model can capture many important MAS properties at the level of agent ensembles and their long-term collective behavior patterns. We specifically focus on the issue of inter-agent communication in CA, and propose sequential cellular automata (SCA) as the first step, and genuinely Asynchronous Cellular Automata (ACA) as the ultimate deterministic CA-based abstract models for large-scale MAS made of simple reactive agents. We first formulate deterministic and nondeterministic versions of sequential CA, and then summarize some interesting configuration space properties (i.e., possible behaviors) of a restricted class of sequential CA. In particular, we compare and contrast those properties of sequential CA with the corresponding properties of the classical (that is, parallel and perfectly synchronous) CA with the same restricted class of update rules. We analytically demonstrate the failure of the studied sequential CA models to simulate all possible behaviors of perfectly synchronous parallel CA, even for a very restricted class of non-linear totalistic node update rules. The lesson learned is that the interleaving semantics of concurrency, when applied to sequential CA, is not refined enough to adequately capture the perfect synchrony of parallel CA updates. Last but not least, we outline what would be an appropriate CA-like abstraction for large-scale distributed computing insofar as the inter-agent communication model is concerned, and in that context we propose genuinely asynchronous CA. (author)
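The gap between synchronous and sequential updating can be seen on a very small example. The sketch below uses a four-cell ring with the majority rule over each cell and its two neighbours, a simple totalistic threshold rule chosen here for illustration (the abstract does not name a specific rule). Under parallel updating the alternating configuration oscillates with period two; a single sequential sweep from the same configuration reaches a fixed point instead.

```python
def majority(left, me, right):
    """Totalistic threshold rule: output 1 when at least two inputs are 1."""
    return 1 if left + me + right >= 2 else 0


def parallel_step(cells):
    """Perfectly synchronous update: every cell reads the old configuration."""
    n = len(cells)
    return [majority(cells[i - 1], cells[i], cells[(i + 1) % n]) for i in range(n)]


def sequential_sweep(cells, order):
    """Sequential update: each cell sees the changes made by earlier cells."""
    cells = list(cells)
    n = len(cells)
    for i in order:
        cells[i] = majority(cells[i - 1], cells[i], cells[(i + 1) % n])
    return cells


start = [0, 1, 0, 1]
once = parallel_step(start)
twice = parallel_step(once)
print(once, twice)                               # [1, 0, 1, 0] [0, 1, 0, 1]
print(sequential_sweep(start, [0, 1, 2, 3]))     # [1, 1, 1, 1]
```

The two-cycle exists only because all cells switch at the same instant, which is the kind of behavior an interleaved, one-cell-at-a-time execution cannot reproduce here.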
Methods for Large-Scale Nonlinear Optimization.

Science.gov (United States)

1980-05-01

STANFORD, CALIFORNIA 94305 METHODS FOR LARGE-SCALE NONLINEAR OPTIMIZATION by Philip E. Gill, Walter Murray, Michael A. Saunders, and Margaret H. Wright...typical iteration can be partitioned so that where B is an m × m basis matrix. This partition effectively divides the variables into three classes... attention is given to the standard of the coding or the documentation. A much better way of obtaining mathematical software is from a software library
Solving Large Scale Nonlinear Eigenvalue Problem in Next-Generation Accelerator Design

Energy Technology Data Exchange (ETDEWEB)

Liao, Ben-Shan; Bai, Zhaojun; /UC, Davis; Lee, Lie-Quan; Ko, Kwok; /SLAC

2006-09-28

A number of numerical methods, including inverse iteration, the method of successive linear problems and the nonlinear Arnoldi algorithm, are studied in this paper to solve a large-scale nonlinear eigenvalue problem arising from finite element analysis of resonant frequencies and external Q_e values of a waveguide-loaded cavity in next-generation accelerator design. The authors present a nonlinear Rayleigh-Ritz iterative projection algorithm, NRRIT for short, and demonstrate that it is the most promising approach for a model-scale cavity design. The NRRIT algorithm is an extension of the nonlinear Arnoldi algorithm due to Voss. Computational challenges of solving such a nonlinear eigenvalue problem for a full-scale cavity design are outlined.
A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks.

Science.gov (United States)

Xiao, Xiangyun; Zhang, Wei; Zou, Xiufen

2015-01-01

The reconstruction of gene regulatory networks (GRNs) from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE)-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM), experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

KAUST Repository

Hasanov, Khalid

2014-03-04

© 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy and hence more parallelism in the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly improves the communication cost and the overall performance on large-scale platforms.
Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing

Directory of Open Access Journals (Sweden)

Qiang Liu

2018-05-01

Computing speed is a significant issue in large-scale flood simulations for real-time response in disaster prevention and mitigation. Even today, most large-scale flood simulations are run on supercomputers due to the massive amounts of data and computations necessary. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To realize a fast simulation of large-scale floods on a personal computer, a Graphics Processing Unit (GPU)-based, high-performance computing method using the OpenACC application was adopted to parallelize the shallow water model. An unstructured data management method was presented to control the data transportation between the GPU and CPU (Central Processing Unit) with minimum overhead, and then both computation and data were offloaded from the CPU to the GPU, which exploited the computational capability of the GPU as much as possible. The parallel model was validated using various benchmarks and real-world case studies. The results demonstrate that speed-ups of up to one order of magnitude can be achieved in comparison with the serial model. The proposed parallel model provides a fast and reliable tool with which to quickly assess flood hazards in large-scale areas and, thus, has a bright application prospect for dynamic inundation risk identification and disaster assessment.
Parallel Optimization of Polynomials for Large-scale Problems in Stability and Control

Science.gov (United States)

Kamyar, Reza

In this thesis, we focus on some of the NP-hard problems in control theory. Thanks to the converse Lyapunov theory, these problems can often be modeled as optimization over polynomials. To avoid the problem of intractability, we establish a trade off between accuracy and complexity. In particular, we develop a sequence of tractable optimization problems --- in the form of Linear Programs (LPs) and/or Semi-Definite Programs (SDPs) --- whose solutions converge to the exact solution of the NP-hard problem. However, the computational and memory complexity of these LPs and SDPs grow exponentially with the progress of the sequence - meaning that improving the accuracy of the solutions requires solving SDPs with tens of thousands of decision variables and constraints. Setting up and solving such problems is a significant challenge. The existing optimization algorithms and software are only designed to use desktop computers or small cluster computers --- machines which do not have sufficient memory for solving such large SDPs. Moreover, the speed-up of these algorithms does not scale beyond dozens of processors. This in fact is the reason we seek parallel algorithms for setting-up and solving large SDPs on large cluster- and/or super-computers. We propose parallel algorithms for stability analysis of two classes of systems: 1) Linear systems with a large number of uncertain parameters; 2) Nonlinear systems defined by polynomial vector fields. First, we develop a distributed parallel algorithm which applies Polya's and/or Handelman's theorems to some variants of parameter-dependent Lyapunov inequalities with parameters defined over the standard simplex. The result is a sequence of SDPs which possess a block-diagonal structure. We then develop a parallel SDP solver which exploits this structure in order to map the computation, memory and communication to a distributed parallel environment. Numerical tests on a supercomputer demonstrate the ability of the algorithm to
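The coefficient test behind Polya's theorem, which the thesis abstract applies to parameter-dependent Lyapunov inequalities, can be illustrated on a scalar toy case. The theorem states that if a homogeneous polynomial is strictly positive on the standard simplex, then multiplying it by a high enough power of (x_1 + ... + x_n) yields a polynomial whose coefficients are all positive, which is a condition that is trivial to check. The sketch below handles only two variables and scalar coefficients; the thesis works with matrix-valued coefficients and SDPs, which this example does not attempt. All names are hypothetical.

```python
def multiply_by_sum(coeffs):
    """Multiply a homogeneous polynomial in (x, y) by (x + y).

    coeffs[a] is the coefficient of x**a * y**(n - a), where n is the degree.
    """
    out = [0] * (len(coeffs) + 1)
    for a, c in enumerate(coeffs):
        out[a] += c        # contribution of the factor y
        out[a + 1] += c    # contribution of the factor x
    return out


def polya_exponent(coeffs, max_power=50):
    """Return the smallest d such that (x + y)**d * p has only positive
    coefficients, together with those coefficients, or (None, ...) if no
    such d is found up to max_power."""
    current = list(coeffs)
    for d in range(max_power + 1):
        if all(c > 0 for c in current):
            return d, current
        current = multiply_by_sum(current)
    return None, current


# p(x, y) = x^2 - x*y + y^2 is positive on the simplex but has a negative
# coefficient, so positivity is not visible until it is multiplied out.
print(polya_exponent([1, -1, 1]))   # (3, [1, 2, 1, 1, 2, 1])
```

Each increase in the exponent adds coefficients to check, which mirrors the growth in problem size that motivates the parallel algorithms described in the abstract.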
Visual analysis of inter-process communication for large-scale parallel computing.

Science.gov (United States)

Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

2009-01-01

In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes, which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.
Large-Scale Parallel Finite Element Analysis of the Stress Singular Problems

International Nuclear Information System (INIS)

Noriyuki Kushida; Hiroshi Okuda; Genki Yagawa

2002-01-01

In this paper, the convergence behavior of the large-scale parallel finite element method for stress singular problems was investigated. The convergence behavior of iterative solvers depends on the efficiency of the preconditioners. However, the efficiency of preconditioners may be influenced by the domain decomposition that is necessary for parallel FEM. In this study the following results were obtained: the conjugate gradient method without preconditioning and the diagonal scaling preconditioned conjugate gradient method were, as expected, not influenced by the domain decomposition. The symmetric successive over-relaxation (SSOR) preconditioned conjugate gradient method converged up to 6% faster if the stress singular area was contained in one subdomain. (authors)
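The diagonal scaling preconditioned conjugate gradient method mentioned in this abstract can be sketched as follows. Because the preconditioner only divides each residual entry by the matching diagonal entry of the matrix, it needs no data from other subdomains, which is consistent with the reported insensitivity to the domain decomposition. This is a serial, dense textbook version; the function name and the 3×3 test matrix are assumptions, not the authors' code.

```python
def pcg_diagonal(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients with diagonal (Jacobi) preconditioning.

    A must be symmetric positive definite; it is stored densely here
    only to keep the example short.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(a * c for a, c in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, iteration
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


matrix = [[4.0, 1.0, 0.0],
          [1.0, 3.0, 1.0],
          [0.0, 1.0, 2.0]]
rhs = [6.0, 10.0, 8.0]      # equals matrix @ [1, 2, 3]
print(pcg_diagonal(matrix, rhs))
```

An SSOR preconditioner, by contrast, applies triangular sweeps that couple neighbouring unknowns, so a parallel version restricted to each subdomain changes with the partition.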
Application of parallel computing techniques to a large-scale reservoir simulation

International Nuclear Information System (INIS)

Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

2001-01-01

Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from the intensive computational requirements of detailed modeling investigations of real-world reservoirs. This paper presents the application of a massively parallel version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance.

A Dual Super-Element Domain Decomposition Approach for Parallel Nonlinear Finite Element Analysis

Science.gov (United States)

Jokhio, G. A.; Izzuddin, B. A.

2015-05-01

This article presents a new domain decomposition method for nonlinear finite element analysis introducing the concept of dual partition super-elements. The method extends ideas from the displacement frame method and is ideally suited for parallel nonlinear static/dynamic analysis of structural systems. In the new method, domain decomposition is realized by replacing one or more subdomains in a "parent system," each with a placeholder super-element, where the subdomains are processed separately as "child partitions," each wrapped by a dual super-element along the partition boundary. The analysis of the overall system, including the satisfaction of equilibrium and compatibility at all partition boundaries, is realized through direct communication between all pairs of placeholder and dual super-elements. The proposed method has particular advantages for matrix solution methods based on the frontal scheme, and can be readily implemented for existing finite element analysis programs to achieve parallelization on distributed memory systems with minimal intervention, thus overcoming memory bottlenecks typically faced in the analysis of large-scale problems. Several examples are presented in this article which demonstrate the computational benefits of the proposed parallel domain decomposition approach and its applicability to the nonlinear structural analysis of realistic structural systems.
Parallelizing Gene Expression Programming Algorithm in Enabling Large-Scale Classification

Directory of Open Access Journals (Sweden)

Lixiong Xu

2017-01-01

As one of the most effective function mining algorithms, the Gene Expression Programming (GEP) algorithm has been widely used in classification, pattern recognition, prediction, and other research fields. Based on self-evolution, GEP is able to mine an optimal function for dealing with further complicated tasks. However, in big data research, GEP suffers from low efficiency due to its long mining processes. To improve the efficiency of GEP in big data research, especially for processing large-scale classification tasks, this paper presents a parallelized GEP algorithm using the MapReduce computing model. The experimental results show that the presented algorithm is scalable and efficient for processing large-scale classification tasks.
Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud

Science.gov (United States)

Wilson, B. D.; Manipon, G.; Hua, H.

2012-12-01

NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within Sci
A quasi-Newton algorithm for large-scale nonlinear equations

Directory of Open Access Journals (Sweden)

Linghua Huang

2017-02-01

In this paper, an algorithm for large-scale nonlinear equations is designed in the following steps: (i) a conjugate gradient (CG) algorithm is designed as a sub-algorithm to obtain the initial points of the main algorithm, where the sub-algorithm's initial point does not have any restrictions; (ii) a quasi-Newton algorithm with the initial points given by the sub-algorithm is defined as the main algorithm, where a new nonmonotone line search technique is presented to get the step length $\alpha_{k}$. The given nonmonotone line search technique can avoid computing the Jacobian matrix. The global convergence and the $(1+q)$-order convergence rate of the main algorithm are established under suitable conditions. Numerical results show that the proposed method is competitive with a similar method for large-scale problems.
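To illustrate what a Jacobian-free quasi-Newton iteration for nonlinear equations looks like, here is a minimal sketch of the classical Broyden ("good") update, assuming NumPy. It is not the algorithm of the paper above: there is no CG sub-algorithm and no nonmonotone line search, a full step is always taken, and the function name `broyden_solve` and the tolerances are hypothetical choices for illustration.

```python
import numpy as np


def broyden_solve(f, x0, tol=1e-10, max_iter=100):
    """Solve f(x) = 0 with Broyden's method, never forming the Jacobian.

    Illustrative only: takes full steps (no line search) and starts from
    the identity as the inverse-Jacobian approximation (an assumption).
    """
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    b_inv = np.eye(x.size)  # approximation to the inverse Jacobian
    for _ in range(max_iter):
        if np.linalg.norm(fx) < tol:
            break
        step = -b_inv @ fx            # quasi-Newton step
        x_new = x + step
        f_new = f(x_new)
        y = f_new - fx                # change in the residual
        b_inv_y = b_inv @ y
        denom = step @ b_inv_y
        if abs(denom) > 1e-14:
            # Sherman-Morrison form of the rank-one Broyden update
            b_inv += np.outer(step - b_inv_y, step @ b_inv) / denom
        x, fx = x_new, f_new
    return x


def mildly_nonlinear(x):
    """Hypothetical test system whose Jacobian stays close to the identity."""
    return x + 0.1 * np.sin(x) - np.array([1.0, 2.0])


solution = broyden_solve(mildly_nonlinear, np.zeros(2))
print(solution, np.linalg.norm(mildly_nonlinear(solution)))
```

The dense `b_inv` matrix is only practical for small systems. Large-scale methods typically avoid storing it, which is part of what papers such as the one above address.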
Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

Directory of Open Access Journals (Sweden)

Sai Kiranmayee Samudrala

2015-01-01

Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.
Streaming Parallel GPU Acceleration of Large-Scale filter-based Spiking Neural Networks

NARCIS (Netherlands)

L.P. Slazynski (Leszek); S.M. Bohte (Sander)

2012-01-01

The arrival of graphics processing (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of
Random number generators for large-scale parallel Monte Carlo simulations on FPGA

Science.gov (United States)

Lin, Y.; Wang, F.; Liu, B.

2018-05-01

Through parallelization, field programmable gate arrays (FPGAs) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGAs present both new constraints and new opportunities for the implementation of random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application-based tests, this study evaluates all four RNGs used in previous FPGA-based MC studies, as well as newly proposed FPGA implementations of two well-known high-quality RNGs that are suitable for LPMC studies on FPGA. One of the newly proposed FPGA implementations, a parallel version of the additive lagged Fibonacci generator (Parallel ALFG), is found to be the best among the evaluated RNGs in fulfilling the needs of LPMC simulations on FPGA.
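For readers unfamiliar with the additive lagged Fibonacci generator mentioned above, the recurrence is x[n] = (x[n-j] + x[n-k]) mod 2^m. The following is a minimal software sketch of that recurrence only. It does not reproduce the paper's parallel FPGA design, and the lag pair (5, 17), the 32-bit word size, and the seed values are assumptions chosen for illustration.

```python
from collections import deque
from itertools import islice


def alfg(seed_state, j=5, k=17, m=32):
    """Yield values of x[n] = (x[n-j] + x[n-k]) mod 2**m.

    seed_state must hold k initial words, at least one of them odd.
    The lags and word size are illustrative assumptions.
    """
    if not 0 < j < k:
        raise ValueError("lags must satisfy 0 < j < k")
    if len(seed_state) != k:
        raise ValueError("seed_state must contain exactly k words")
    mask = (1 << m) - 1
    words = [s & mask for s in seed_state]
    if all(w % 2 == 0 for w in words):
        raise ValueError("at least one seed word must be odd")
    # The deque always holds the k most recent outputs, oldest first.
    state = deque(words, maxlen=k)
    while True:
        value = (state[-j] + state[-k]) & mask
        state.append(value)  # the oldest word is dropped automatically
        yield value


# Hypothetical seed: the integers 1..17.
first_values = list(islice(alfg(list(range(1, 18))), 5))
print(first_values)
```

Because each output depends only on two earlier words, the recurrence needs just an adder and a small memory, which is one reason generators of this family are attractive for hardware implementation.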
Exploiting multi-scale parallelism for large scale numerical modelling of laser wakefield accelerators

International Nuclear Information System (INIS)

Fonseca, R A; Vieira, J; Silva, L O; Fiuza, F; Davidson, A; Tsung, F S; Mori, W B

2013-01-01

A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-Class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ∼10^6 cores and sustained performance over ∼2 PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios. (paper)
Bonus algorithm for large scale stochastic nonlinear programming problems

CERN Document Server

Diwekar, Urmila

2015-01-01

This book presents the details of the BONUS algorithm and its real world applications in areas like sensor placement in large scale drinking water networks, sensor placement in advanced power systems, water management in power systems, and capacity expansion of energy systems. A generalized method for stochastic nonlinear programming based on a sampling based approach for uncertainty analysis and statistical reweighting to obtain probability information is demonstrated in this book. Stochastic optimization problems are difficult to solve since they involve dealing with optimization and uncertainty loops. There are two fundamental approaches used to solve such problems. The first uses decomposition techniques; the second identifies problem specific structures and transforms the problem into a deterministic nonlinear programming problem. These techniques have significant limitations on either the objective function type or the underlying distributions for the uncertain variables. Moreover, these ...
Three-point phase correlations: A new measure of non-linear large-scale structure

CERN Document Server

Wolstenhulme, Richard; Obreschkow, Danail

2015-01-01

We derive an analytical expression for a novel large-scale structure observable: the line correlation function. The line correlation function, which is constructed from the three-point correlation function of the phase of the density field, is a robust statistical measure allowing the extraction of information in the non-linear and non-Gaussian regime. We show that, in perturbation theory, the line correlation is sensitive to the coupling kernel F_2, which governs the non-linear gravitational evolution of the density field. We compare our analytical expression with results from numerical simulations and find a very good agreement for separations r>20 Mpc/h. Fitting formulae for the power spectrum and the non-linear coupling kernel at small scales allow us to extend our prediction into the strongly non-linear regime. We discuss the advantages of the line correlation relative to standard statistical measures like the bispectrum. Unlike the latter, the line correlation is independent of the linear bias. Furtherm...
Decomposition and parallelization strategies for solving large-scale MDO problems

Energy Technology Data Exchange (ETDEWEB)

Grauer, M.; Eschenauer, H.A. [Research Center for Multidisciplinary Analyses and Applied Structural Optimization, FOMAAS, Univ. of Siegen (Germany)

2007-07-01

During previous years, structural optimization has been recognized as a useful tool within the disciplines of engineering and economics. However, the optimization of large-scale systems or structures is impeded by an immense solution effort. This was the reason to start a joint research and development (R and D) project between the Institute of Mechanics and Control Engineering and the Information and Decision Sciences Institute within the Research Center for Multidisciplinary Analyses and Applied Structural Optimization (FOMAAS) on cluster computing for parallel and distributed solution of multidisciplinary optimization (MDO) problems based on the OpTiX-Workbench. Here the focus of attention will be put on coarse-grained parallelization and its implementation on clusters of workstations. A further point of emphasis was laid on the development of a parallel decomposition strategy called PARDEC, for the solution of very complex optimization problems which cannot be solved efficiently by sequential integrated optimization. The use of the OpTiX-Workbench together with the FEM ground water simulation system FEFLOW is shown for a special water management problem. (orig.)
Large-scale parallel genome assembler over cloud computing environment.

Science.gov (United States)

Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

2017-06-01

The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.
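As background for the de Bruijn graph that GiGA is built around, the sketch below shows the basic construction on a single machine: every k-mer in a read contributes an edge from its (k-1)-length prefix to its (k-1)-length suffix. This is only an illustration of the data structure. It is not GiGA's Hadoop/Giraph implementation, and the function name and the toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Each k-mer found in a read adds one edge: prefix -> suffix.
    Duplicate k-mers collapse into a single edge because sets are used.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two overlapping toy reads (hypothetical data).
toy_graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
for node, successors in sorted(toy_graph.items()):
    print(node, "->", sorted(successors))
```

In a distributed setting the nodes of such a graph are partitioned across machines, which is where vertex-centric frameworks such as Giraph come in.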
THREE-POINT PHASE CORRELATIONS: A NEW MEASURE OF NONLINEAR LARGE-SCALE STRUCTURE

Energy Technology Data Exchange (ETDEWEB)

Wolstenhulme, Richard; Bonvin, Camille [Kavli Institute for Cosmology Cambridge and Institute of Astronomy, Madingley Road, Cambridge CB3 0HA (United Kingdom); Obreschkow, Danail [International Centre for Radio Astronomy Research (ICRAR), M468, University of Western Australia, 35 Stirling Hwy, Crawley, WA 6009 (Australia)

2015-05-10

We derive an analytical expression for a novel large-scale structure observable: the line correlation function. The line correlation function, which is constructed from the three-point correlation function of the phase of the density field, is a robust statistical measure allowing the extraction of information in the nonlinear and non-Gaussian regime. We show that, in perturbation theory, the line correlation is sensitive to the coupling kernel F_2, which governs the nonlinear gravitational evolution of the density field. We compare our analytical expression with results from numerical simulations and find a 1σ agreement for separations r ≳ 30 h^{-1} Mpc. Fitting formulae for the power spectrum and the nonlinear coupling kernel at small scales allow us to extend our prediction into the strongly nonlinear regime, where we find a 1σ agreement with the simulations for r ≳ 2 h^{-1} Mpc. We discuss the advantages of the line correlation relative to standard statistical measures like the bispectrum. Unlike the latter, the line correlation is independent of the bias, in the regime where the bias is local and linear. Furthermore, the variance of the line correlation is independent of the Gaussian variance on the modulus of the density field. This suggests that the line correlation can probe more precisely the nonlinear regime of gravity, with less contamination from the power spectrum variance.
A family of conjugate gradient methods for large-scale nonlinear equations.

Science.gov (United States)

Feng, Dexiang; Sun, Min; Wang, Xueyong

2017-01-01

In this paper, we present a family of conjugate gradient projection methods for solving large-scale nonlinear equations. At each iteration, it needs low storage and the subproblem can be easily solved. Compared with the existing solution methods for solving the problem, its global convergence is established without the restriction of the Lipschitz continuity on the underlying mapping. Preliminary numerical results are reported to show the efficiency of the proposed method.
Implementation of highly parallel and large scale GW calculations within the OpenAtom software

Science.gov (United States)

Ismail-Beigi, Sohrab

The need to describe electronic excitations with better accuracy than provided by band structures produced by Density Functional Theory (DFT) has been a long-term enterprise for the computational condensed matter and materials theory communities. In some cases, appropriate theoretical frameworks have existed for some time but have been difficult to apply widely due to computational cost. For example, the GW approximation incorporates a great deal of important non-local and dynamical electronic interaction effects but has been too computationally expensive for routine use in large materials simulations. OpenAtom is an open source massively parallel ab initio density functional software package based on plane waves and pseudopotentials (http://charm.cs.uiuc.edu/OpenAtom/) that takes advantage of the Charm++ parallel framework. At present, it is developed via a three-way collaboration, funded by an NSF SI2-SSI grant (ACI-1339804), between Yale (Ismail-Beigi), IBM T. J. Watson (Glenn Martyna) and the University of Illinois at Urbana Champaign (Laxmikant Kale). We will describe the project and our current approach towards implementing large scale GW calculations with OpenAtom. Potential applications of large scale parallel GW software for problems involving electronic excitations in semiconductor and/or metal oxide systems will also be pointed out.
Parallel Motion Simulation of Large-Scale Real-Time Crowd in a Hierarchical Environmental Model

Directory of Open Access Journals (Sweden)

Xin Wang

2012-01-01

This paper presents a parallel real-time crowd simulation method based on a hierarchical environmental model. A dynamical model of the complex environment should be constructed to simulate the state transition and propagation of individual motions. By modeling of a virtual environment where virtual crowds reside, we employ different parallel methods on a topological layer, a path layer and a perceptual layer. We propose a parallel motion path matching method based on the path layer and a parallel crowd simulation method based on the perceptual layer. The large-scale real-time crowd simulation becomes possible with these methods. Numerical experiments are carried out to demonstrate the methods and results.
Very Large-Scale Neighborhoods with Performance Guarantees for Minimizing Makespan on Parallel Machines

NARCIS (Netherlands)

Brueggemann, T.; Hurink, Johann L.; Vredeveld, T.; Woeginger, Gerhard

2006-01-01

We study the problem of minimizing the makespan on m parallel machines. We introduce a very large-scale neighborhood of exponential size (in the number of machines) that is based on a matching in a complete graph. The idea is to partition the jobs assigned to the same machine into two sets. This
DGDFT: A massively parallel method for large scale density functional theory calculations.

Science.gov (United States)

Hu, Wei; Lin, Lin; Yang, Chao

2015-09-28

We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^{-4} Hartree/atom in terms of the error of energy and 6.2 × 10^{-4} Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
DGDFT: A massively parallel method for large scale density functional theory calculations

International Nuclear Information System (INIS)

Hu, Wei; Yang, Chao; Lin, Lin

2015-01-01

We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^{-4} Hartree/atom in terms of the error of energy and 6.2 × 10^{-4} Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
DGDFT: A massively parallel method for large scale density functional theory calculations

Energy Technology Data Exchange (ETDEWEB)

Hu, Wei, E-mail: whu@lbl.gov; Yang, Chao, E-mail: cyang@lbl.gov [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Lin, Lin, E-mail: linlin@math.berkeley.edu [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Department of Mathematics, University of California, Berkeley, California 94720 (United States)

2015-09-28

We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^{-4} Hartree/atom in terms of the error of energy and 6.2 × 10^{-4} Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

Large-Scale Parallel Viscous Flow Computations using an Unstructured Multigrid Algorithm

Science.gov (United States)

Mavriplis, Dimitri J.

1999-01-01

The development and testing of a parallel unstructured agglomeration multigrid algorithm for steady-state aerodynamic flows is discussed. The agglomeration multigrid strategy uses a graph algorithm to construct the coarse multigrid levels from the given fine grid, similar to an algebraic multigrid approach, but operates directly on the non-linear system using the FAS (Full Approximation Scheme) approach. The scalability and convergence rate of the multigrid algorithm are examined on the SGI Origin 2000 and the Cray T3E. An argument is given which indicates that the asymptotic scalability of the multigrid algorithm should be similar to that of its underlying single grid smoothing scheme. For medium size problems involving several million grid points, near perfect scalability is obtained for the single grid algorithm, while only a slight drop-off in parallel efficiency is observed for the multigrid V- and W-cycles, using up to 128 processors on the SGI Origin 2000, and up to 512 processors on the Cray T3E. For a large problem using 25 million grid points, good scalability is observed for the multigrid algorithm using up to 1450 processors on a Cray T3E, even when the coarsest grid level contains fewer points than the total number of processors.
A family of conjugate gradient methods for large-scale nonlinear equations

Directory of Open Access Journals (Sweden)

Dexiang Feng

2017-09-01

In this paper, we present a family of conjugate gradient projection methods for solving large-scale nonlinear equations. At each iteration, it needs low storage and the subproblem can be easily solved. Compared with the existing solution methods for solving the problem, its global convergence is established without the restriction of the Lipschitz continuity on the underlying mapping. Preliminary numerical results are reported to show the efficiency of the proposed method.
Efficient graph-based dynamic load-balancing for parallel large-scale agent-based traffic simulation

NARCIS (Netherlands)

Xu, Y.; Cai, W.; Aydt, H.; Lees, M.; Tolk, A.; Diallo, S.Y.; Ryzhov, I.O.; Yilmaz, L.; Buckley, S.; Miller, J.A.

2014-01-01

One of the issues of parallelizing large-scale agent-based traffic simulations is partitioning and load-balancing. Traffic simulations are dynamic applications where the distribution of workload in the spatial domain constantly changes. Dynamic load-balancing at run-time has shown better efficiency
NonLinear Parallel OPtimization Tool, Phase II

Data.gov (United States)

National Aeronautics and Space Administration - The technological advancement proposed is a novel large-scale Nonlinear Parallel OPtimization Tool (NLPAROPT). This software package will eliminate the computational...
Parallel Tensor Compression for Large-Scale Scientific Data.

Energy Technology Data Exchange (ETDEWEB)

Kolda, Tamara G. [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Ballard, Grey [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Austin, Woody Nathan [Univ. of Texas, Austin, TX (United States)

2015-10-01

As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
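The 8 TB figure quoted above can be checked by direct arithmetic, and the idea of Tucker compression can be shown on a toy tensor. The sketch below assumes NumPy and 8-byte double-precision values (the abstract does not state the precision). It uses a plain serial truncated higher-order SVD, which is one common way to compute a Tucker decomposition and is not the distributed-memory algorithm of the paper. All function names are hypothetical.

```python
import numpy as np


def dataset_bytes(points_per_dim=512, variables=64, time_steps=128,
                  bytes_per_value=8):
    """Size of the dense five-way tensor; 8-byte values are an assumption."""
    return points_per_dim ** 3 * variables * time_steps * bytes_per_value


def unfold(tensor, mode):
    """Matricize: rows index the chosen mode, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `matrix` (rows x n_mode) into `tensor` along `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.moveaxis(np.tensordot(matrix, moved, axes=(1, 0)), 0, mode)


def hosvd(tensor, ranks):
    """Truncated higher-order SVD: returns (core, list of factor matrices)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])  # leading left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each subspace
    return core, factors


def reconstruct(core, factors):
    """Expand a Tucker representation back into a dense tensor."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


print(dataset_bytes() / 2 ** 40, "TiB")  # 512^3 * 64 * 128 * 8 bytes = 2^43

# Toy tensor with exact multilinear rank (2, 3, 2); hypothetical data.
rng = np.random.default_rng(0)
toy = reconstruct(rng.standard_normal((2, 3, 2)),
                  [rng.standard_normal((n, r))
                   for n, r in zip((6, 7, 5), (2, 3, 2))])
core, factors = hosvd(toy, (2, 3, 2))
stored = core.size + sum(u.size for u in factors)
print("compression ratio:", toy.size / stored)
```

The compression ratio printed at the end compares the number of stored values (core plus factor matrices) with the number of entries in the dense tensor, which is the sense in which the abstract reports ratios.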
Nonlinear evolution of large-scale structure in the universe

International Nuclear Information System (INIS)

Frenk, C.S.; White, S.D.M.; Davis, M.

1983-01-01

Using N-body simulations we study the nonlinear development of primordial density perturbations in an Einstein-de Sitter universe. We compare the evolution of an initial distribution without small-scale density fluctuations to evolution from a random Poisson distribution. These initial conditions mimic the assumptions of the adiabatic and isothermal theories of galaxy formation. The large-scale structures which form in the two cases are markedly dissimilar. In particular, the correlation function ξ(r) and the visual appearance of our adiabatic (or "pancake") models match better the observed distribution of galaxies. This distribution is characterized by large-scale filamentary structure. Because the pancake models do not evolve in a self-similar fashion, the slope of ξ(r) steepens with time; as a result there is a unique epoch at which these models fit the galaxy observations. We find the ratio of cutoff length to correlation length at this time to be λ_min/r_0 = 5.1; its expected value in a neutrino dominated universe is 4(Ωh)^{-1} (H_0 = 100h km s^{-1} Mpc^{-1}). At early epochs these models predict a negligible amplitude for ξ(r) and could explain the lack of measurable clustering in the Lyα absorption lines of high-redshift quasars. However, large-scale structure in our models collapses after z = 2. If this collapse precedes galaxy formation as in the usual pancake theory, galaxies formed uncomfortably recently. The extent of this problem may depend on the cosmological model used; the present series of experiments should be extended in the future to include models with Ω<1.
Multilevel parallel strategy on Monte Carlo particle transport for the large-scale full-core pin-by-pin simulations

International Nuclear Information System (INIS)

Zhang, B.; Li, G.; Wang, W.; Shangguan, D.; Deng, L.

2015-01-01

This paper introduces the multilevel hybrid parallelism strategy of the JCOGIN infrastructure for Monte Carlo particle transport in large-scale full-core pin-by-pin simulations. Particle parallelism, domain decomposition parallelism and MPI/OpenMP parallelism are designed and implemented. In testing, JMCT demonstrates the parallel scalability of JCOGIN, reaching a parallel efficiency of 80% on 120,000 cores for the pin-by-pin computation of the BEAVRS benchmark. (author)
Parallel simulation of tsunami inundation on a large-scale supercomputer

Science.gov (United States)

Oishi, Y.; Imamura, F.; Sugawara, D.

2013-12-01

An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation, and the computational power of recent massively parallel supercomputers can help enable faster-than-real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on the TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
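The load-balancing step described in this abstract (CPUs assigned to nested layers in proportion to grid points, then a 1-D decomposition within each layer) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the guarantee of at least one CPU per layer, and the largest-remainder rounding are all assumptions made here.

```python
def allocate_cpus(grid_points, total_cpus):
    """Split total_cpus across nested layers roughly in proportion to grid points.

    Assumption: every layer gets at least one CPU; the remaining CPUs are
    shared proportionally, with largest-remainder rounding.
    """
    n_layers = len(grid_points)
    if total_cpus < n_layers:
        raise ValueError("need at least one CPU per layer")
    spare = total_cpus - n_layers
    total_points = sum(grid_points)
    shares = [spare * g / total_points for g in grid_points]
    alloc = [1 + int(s) for s in shares]
    leftover = total_cpus - sum(alloc)
    # Give the CPUs lost to rounding to the layers with the largest fractions.
    by_fraction = sorted(range(n_layers),
                         key=lambda i: shares[i] - int(shares[i]),
                         reverse=True)
    for i in by_fraction[:leftover]:
        alloc[i] += 1
    return alloc

def decompose_1d(n_rows, n_parts):
    """Return half-open (start, stop) row ranges, as even as possible."""
    base, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# Hypothetical three-layer nesting: coarse, medium, fine.
cpus = allocate_cpus([1000, 4000, 16000], total_cpus=21)
strips = decompose_1d(n_rows=10, n_parts=3)
```

Each strip would then exchange boundary rows with its neighbours at every time step, which is the first of the communication types the abstract begins to list.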
A review of parallel computing for large-scale remote sensing image mosaicking

OpenAIRE

Chen, Lajiao; Ma, Yan; Liu, Peng; Wei, Jingbo; Jie, Wei; He, Jijun

2015-01-01

Interest in image mosaicking has been spurred by a wide variety of research and management needs. However, for large-scale applications, remote sensing image mosaicking usually requires significant computational capabilities. Several studies have attempted to apply parallel computing to improve image mosaicking algorithms and to speed up the calculation process. The state of the art of this field has not yet been summarized, which is, however, essential for a better understanding and for further ...
Parallel time domain solvers for electrically large transient scattering problems

KAUST Repository

Liu, Yang

2014-09-26

Marching on in time (MOT)-based integral equation solvers represent an increasingly appealing avenue for analyzing transient electromagnetic interactions with large and complex structures. MOT integral equation solvers for analyzing electromagnetic scattering from perfect electrically conducting objects are obtained by enforcing electric field boundary conditions and implicitly time-advancing electric surface current densities by iteratively solving sparse systems of equations at all time steps. Contrary to finite difference and finite element competitors, these solvers apply to nonlinear and multi-scale structures comprising geometrically intricate and deep sub-wavelength features residing atop electrically large platforms. Moreover, they are high-order accurate, stable in the low- and high-frequency limits, and applicable to conducting and penetrable structures represented by highly irregular meshes. This presentation reviews some recent advances in the parallel implementations of time domain integral equation solvers, specifically those that leverage the multilevel plane-wave time-domain (PWTD) algorithm on modern manycore computer architectures including graphics processing units (GPUs) and distributed memory supercomputers. The GPU-based implementation achieves at least one order of magnitude speedup compared to serial implementations, while the distributed parallel implementation is highly scalable to thousands of compute nodes. A distributed parallel PWTD kernel has been adopted to solve time domain surface/volume integral equations (TDSIE/TDVIE) for analyzing transient scattering from large and complex-shaped perfectly electrically conducting (PEC)/dielectric objects involving ten million/tens of millions of spatial unknowns.
MOOSE: A parallel computational framework for coupled systems of nonlinear equations

International Nuclear Information System (INIS)

Gaston, Derek; Newman, Chris; Hansen, Glen; Lebrun-Grandie, Damien

2009-01-01

Systems of coupled, nonlinear partial differential equations (PDEs) often arise in simulation of nuclear processes. MOOSE: Multiphysics Object Oriented Simulation Environment, a parallel computational framework targeted at the solution of such systems, is presented. As opposed to traditional data-flow oriented computational frameworks, MOOSE is instead founded on the mathematical principle of Jacobian-free Newton-Krylov (JFNK). Utilizing the mathematical structure present in JFNK, physics expressions are modularized into 'Kernels,' allowing for rapid production of new simulation tools. In addition, systems are solved implicitly and fully coupled, employing physics-based preconditioning, which provides great flexibility even with large variance in time scales. A summary of the mathematics, an overview of the structure of MOOSE, and several representative solutions from applications built on the framework are presented.
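The core idea of JFNK can be shown on a toy problem: Newton's method needs only Jacobian-vector products, and those can be approximated by a finite difference of the residual, so the Jacobian is never assembled. The sketch below is not MOOSE code and uses none of its API. The test problem, all function names, and the choice of conjugate gradients as the Krylov solver are assumptions made here; CG is valid only because this particular Jacobian is symmetric positive definite, and a general JFNK solver would typically use GMRES with preconditioning.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def residual(u):
    """F(u) for -u'' + u^3 = 1 on (0, 1) with u(0) = u(1) = 0 (hypothetical test problem)."""
    n = len(u)
    h = 1.0 / (n + 1)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / h**2 + u[i] ** 3 - 1.0)
    return out

def jac_vec(F, u, Fu, v):
    """Matrix-free product J(u) v, approximated by (F(u + eps v) - F(u)) / eps."""
    v_norm = norm(v)
    if v_norm == 0.0:
        return [0.0] * len(v)
    eps = 1e-7 * (1.0 + norm(u)) / v_norm
    F_shifted = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(F_shifted, Fu)]

def cg(matvec, b, rel_tol=1e-6, max_iter=200):
    """Conjugate gradients using only matrix-vector products."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(b)
    rs = dot(r, r)
    b_norm = norm(b)
    for _ in range(max_iter):
        if math.sqrt(rs) <= rel_tol * b_norm:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def jfnk(F, u0, tol=1e-8, max_newton=20):
    """Newton iteration whose linear solves never form the Jacobian."""
    u = list(u0)
    for _ in range(max_newton):
        Fu = F(u)
        if norm(Fu) < tol:
            break
        du = cg(lambda v: jac_vec(F, u, Fu, v), [-f for f in Fu])
        u = [ui + di for ui, di in zip(u, du)]
    return u

solution = jfnk(residual, [0.0] * 10)
```

Because the solver touches the physics only through `residual`, adding a new term means editing one function, which is the kind of modularity the abstract attributes to kernels.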
Visual Data-Analytics of Large-Scale Parallel Discrete-Event Simulations

Energy Technology Data Exchange (ETDEWEB)

Ross, Caitlin; Carothers, Christopher D.; Mubarak, Misbah; Carns, Philip; Ross, Robert; Li, Jianping Kelvin; Ma, Kwan-Liu

2016-11-13

Parallel discrete-event simulation (PDES) is an important tool in the codesign of extreme-scale systems because PDES provides a cost-effective way to evaluate designs of high-performance computing systems. Optimistic synchronization algorithms for PDES, such as Time Warp, allow events to be processed without global synchronization among the processing elements. A rollback mechanism is provided when events are processed out of timestamp order. Although optimistic synchronization protocols enable the scalability of large-scale PDES, the performance of the simulations must be tuned to reduce the number of rollbacks and provide an improved simulation runtime. To enable efficient large-scale optimistic simulations, one has to gain insight into the factors that affect the rollback behavior and simulation performance. We developed a tool for ROSS model developers that gives them detailed metrics on the performance of their large-scale optimistic simulations at varying levels of simulation granularity. Model developers can use this information for parameter tuning of optimistic simulations in order to achieve better runtime and fewer rollbacks. In this work, we instrument the ROSS optimistic PDES framework to gather detailed statistics about the simulation engine. We have also developed an interactive visualization interface that uses the data collected by the ROSS instrumentation to understand the underlying behavior of the simulation engine. The interface connects real time to virtual time in the simulation and provides the ability to view simulation data at different granularities. We demonstrate the usefulness of our framework by performing a visual analysis of the dragonfly network topology model provided by the CODES simulation framework built on top of ROSS. The instrumentation needs to minimize overhead in order to accurately collect data about the simulation performance. To ensure that the instrumentation does not introduce unnecessary overhead, we perform a
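The rollback behavior that this tool measures can be illustrated with a toy logical process. The sketch below is not ROSS code: the class, its fields, and the state-update rule are hypothetical, and it omits anti-messages and global virtual time, which a real Time Warp engine needs. It shows only the local mechanism: process events optimistically, and when a straggler arrives with an earlier timestamp, restore the saved state, insert the straggler, and reprocess.

```python
class LogicalProcess:
    """Toy optimistic logical process with state saving and rollback."""

    def __init__(self):
        self.state = 0
        self.lvt = 0.0        # local virtual time
        self.processed = []   # (timestamp, value, state_before), in timestamp order
        self.rollbacks = 0    # the kind of metric an instrumentation layer would collect

    def _apply(self, timestamp, value):
        self.processed.append((timestamp, value, self.state))
        # Order-dependent update, so processing out of order gives a wrong state.
        self.state = self.state * 2 + value
        self.lvt = timestamp

    def receive(self, timestamp, value):
        if timestamp >= self.lvt:
            self._apply(timestamp, value)
            return
        # Straggler: undo every event processed with a later timestamp.
        self.rollbacks += 1
        undone = []
        while self.processed and self.processed[-1][0] > timestamp:
            old_ts, old_value, state_before = self.processed.pop()
            self.state = state_before
            undone.append((old_ts, old_value))
        self.lvt = self.processed[-1][0] if self.processed else 0.0
        self._apply(timestamp, value)
        # Reprocess the undone events in timestamp order.
        for old_ts, old_value in reversed(undone):
            self._apply(old_ts, old_value)

lp = LogicalProcess()
for ts, value in [(1.0, 1), (3.0, 3), (2.0, 2)]:  # the third event is a straggler
    lp.receive(ts, value)
```

After the straggler is handled, the state matches what in-order processing would have produced, at the cost of one rollback and one reprocessed event.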
Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing

Science.gov (United States)

Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E. J.

2013-12-01

NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the 'A-Train' platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (MERRA), stratify the comparisons using a classification of the 'cloud scenes' from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these problems are Data Intensive computing so the data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically 'sharded' by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will
Algorithm 896: LSA: Algorithms for Large-Scale Optimization

Czech Academy of Sciences Publication Activity Database

Lukšan, Ladislav; Matonoha, Ctirad; Vlček, Jan

2009-01-01

Vol. 36, No. 3 (2009), 16-1-16-29 ISSN 0098-3500 R&D Projects: GA AV ČR IAA1030405; GA ČR GP201/06/P397 Institutional research plan: CEZ:AV0Z10300504 Keywords: algorithms * design * large-scale optimization * large-scale nonsmooth optimization * large-scale nonlinear least squares * large-scale nonlinear minimax * large-scale systems of nonlinear equations * sparse problems * partially separable problems * limited-memory methods * discrete Newton methods * quasi-Newton methods * primal interior-point methods Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.904, year: 2009
Parallel Computational Fluid Dynamics 2007 : Implementations and Experiences on Large Scale and Grid Computing

CERN Document Server

2009-01-01

At the 19th Annual Conference on Parallel Computational Fluid Dynamics held in Antalya, Turkey, in May 2007, the most recent developments and implementations of large-scale and grid computing were presented. This book, comprised of the invited and selected papers of this conference, details those advances, which are of particular interest to CFD and CFD-related communities. It also offers the results related to applications of various scientific and engineering problems involving flows and flow-related topics. Intended for CFD researchers and graduate students, this book is a state-of-the-art presentation of the relevant methodology and implementation techniques of large-scale computing.
The parallel-sequential field subtraction technique for coherent nonlinear ultrasonic imaging

Science.gov (United States)

Cheng, Jingwei; Potter, Jack N.; Drinkwater, Bruce W.

2018-06-01

Nonlinear imaging techniques have recently emerged which have the potential to detect cracks at a much earlier stage than was previously possible and have sensitivity to partially closed defects. This study explores a coherent imaging technique based on the subtraction of two modes of focusing: parallel, in which the elements are fired together with a delay law, and sequential, in which elements are fired independently. In parallel focusing, a high intensity ultrasonic beam is formed in the specimen at the focal point. However, in sequential focusing only low intensity signals from individual elements enter the sample and the full matrix of transmit-receive signals is recorded and post-processed to form an image. Under linear elastic assumptions, both parallel and sequential images are expected to be identical. Here we measure the difference between these images and use this to characterise the nonlinearity of small closed fatigue cracks. In particular we monitor the change in relative phase and amplitude at the fundamental frequencies for each focal point and use this nonlinear coherent imaging metric to form images of the spatial distribution of nonlinearity. The results suggest the subtracted image can suppress linear features (e.g. back wall or large scatterers) effectively when instrumentation noise compensation is applied, thereby allowing damage to be detected at an early stage (c. 15% of fatigue life) and reliably quantified in later fatigue life.
SQDFT: Spectral Quadrature method for large-scale parallel O(N) Kohn-Sham calculations at high temperature

Science.gov (United States)

Suryanarayana, Phanish; Pratapa, Phanisri P.; Sharma, Abhiraj; Pask, John E.

2018-03-01

We present SQDFT: a large-scale parallel implementation of the Spectral Quadrature (SQ) method for O(N) Kohn-Sham Density Functional Theory (DFT) calculations at high temperature. Specifically, we develop an efficient and scalable finite-difference implementation of the infinite-cell Clenshaw-Curtis SQ approach, in which results for the infinite crystal are obtained by expressing quantities of interest as bilinear forms or sums of bilinear forms, that are then approximated by spatially localized Clenshaw-Curtis quadrature rules. We demonstrate the accuracy of SQDFT by showing systematic convergence of energies and atomic forces with respect to SQ parameters to reference diagonalization results, and convergence with discretization to established planewave results, for both metallic and insulating systems. We further demonstrate that SQDFT achieves excellent strong and weak parallel scaling on computer systems consisting of tens of thousands of processors, with near perfect O(N) scaling with system size and wall times as low as a few seconds per self-consistent field iteration. Finally, we verify the accuracy of SQDFT in large-scale quantum molecular dynamics simulations of aluminum at high temperature.
Large-scale parallel configuration interaction. II. Two- and four-component double-group general active space implementation with application to BiH

DEFF Research Database (Denmark)

Knecht, Stefan; Jensen, Hans Jørgen Aagaard; Fleig, Timo

2010-01-01

We present a parallel implementation of a large-scale relativistic double-group configuration interaction (CI) program. It is applicable with a large variety of two- and four-component Hamiltonians. The parallel algorithm is based on a distributed data model in combination with a static load balanci...
Large amplitude parallel propagating electromagnetic oscillitons

International Nuclear Information System (INIS)

Cattaert, Tom; Verheest, Frank

2005-01-01

Earlier systematic nonlinear treatments of parallel propagating electromagnetic waves have been given within a fluid dynamic approach, in a frame where the nonlinear structures are stationary and various constraining first integrals can be obtained. This has led to the concept of oscillitons that has found application in various space plasmas. The present paper differs in three main aspects from the previous studies: first, the invariants are derived in the plasma frame, as customary in the Sagdeev method, thus retaining in Maxwell's equations all possible effects. Second, a single differential equation is obtained for the parallel fluid velocity, in a form reminiscent of the Sagdeev integrals, hence allowing a fully nonlinear discussion of the oscilliton properties, at such amplitudes as the underlying Mach number restrictions allow. Third, the transition to weakly nonlinear whistler oscillitons is done in an analytical rather than a numerical fashion.
Parallel Solution of Robust Nonlinear Model Predictive Control Problems in Batch Crystallization

Directory of Open Access Journals (Sweden)

Yankai Cao

2016-06-01

When the uncertainties are represented by a set of scenarios, the optimization problem resulting from a robust nonlinear model predictive control (NMPC) strategy at each sampling instance can be viewed as a large-scale stochastic program. This paper solves these optimization problems using the parallel Schur complement method developed to solve stochastic programs on distributed and shared memory machines. The control strategy is illustrated with a case study of a multidimensional unseeded batch crystallization process. For this application, a robust NMPC based on min–max optimization guarantees satisfaction of all state and input constraints for a set of uncertainty realizations, and also provides better robust performance compared with open-loop optimal control, nominal NMPC, and robust NMPC minimizing the expected performance at each sampling instance. The performance of robust NMPC can be improved by generating optimization scenarios using Bayesian inference. With the efficient parallel solver, the solution time of one optimization problem is reduced from 6.7 min to 0.5 min, allowing for real-time application.
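For readers unfamiliar with the Schur complement approach, the generic structure it exploits can be written down briefly. The notation below is an assumption made for illustration and is not taken from the paper: $K_s$ is the block belonging to scenario $s$, $B_s$ couples that scenario to the shared first-stage variables $\Delta z_0$, and $r_s$ is the corresponding right-hand side.

```latex
% Block-bordered-diagonal linear system arising at each solver iteration
\begin{bmatrix}
K_1    &        &        & B_1    \\
       & \ddots &        & \vdots \\
       &        & K_S    & B_S    \\
B_1^T  & \cdots & B_S^T  & K_0
\end{bmatrix}
\begin{bmatrix} \Delta z_1 \\ \vdots \\ \Delta z_S \\ \Delta z_0 \end{bmatrix}
=
\begin{bmatrix} r_1 \\ \vdots \\ r_S \\ r_0 \end{bmatrix}

% Step 1: form and solve the Schur complement system for the shared variables
\Bigl( K_0 - \sum_{s=1}^{S} B_s^T K_s^{-1} B_s \Bigr) \Delta z_0
  = r_0 - \sum_{s=1}^{S} B_s^T K_s^{-1} r_s

% Step 2: recover each scenario's variables independently
K_s \, \Delta z_s = r_s - B_s \, \Delta z_0 , \qquad s = 1, \dots, S
```

Every term in the two sums, and every solve in step 2, involves a single scenario, so the scenarios can be distributed across processors. The reported reduction from 6.7 min to 0.5 min corresponds to a speedup of about 13.4.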

An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

Science.gov (United States)

Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

2018-02-01

De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we propose a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPUs). We then design an imaging-point parallel strategy to achieve optimal parallel computing performance. Afterward, we adopt an asynchronous double-buffering scheme with multiple streams to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies for computation and storage based on the compute unified device architecture (CUDA) are adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improves the efficiency. A numerical example employing real large-scale, 3D seismic data shows that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
Nonreciprocity in the dynamics of coupled oscillators with nonlinearity, asymmetry, and scale hierarchy

Science.gov (United States)

Moore, Keegan J.; Bunyan, Jonathan; Tawfick, Sameh; Gendelman, Oleg V.; Li, Shuangbao; Leamy, Michael; Vakakis, Alexander F.

2018-01-01

In linear time-invariant dynamical and acoustical systems, reciprocity holds by the Onsager-Casimir principle of microscopic reversibility, and this can be broken only by odd external biases, nonlinearities, or time-dependent properties. A concept is proposed in this work for breaking dynamic reciprocity based on irreversible nonlinear energy transfers from large to small scales in a system with nonlinear hierarchical internal structure, asymmetry, and intentional strong stiffness nonlinearity. The resulting nonreciprocal large-to-small scale energy transfers mimic analogous nonlinear energy transfer cascades that occur in nature (e.g., in turbulent flows), and are caused by the strong frequency-energy dependence of the essentially nonlinear small-scale components of the system considered. The theoretical part of this work is mainly based on action-angle transformations, followed by direct numerical simulations of the resulting system of nonlinear coupled oscillators. The experimental part considers a system with two scales—a linear large-scale oscillator coupled to a small scale by a nonlinear spring—and validates the theoretical findings demonstrating nonreciprocal large-to-small scale energy transfer. The proposed study promotes a paradigm for designing nonreciprocal acoustic materials harnessing strong nonlinearity, which in a future application will be implemented in designing lattices incorporating nonlinear hierarchical internal structures, asymmetry, and scale mixing.
Optimization under uncertainty of parallel nonlinear energy sinks

Science.gov (United States)

Boroson, Ethan; Missoum, Samy; Mattei, Pierre-Olivier; Vergez, Christophe

2017-04-01

Nonlinear Energy Sinks (NESs) are a promising technique for passively reducing the amplitude of vibrations. Through nonlinear stiffness properties, a NES is able to passively and irreversibly absorb energy. Unlike the traditional Tuned Mass Damper (TMD), NESs do not require a specific tuning and absorb energy over a wider range of frequencies. Nevertheless, they are still only efficient over a limited range of excitations. In order to mitigate this limitation and maximize the efficiency range, this work investigates the optimization of multiple NESs configured in parallel. It is well known that the efficiency of a NES is extremely sensitive to small perturbations in loading conditions or design parameters. In fact, the efficiency of a NES has been shown to be nearly discontinuous in the neighborhood of its activation threshold. For this reason, uncertainties must be taken into account in the design optimization of NESs. In addition, the discontinuities require a specific treatment during the optimization process. In this work, the objective of the optimization is to maximize the expected value of the efficiency of NESs in parallel. The optimization algorithm is able to tackle design variables with uncertainty (e.g., nonlinear stiffness coefficients) as well as aleatory variables such as the initial velocity of the main system. The optimal design of several parallel NES configurations for maximum mean efficiency is investigated. Specifically, NES nonlinear stiffness properties, considered random design variables, are optimized for cases with 1, 2, 3, 4, 5, and 10 NESs in parallel. The distributions of efficiency for the optimal parallel configurations are compared to distributions of efficiencies of non-optimized NESs. It is observed that the optimization enables a sharp increase in the mean value of efficiency while reducing the corresponding variance, thus leading to more robust NES designs.
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

KAUST Repository

Hasanov, Khalid; Quintin, Jean-Noël; Lastovetsky, Alexey

2014-01-01

-scale parallelism in mind. Indeed, while in 1990s a system with few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel
Hierarchical optimal control of large-scale nonlinear chemical processes.

Science.gov (United States)

Ramezani, Mohammad Hossein; Sadati, Nasser

2009-01-01

In this paper, a new approach is presented for optimal control of large-scale chemical processes. In this approach, the chemical process is decomposed into smaller sub-systems at the first level, and a coordinator at the second level, for which a two-level hierarchical control strategy is designed. For this purpose, each sub-system in the first level can be solved separately, by using any conventional optimization algorithm. In the second level, the solutions obtained from the first level are coordinated using a new gradient-type strategy, which is updated by the error of the coordination vector. The proposed algorithm is used to solve the optimal control problem of a complex nonlinear chemical stirred tank reactor (CSTR), where its solution is also compared with the ones obtained using the centralized approach. The simulation results show the efficiency and the capability of the proposed hierarchical approach, in finding the optimal solution, over the centralized method.
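The two-level idea (independent subsystem solves coordinated by a gradient-type update driven by a coordination error) can be illustrated with a deliberately small example. The sketch below is a generic price-coordination scheme, not the authors' algorithm: the two quadratic subproblems, the coupling constraint x1 = x2, the step size, and all names are assumptions made here.

```python
def solve_subsystem_1(price):
    """First level: minimize (x1 - 1)^2 + price * x1, solved in closed form."""
    return 1.0 - price / 2.0

def solve_subsystem_2(price):
    """First level: minimize (x2 - 3)^2 - price * x2, solved in closed form."""
    return 3.0 + price / 2.0

def coordinate(step=0.5, tol=1e-10, max_iter=200):
    """Second level: adjust the price until the coupling x1 = x2 is satisfied."""
    price = 0.0
    for iteration in range(max_iter):
        # The two subproblems are independent and could be solved in parallel.
        x1 = solve_subsystem_1(price)
        x2 = solve_subsystem_2(price)
        error = x1 - x2  # coordination error
        if abs(error) < tol:
            break
        price += step * error  # gradient-type update driven by the error
    return x1, x2, price, iteration

x1, x2, price, iterations = coordinate()
```

For comparison, the centralized problem of minimizing (x - 1)^2 + (x - 3)^2 over a single shared x has the solution x = 2, which is the value both subsystems approach here.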
Design and Nonlinear Control of a 2-DOF Flexible Parallel Humanoid Arm Joint Robot

Directory of Open Access Journals (Sweden)

Leijie Jiang

2017-01-01

The paper focuses on the design and nonlinear control of a humanoid wrist/shoulder joint based on a cable-driven parallel mechanism which can realize roll and pitch movement. Because the mechanism contains flexible parts, vibration control of the flexible wrist/shoulder joint must be addressed. In this paper, a cable-driven parallel robot platform is developed for the experimental study of the humanoid wrist/shoulder joint. The dynamic model of the mechanism is formulated by using the coupling theory of the flexible body's large global motion and small flexible deformation. Based on the derived dynamics, antivibration control of the joint robot is studied with a nonlinear control method. Finally, simulations and experiments were performed to validate the feasibility of the developed parallel robot prototype and the proposed control scheme.
Large-scale modeling of epileptic seizures: scaling properties of two parallel neuronal network simulation algorithms.

Science.gov (United States)

Pesce, Lorenzo L; Lee, Hyong C; Hereld, Mark; Visser, Sid; Stevens, Rick L; Wildeman, Albert; van Drongelen, Wim

2013-01-01

Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.
Large-Scale Modeling of Epileptic Seizures: Scaling Properties of Two Parallel Neuronal Network Simulation Algorithms

Directory of Open Access Journals (Sweden)

Lorenzo L. Pesce

2013-01-01

Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.
Adaptive Fuzzy Output-Constrained Fault-Tolerant Control of Nonlinear Stochastic Large-Scale Systems With Actuator Faults.

Science.gov (United States)

Li, Yongming; Ma, Zhiyao; Tong, Shaocheng

2017-09-01

The problem of adaptive fuzzy output-constrained tracking fault-tolerant control (FTC) is investigated for large-scale stochastic nonlinear systems of pure-feedback form. The nonlinear systems considered in this paper possess unstructured uncertainties, unknown interconnected terms and unknown nonaffine nonlinear faults. Fuzzy logic systems are employed to identify the unknown lumped nonlinear functions so that the problems of structured uncertainties can be solved. An adaptive fuzzy state observer is designed to solve the nonmeasurable state problem. By combining the barrier Lyapunov function theory with adaptive decentralized and stochastic control principles, a novel fuzzy adaptive output-constrained FTC approach is constructed. All the signals in the closed-loop system are proved to be bounded in probability and the system outputs are constrained in a given compact set. Finally, the applicability of the proposed controller is demonstrated by a simulation example.
Sharing of nonlinear load in parallel-connected three-phase converters

DEFF Research Database (Denmark)

Borup, Uffe; Blaabjerg, Frede; Enjeti, Prasad N.

2001-01-01

In this paper, a new control method is presented which enables equal sharing of linear and nonlinear loads in three-phase power converters connected in parallel, without communication between the converters. The paper focuses on solving the problem that arises when two converters with harmonic compensation are connected in parallel. Without the new solution, they are normally not able to distinguish the harmonic currents that flow to the load and harmonic currents that circulate between the converters. Analysis and experimental results on two 90-kVA 400-Hz converters in parallel are presented. The results show that both linear and nonlinear loads can be shared equally by the proposed concept.
Adaptive Neural Networks Decentralized FTC Design for Nonstrict-Feedback Nonlinear Interconnected Large-Scale Systems Against Actuator Faults.

Science.gov (United States)

Li, Yongming; Tong, Shaocheng

The problem of active fault-tolerant control (FTC) is investigated for the large-scale nonlinear systems in nonstrict-feedback form. The nonstrict-feedback nonlinear systems considered in this paper consist of unstructured uncertainties, unmeasured states, unknown interconnected terms, and actuator faults (e.g., bias fault and gain fault). A state observer is designed to solve the unmeasurable state problem. Neural networks (NNs) are used to identify the unknown lumped nonlinear functions so that the problems of unstructured uncertainties and unknown interconnected terms can be solved. By combining the adaptive backstepping design principle with the combination Nussbaum gain function property, a novel NN adaptive output-feedback FTC approach is developed. The proposed FTC controller can guarantee that all signals in all subsystems are bounded, and the tracking errors for each subsystem converge to a small neighborhood of zero. Finally, numerical results of practical examples are presented to further demonstrate the effectiveness of the proposed control strategy.
Parallel Nonlinear Optimization for Astrodynamic Navigation, Phase I

Data.gov (United States)

National Aeronautics and Space Administration — CU Aerospace proposes the development of a new parallel nonlinear program (NLP) solver software package. NLPs allow the solution of complex optimization problems,...
Nonlinear effects in parallel magnetic fields in vanadyl and iron (III) ion solutions

International Nuclear Information System (INIS)

Ryzhov, V.A.; Fomichev, V.N.

1983-01-01

Nonlinear effects (NE) in vanadyl (VOSO4) and iron (FeCl3·6H2O) solutions are investigated experimentally in the 268-323 K temperature range in parallel constant and variable linearly polarized magnetic fields, including conditions when EPR spectra are lacking due to strong broadening of the resonance transitions. It is shown that nonlinear effects are determined, on the one hand, by the effect of a variable field on the relaxation processes and, on the other hand, by resonance transitions in parallel fields. The relaxation and resonance effects contribute to different phase components of the second harmonic of magnetization, recorded in the experiment, at low frequencies of a variable field (as compared to characteristic frequencies of lattice motion). Therefore, separate analysis of the effects is possible. The presence of NE under conditions when the EPR signal is not observed, and the possibility of solving the inverse problem using the variation technique on the basis of simple models, reveal that NE in parallel magnetic fields may be used for the investigation of paramagnets with a large EPR resonance transition width.
THE EFFECT OF INTERMITTENT GYRO-SCALE SLAB TURBULENCE ON PARALLEL AND PERPENDICULAR COSMIC-RAY TRANSPORT

International Nuclear Information System (INIS)

Le Roux, J. A.

2011-01-01

Earlier work based on nonlinear guiding center (NLGC) theory suggested that perpendicular cosmic-ray transport is diffusive when cosmic rays encounter random three-dimensional magnetohydrodynamic turbulence dominated by uniform two-dimensional (2D) turbulence with a minor uniform slab turbulence component. In this approach large-scale perpendicular cosmic-ray transport is due to cosmic rays microscopically diffusing along the meandering magnetic field dominated by 2D turbulence because of gyroresonant interactions with slab turbulence. However, turbulence in the solar wind is intermittent and it has been suggested that intermittent turbulence might be responsible for the observation of 'dropout' events in solar energetic particle fluxes on small scales. In a previous paper le Roux et al. suggested, using NLGC theory as a basis, that if gyro-scale slab turbulence is intermittent, large-scale perpendicular cosmic-ray transport in weak uniform 2D turbulence will be superdiffusive or subdiffusive depending on the statistical characteristics of the intermittent slab turbulence. In this paper we expand and refine our previous work further by investigating how both parallel and perpendicular transport are affected by intermittent slab turbulence for weak as well as strong uniform 2D turbulence. The main new finding is that both parallel and perpendicular transport are the net effect of an interplay between diffusive and nondiffusive (superdiffusive or subdiffusive) transport effects as a consequence of this intermittency.
THE EFFECT OF INTERMITTENT GYRO-SCALE SLAB TURBULENCE ON PARALLEL AND PERPENDICULAR COSMIC-RAY TRANSPORT

Energy Technology Data Exchange (ETDEWEB)

Le Roux, J. A. [Department of Physics, University of Alabama in Huntsville, Huntsville, AL 35899 (United States)]

2011-12-10

Earlier work based on nonlinear guiding center (NLGC) theory suggested that perpendicular cosmic-ray transport is diffusive when cosmic rays encounter random three-dimensional magnetohydrodynamic turbulence dominated by uniform two-dimensional (2D) turbulence with a minor uniform slab turbulence component. In this approach large-scale perpendicular cosmic-ray transport is due to cosmic rays microscopically diffusing along the meandering magnetic field dominated by 2D turbulence because of gyroresonant interactions with slab turbulence. However, turbulence in the solar wind is intermittent and it has been suggested that intermittent turbulence might be responsible for the observation of 'dropout' events in solar energetic particle fluxes on small scales. In a previous paper le Roux et al. suggested, using NLGC theory as a basis, that if gyro-scale slab turbulence is intermittent, large-scale perpendicular cosmic-ray transport in weak uniform 2D turbulence will be superdiffusive or subdiffusive depending on the statistical characteristics of the intermittent slab turbulence. In this paper we expand and refine our previous work further by investigating how both parallel and perpendicular transport are affected by intermittent slab turbulence for weak as well as strong uniform 2D turbulence. The main new finding is that both parallel and perpendicular transport are the net effect of an interplay between diffusive and nondiffusive (superdiffusive or subdiffusive) transport effects as a consequence of this intermittency.
Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

Science.gov (United States)

Hsieh, Shang-Hsien

1993-01-01

The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Large scale three-dimensional topology optimisation of heat sinks cooled by natural convection

DEFF Research Database (Denmark)

Alexandersen, Joe; Sigmund, Ole; Aage, Niels

2016-01-01

... the Boussinesq approximation. The fully coupled non-linear multiphysics system is solved using stabilised trilinear equal-order finite elements in a parallel framework allowing for the optimisation of large scale problems with order of 20-330 million state degrees of freedom. The flow is assumed to be laminar ... topologies verify prior conclusions regarding fin length/thickness ratios and Biot numbers, but also indicate that carefully tailored and complex geometries may improve cooling behaviour considerably compared to simple heat fin geometries. (C) 2016 Elsevier Ltd. All rights reserved.
Identifiability of large-scale non-linear dynamic network models applied to the ADM1-case study.

Science.gov (United States)

Nimmegeers, Philippe; Lauwers, Joost; Telen, Dries; Logist, Filip; Impe, Jan Van

2017-06-01

In this work, both the structural and practical identifiability of the Anaerobic Digestion Model no. 1 (ADM1) are investigated, which serves as a relevant case study of large non-linear dynamic network models. The structural identifiability is investigated using the probabilistic algorithm, adapted to deal with the specifics of the case study (i.e., a large-scale non-linear dynamic system of differential and algebraic equations). The practical identifiability is analyzed using a Monte Carlo parameter estimation procedure for a 'non-informative' and an 'informative' experiment, which are heuristically designed. The model structure of ADM1 has been modified by replacing parameters by parameter combinations, to provide a generally locally structurally identifiable version of ADM1. This means that in an idealized theoretical situation, the parameters can be estimated accurately. Furthermore, the generally positive structural identifiability results can be explained from the large number of interconnections between the states in the network structure. This interconnectivity, however, is also observed in the parameter estimates, making uncorrelated parameter estimations in practice difficult. Copyright © 2017. Published by Elsevier Inc.
Parallelization of a beam dynamics code and first large scale radio frequency quadrupole simulations

Directory of Open Access Journals (Sweden)

J. Xu

2007-01-01

The design and operation support of hadron (proton and heavy-ion) linear accelerators require substantial use of beam dynamics simulation tools. The beam dynamics code TRACK was originally developed at Argonne National Laboratory (ANL) to fulfill the special requirements of the rare isotope accelerator (RIA) accelerator systems. From the beginning, the code has been developed to make it useful in the three stages of a linear accelerator project, namely, the design, commissioning, and operation of the machine. To realize this concept, the code has unique features such as end-to-end simulations from the ion source to the final beam destination and automatic procedures for tuning of a multiple charge state heavy-ion beam. The TRACK code has become a general beam dynamics code for hadron linacs and has found wide applications worldwide. Until recently, the code has remained serial except for a simple parallelization used for the simulation of multiple seeds to study the machine errors. To speed up computation, the TRACK Poisson solver has been parallelized. This paper discusses different parallel models for solving the Poisson equation with the primary goal to extend the scalability of the code onto 1024 and more processors of the new generation of supercomputers known as BlueGene (BG/L). Domain decomposition techniques have been adapted and incorporated into the parallel version of the TRACK code. To demonstrate the new capabilities of the parallelized TRACK code, the dynamics of a 45 mA proton beam represented by 10^8 particles has been simulated through the 325 MHz radio frequency quadrupole and initial accelerator section of the proposed FNAL proton driver. The results show the benefits and advantages of large-scale parallel computing in beam dynamics simulations.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

Science.gov (United States)

Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

2017-01-01

Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
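As an illustration of why the Jacobian computation is considered embarrassingly parallel, the following is a minimal Python sketch (not the MATLAB spmd code from the record above). It approximates each Jacobian column by a forward difference and hands the columns to a worker pool. The residual function, the step size `h`, and the worker count are all hypothetical choices made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def residual(x):
    # Hypothetical 2x2 nonlinear system F(x) = 0, used only for illustration.
    return [x[0] ** 2 + x[1] - 3.0, x[0] + x[1] ** 2 - 5.0]


def jacobian_column(f, x, j, h=1e-6):
    # Forward difference: column j is (F(x + h*e_j) - F(x)) / h.
    shifted = list(x)
    shifted[j] += h
    f0, f1 = f(x), f(shifted)
    return [(b - a) / h for a, b in zip(f0, f1)]


def parallel_jacobian(f, x, workers=2):
    # Each column depends only on x, so columns can be evaluated independently.
    n = len(x)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cols = list(pool.map(lambda j: jacobian_column(f, x, j), range(n)))
    # cols[j][i] holds dF_i/dx_j; transpose into row-major J[i][j].
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


J = parallel_jacobian(residual, [1.0, 2.0])
print(J)  # close to the analytic Jacobian [[2, 1], [1, 4]]
```

A thread pool is used here only to keep the sketch short and portable. For CPU-bound pure-Python residuals, a process pool would be the more likely route to an actual speedup, which echoes the record's point that parallel gains depend on the underlying architecture.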

Distributed parallel cooperative coevolutionary multi-objective large-scale immune algorithm for deployment of wireless sensor networks

DEFF Research Database (Denmark)

Cao, Bin; Zhao, Jianwei; Yang, Po

2018-01-01

Using immune algorithms is generally a time-intensive process especially for problems with a large number of variables. In this paper, we propose a distributed parallel cooperative coevolutionary multi-objective large-scale immune algorithm that is implemented using the message passing interface (MPI). The proposed algorithm is composed of three layers: objective, group and individual layers. First, for each objective in the multi-objective problem to be addressed, a subpopulation is used for optimization, and an archive population is used to optimize all the objectives. Second, the large... -objective evolutionary algorithms the Cooperative Coevolutionary Generalized Differential Evolution 3, the Cooperative Multi-objective Differential Evolution and the Nondominated Sorting Genetic Algorithm III, the proposed algorithm addresses the deployment optimization problem efficiently and effectively.
A parallel multi-domain solution methodology applied to nonlinear thermal transport problems in nuclear fuel pins

Energy Technology Data Exchange (ETDEWEB)

Philip, Bobby, E-mail: philipb@ornl.gov [Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831 (United States); Berrill, Mark A.; Allu, Srikanth; Hamilton, Steven P.; Sampath, Rahul S.; Clarno, Kevin T. [Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831 (United States); Dilts, Gary A. [Los Alamos National Laboratory, PO Box 1663, Los Alamos, NM 87545 (United States)

2015-04-01

This paper describes an efficient and nonlinearly consistent parallel solution methodology for solving coupled nonlinear thermal transport problems that occur in nuclear reactor applications over hundreds of individual 3D physical subdomains. Efficiency is obtained by leveraging knowledge of the physical domains, the physics on individual domains, and the couplings between them for preconditioning within a Jacobian-Free Newton-Krylov method. Details of the computational infrastructure that enabled this work, namely the open source Advanced Multi-Physics (AMP) package developed by the authors, are described. Details of verification and validation experiments, and parallel performance analysis in weak and strong scaling studies demonstrating the achieved efficiency of the algorithm, are presented. Furthermore, numerical experiments demonstrate that the preconditioner developed is independent of the number of fuel subdomains in a fuel rod, which is particularly important when simulating different types of fuel rods. Finally, we demonstrate the power of the coupling methodology by considering problems with couplings between surface and volume physics and coupling of nonlinear thermal transport in fuel rods to an external radiation transport code.
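The core idea behind Jacobian-free Newton-Krylov methods in general can be sketched as follows. The Krylov solver never needs the Jacobian matrix itself, only its action on a vector, which can be approximated by one extra residual evaluation. The Python sketch below shows only that matrix-free product, not the AMP package or its preconditioner. The residual function and the perturbation size `eps` are hypothetical.

```python
def residual(u):
    # Hypothetical nonlinear residual F(u), chosen only for illustration.
    return [u[0] * u[1] - 2.0, u[0] + u[1] ** 3 - 9.0]


def jacobian_vector_product(f, u, v, eps=1e-7):
    # Matrix-free approximation: J(u) v ~= (F(u + eps*v) - F(u)) / eps.
    shifted = [ui + eps * vi for ui, vi in zip(u, v)]
    f0, f1 = f(u), f(shifted)
    return [(b - a) / eps for a, b in zip(f0, f1)]


# Analytic Jacobian at u = (1, 2) is [[2, 1], [1, 12]], so J v = (1, -11)
# for v = (1, -1); the finite-difference product should be close to that.
jv = jacobian_vector_product(residual, [1.0, 2.0], [1.0, -1.0])
print(jv)
```

In a full solver this product would be called once per Krylov iteration inside each Newton step, so the cost of a residual evaluation, and the quality of the preconditioner, tend to dominate the run time.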
Uncertainty Quantification for Large-Scale Ice Sheet Modeling

Energy Technology Data Exchange (ETDEWEB)

Ghattas, Omar [Univ. of Texas, Austin, TX (United States)

2016-02-05

This report summarizes our work to develop advanced forward and inverse solvers and uncertainty quantification capabilities for a nonlinear 3D full Stokes continental-scale ice sheet flow model. The components include: (1) forward solver: a new state-of-the-art parallel adaptive scalable high-order-accurate mass-conservative Newton-based 3D nonlinear full Stokes ice sheet flow simulator; (2) inverse solver: a new adjoint-based inexact Newton method for solution of deterministic inverse problems governed by the above 3D nonlinear full Stokes ice flow model; and (3) uncertainty quantification: a novel Hessian-based Bayesian method for quantifying uncertainties in the inverse ice sheet flow solution and propagating them forward into predictions of quantities of interest such as ice mass flux to the ocean.
A novel two-level dynamic parallel data scheme for large 3-D SN calculations

International Nuclear Information System (INIS)

Sjoden, G.E.; Shedlock, D.; Haghighat, A.; Yi, C.

2005-01-01

We introduce a new dynamic parallel memory optimization scheme for executing large scale 3-D discrete ordinates (Sn) simulations on distributed memory parallel computers. In order for parallel transport codes to be truly scalable, they must use parallel data storage, where only the variables that are locally computed are locally stored. Even with parallel data storage for the angular variables, cumulative storage requirements for large discrete ordinates calculations can be prohibitive. To address this problem, Memory Tuning has been implemented into the PENTRAN 3-D parallel discrete ordinates code as an optimized, two-level ('large' array, 'small' array) parallel data storage scheme. Memory Tuning can be described as the process of parallel data memory optimization. Memory Tuning dynamically minimizes the amount of required parallel data in allocated memory on each processor using a statistical sampling algorithm. This algorithm is based on the integral average and standard deviation of the number of fine meshes contained in each coarse mesh in the global problem. Because PENTRAN only stores the locally computed problem phase space, optimal two-level memory assignments can be unique on each node, depending upon the parallel decomposition used (hybrid combinations of angular, energy, or spatial). As demonstrated in the two large discrete ordinates models presented (a storage cask and an OECD MOX Benchmark), Memory Tuning can save a substantial amount of memory per parallel processor, allowing one to accomplish very large scale Sn computations. (authors)
Parallel integer sorting with medium and fine-scale parallelism

Science.gov (United States)

Dagum, Leonardo

1993-01-01

Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Parallel continuous simulated tempering and its applications in large-scale molecular simulations

Energy Technology Data Exchange (ETDEWEB)

Zang, Tianwu; Yu, Linglin; Zhang, Chong [Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005 (United States); Ma, Jianpeng, E-mail: jpma@bcm.tmc.edu [Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005 (United States); Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030 (United States)

2014-07-28

In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in the two-dimensional Ising model, a Lennard-Jones liquid and an all-atom folding simulation of a small globular protein, trp-cage, in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent.
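For context, the exchange step of conventional parallel tempering (not the PCST method itself) can be sketched in Python as below. It uses the standard Metropolis criterion for swapping two replicas at inverse temperatures `beta_i` and `beta_j`. The function names and the numerical values are hypothetical.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    # Standard replica-exchange criterion:
    # accept with probability min(1, exp((beta_i - beta_j) * (E_i - E_j))).
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    # Returns True when the two replicas should exchange configurations.
    return rng.random() < swap_probability(beta_i, beta_j, energy_i, energy_j)


# The colder replica (beta = 1.0) holds the higher energy: always accepted.
print(swap_probability(1.0, 0.5, -10.0, -12.0))
# The colder replica already holds the lower energy: accepted with exp(-1).
print(swap_probability(1.0, 0.5, -12.0, -10.0))
```

Because the exponent depends on the difference in total potential energy, which grows with system size, conventional PT needs more closely spaced replicas for larger systems. This is the scaling limitation the abstract says PCST avoids.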
Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing

OpenAIRE

Qiang Liu; Yi Qin; Guodong Li

2018-01-01

Computing speed is a significant issue of large-scale flood simulations for real-time response to disaster prevention and mitigation. Even today, most of the large-scale flood simulations are generally run on supercomputers due to the massive amounts of data and computations necessary. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To realize a fast simulation of large-scale floods on a personal...
Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data.

Science.gov (United States)

Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger

2017-01-01

Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.
Parallel Algorithm for Incremental Betweenness Centrality on Large Graphs

KAUST Repository

Jamour, Fuad Tarek

2017-10-17

Betweenness centrality quantifies the importance of nodes in a graph in many applications, including network analysis, community detection and identification of influential users. Typically, graphs in such applications evolve over time. Thus, the computation of betweenness centrality should be performed incrementally. This is challenging because updating even a single edge may trigger the computation of all-pairs shortest paths in the entire graph. Existing approaches cannot scale to large graphs: they either require excessive memory (i.e., quadratic to the size of the input graph) or perform unnecessary computations rendering them prohibitively slow. We propose iCentral, a novel incremental algorithm for computing betweenness centrality in evolving graphs. We decompose the graph into biconnected components and prove that processing can be localized within the affected components. iCentral is the first algorithm to support incremental betweenness centrality computation within a graph component. This is done efficiently, in linear space; consequently, iCentral scales to large graphs. We demonstrate with real datasets that the serial implementation of iCentral is up to 3.7 times faster than existing serial methods. Our parallel implementation, which scales to large graphs, is an order of magnitude faster than the state-of-the-art parallel algorithm, while using an order of magnitude less computational resources.
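To make the quantity concrete, here is a minimal Python sketch of static betweenness centrality using Brandes' well-known algorithm. It is not iCentral and performs no incremental updates. It assumes an unweighted, undirected graph given as a symmetric adjacency dictionary, and the three-node example graph is hypothetical.

```python
from collections import deque


def betweenness(graph):
    # Brandes' algorithm: one BFS per source, then back-propagation of
    # pair dependencies along shortest-path predecessors.
    scores = {v: 0.0 for v in graph}
    for source in graph:
        order = []                          # nodes in non-decreasing distance
        preds = {v: [] for v in graph}      # shortest-path predecessors
        sigma = {v: 0 for v in graph}       # number of shortest paths from source
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:             # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()                 # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Undirected graph: every pair was counted from both endpoints.
    return {v: s / 2.0 for v, s in scores.items()}


path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(path_graph))  # only "b" lies between another pair of nodes
```

Recomputing this from scratch after every edge change repeats a full traversal per source. That cost is what motivates restricting the work to the affected biconnected component, as the abstract describes.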
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning.

Science.gov (United States)

Liu, Yang; Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man

2015-01-01

Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of data is large. Nowadays, big data has gained momentum in both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation.
Large-scale computing with Quantum Espresso

International Nuclear Information System (INIS)

Giannozzi, P.; Cavazzoni, C.

2009-01-01

This paper gives a short introduction to Quantum Espresso: a distribution of software for atomistic simulations in condensed-matter physics, chemical physics, materials science, and to its usage in large-scale parallel computing.
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning

Directory of Open Access Journals (Sweden)

Yang Liu

2015-01-01

Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of data is large. Nowadays, big data has gained momentum in both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation.
Parallel Computing in SCALE

International Nuclear Information System (INIS)

DeHart, Mark D.; Williams, Mark L.; Bowman, Stephen M.

2010-01-01

The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement
Landau fluid model for weakly nonlinear dispersive magnetohydrodynamics

International Nuclear Information System (INIS)

Passot, T.; Sulem, P. L.

2005-01-01

In many astrophysical plasmas such as the solar wind, the terrestrial magnetosphere, or in the interstellar medium at small enough scales, collisions are negligible. When interested in the large-scale dynamics, a hydrodynamic approach is advantageous not only because its numerical simulation is easier than that of the full Vlasov-Maxwell equations, but also because it provides a deep understanding of cross-scale nonlinear couplings. It is thus of great interest to construct fluid models that extend the classical magnetohydrodynamic (MHD) equations to collisionless situations. Two ingredients need to be included in such a model to capture the main kinetic effects: finite Larmor radius (FLR) corrections and Landau damping, the only fluid-particle resonance that can affect large scales and can be modeled in a relatively simple way. The modeling of Landau damping in a fluid formalism is hardly possible in the framework of a systematic asymptotic expansion and was addressed mainly by means of parameter fitting in a linearized setting. We introduced a similar Landau fluid model that has the advantage of taking dispersive effects into account. This model properly describes dispersive MHD waves in quasi-parallel propagation. Since, by construction, the system correctly reproduces their linear dynamics, appropriate tests should address the nonlinear regime. In a first case, we show analytically that the weakly nonlinear modulational dynamics of quasi-parallel propagating Alfven waves is well captured. As a second test we consider the parametric decay instability of parallel Alfven waves and show that numerical simulations of the dispersive Landau fluid model lead to results that closely match the outcome of hybrid simulations. (Author)
Topology Optimization of Large Scale Stokes Flow Problems

DEFF Research Database (Denmark)

Aage, Niels; Poulsen, Thomas Harpsøe; Gersborg-Hansen, Allan

2008-01-01

This note considers topology optimization of large scale 2D and 3D Stokes flow problems using parallel computations. We solve problems with up to 1.125.000 elements in 2D and 128.000 elements in 3D on a shared memory computer consisting of Sun UltraSparc IV CPUs.
A concurrent visualization system for large-scale unsteady simulations. Parallel vector performance on an NEC SX-4

International Nuclear Information System (INIS)

Takei, Toshifumi; Doi, Shun; Matsumoto, Hideki; Muramatsu, Kazuhiro

2000-01-01

We have developed a concurrent visualization system RVSLIB (Real-time Visual Simulation Library). This paper shows the effectiveness of the system when it is applied to large-scale unsteady simulations, for which the conventional post-processing approach may no longer work, on high-performance parallel vector supercomputers. The system performs almost all of the visualization tasks on a computation server and uses compressed visualized image data for efficient communication between the server and the user terminal. We have introduced several techniques, including vectorization and parallelization, into the system to minimize the computational costs of the visualization tools. The performance of RVSLIB was evaluated by using an actual CFD code on an NEC SX-4. The computational time increase due to the concurrent visualization was at most 3% for a smaller (1.6 million) grid and less than 1% for a larger (6.2 million) one. (author)
Concurrent Programming Using Actors: Exploiting Large-Scale Parallelism

Science.gov (United States)

1985-10-07

Technical report by G. Agha et al., Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 545 Technology Square, Cambridge. [Abstract text not recoverable from the scanned report documentation page.]
Optimized parallel convolutions for non-linear fluid models of tokamak ηi turbulence

International Nuclear Information System (INIS)

Milovich, J.L.; Tomaschke, G.; Kerbel, G.D.

1993-01-01

Non-linear computational fluid models of plasma turbulence based on spectral methods typically spend a large fraction of the total computing time evaluating convolutions. Usually these convolutions arise from an explicit or semi-implicit treatment of the convective non-linearities in the problem. Often the principal convective velocity is perpendicular to magnetic field lines, allowing a reduction of the convolution to two dimensions in an appropriate geometry, but beyond this, different models vary widely in the particulars of which mode amplitudes are selectively evolved to get the most efficient representation of the turbulence. As the number of modes in the problem, N, increases, the amount of computation required for this part of the evolution algorithm scales as N^2 per timestep for a direct or analytic method and as N ln N per timestep for a pseudospectral method. The constants of proportionality depend on the particulars of mode selection and determine the problem size at which the two methods perform equally. For large enough N, the pseudospectral method performance is always superior, though some problems do not require correspondingly high resolution. Further, the Courant condition for numerical stability requires that the timestep size decrease proportionately as N increases, thus accentuating the need for fast methods for larger N problems. The authors have developed a package for the Cray system which performs these convolutions for a rather arbitrary mode selection scheme using either method. The package is highly optimized using a combination of macro- and microtasking techniques, as well as vectorization and, in some cases, assembly-coded routines. Parts of the package have also been developed and optimized for the CM200 and CM5 systems. Performance comparisons with respect to problem size, parallelization, selection schemes and architecture are presented.
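The N^2 versus N ln N scaling quoted above can be illustrated with a minimal sketch, assuming a plain one-dimensional circular convolution over all modes. This is not the authors' Cray package: it has no mode selection, no dealiasing, and no vectorization, and all function names are hypothetical. The direct routine evaluates the convolution sum term by term, while the transform-based routine relies on the convolution theorem with a textbook radix-2 FFT.

```python
import cmath

def direct_convolution(a, b):
    """Circular convolution from the definition: O(N^2) multiply-adds."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey transform; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_convolution(a, b):
    """Circular convolution via the convolution theorem: O(N log N)."""
    n = len(a)
    product = [p * q for p, q in zip(fft(a), fft(b))]
    # The inverse transform above is unnormalized, so divide by n here.
    return [v / n for v in fft(product, inverse=True)]

a = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, 1.0]
b = [0.5, -1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.25]
direct = direct_convolution(a, b)
fast = fft_convolution(a, b)
max_diff = max(abs(d - f) for d, f in zip(direct, fast))
print("largest difference between the two methods:", max_diff)
```

Both routines return the same values up to rounding; which one is faster at a given N depends on the constants of proportionality, as the abstract notes.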
PetClaw: Parallelization and Performance Optimization of a Python-Based Nonlinear Wave Propagation Solver Using PETSc

KAUST Repository

Alghamdi, Amal Mohammed

2012-04-01

Clawpack, a conservation laws package implemented in Fortran, and its Python-based version, PyClaw, are existing tools providing nonlinear wave propagation solvers that use state-of-the-art finite volume methods. Simulations using those tools can have extensive computational requirements to provide accurate results. Therefore, a number of tools, such as BearClaw and MPIClaw, have been developed based on Clawpack to achieve significant speedup by exploiting parallel architectures. However, none of them has been shown to scale on a large number of cores. Furthermore, these tools, implemented in Fortran, achieve parallelization by inserting parallelization logic and MPI standard routines throughout the serial code in a non-modular manner. Our contribution in this thesis research is three-fold. First, we demonstrate an advantageous use case of Python in implementing easy-to-use, modular, extensible, scalable scientific software tools by developing an implementation of a parallelization framework, PetClaw, for PyClaw using the well-known Portable Extensible Toolkit for Scientific Computation, PETSc, through its Python wrapper petsc4py. Second, we demonstrate the possibility of getting acceptable Python code performance when compared to Fortran performance after introducing a number of serial optimizations to the Python code, including integrating Clawpack Fortran kernels into PyClaw for low-level computationally intensive parts of the code. As a result of those optimizations, the Python overhead in PetClaw for a shallow water application is only 12 percent when compared to the corresponding Fortran Clawpack application. Third, we provide a demonstration of PetClaw scalability on up to the entirety of Shaheen, a 16-rack Blue Gene/P IBM supercomputer that comprises 65,536 cores and is located at King Abdullah University of Science and Technology (KAUST). The PetClaw solver achieved above 0.98 weak scaling efficiency for an Euler application on the whole machine excluding the
Parallel Index and Query for Large Scale Data Analysis

Energy Technology Data Exchange (ETDEWEB)

Chou, Jerry; Wu, Kesheng; Ruebel, Oliver; Howison, Mark; Qiang, Ji; Prabhat; Austin, Brian; Bethel, E. Wes; Ryne, Rob D.; Shoshani, Arie

2011-07-18

Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
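To make the idea of answering a range query from an index rather than a full scan concrete, here is a minimal sketch assuming a simple binned bitmap index. It is not the FastQuery or FastBit API: the function names, the bin edges, and the sample "energy" values are all hypothetical, and the bitmaps are uncompressed Python integers. Bins that lie entirely inside the query range are answered by OR-ing their bitmaps, and only the one boundary bin needs a check against the raw values.

```python
def build_bitmap_index(values, bin_edges):
    """One bitmap (a Python int used as a bit set) per bin [edge_i, edge_{i+1})."""
    bitmaps = [0] * (len(bin_edges) - 1)
    for row, v in enumerate(values):
        for i in range(len(bitmaps)):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                bitmaps[i] |= 1 << row
                break
    return bitmaps

def query_at_least(values, bitmaps, bin_edges, threshold):
    """Row numbers with value >= threshold."""
    hits = 0
    for i in range(len(bitmaps)):
        lo, hi = bin_edges[i], bin_edges[i + 1]
        if lo >= threshold:
            hits |= bitmaps[i]  # whole bin qualifies, no raw data touched
        elif hi > threshold:
            # Boundary bin: candidates must be checked against the raw values.
            row, bits = 0, bitmaps[i]
            while bits:
                if bits & 1 and values[row] >= threshold:
                    hits |= 1 << row
                bits >>= 1
                row += 1
    return [r for r in range(len(values)) if (hits >> r) & 1]

# Hypothetical particle energies; rows are particle numbers.
energy = [0.3, 7.2, 2.5, 9.9, 4.1, 8.8, 5.0, 1.2]
edges = [0.0, 2.5, 5.0, 7.5, 10.0]
index = build_bitmap_index(energy, edges)
rows = query_at_least(energy, index, edges, 6.0)
print(rows)  # → [1, 3, 5]
```

In a parallel setting each process would typically build and query the index for its own slice of rows; that distribution step is left out of this sketch.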

Large-scale Intelligent Transportation Systems simulation

Energy Technology Data Exchange (ETDEWEB)

Ewing, T.; Canfield, T.; Hannebutte, U.; Levine, D.; Tentner, A.

1995-06-01

A prototype computer system has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS) capable of running on massively parallel computers and distributed (networked) computer systems. The prototype includes the modelling of instrumented "smart" vehicles with in-vehicle navigation units capable of optimal route planning and Traffic Management Centers (TMC). The TMC has probe vehicle tracking capabilities (display position and attributes of instrumented vehicles), and can provide 2-way interaction with traffic to provide advisories and link times. Both the in-vehicle navigation module and the TMC feature detailed graphical user interfaces to support human-factors studies. The prototype has been developed on a distributed system of networked UNIX computers but is designed to run on ANL's IBM SP-X parallel computer system for large scale problems. A novel feature of our design is that vehicles will be represented by autonomous computer processes, each with a behavior model which performs independent route selection and reacts to external traffic events much like real vehicles. With this approach, one will be able to take advantage of emerging massively parallel processor (MPP) systems.
The origin of large scale cosmic structure

International Nuclear Information System (INIS)

Jones, B.J.T.; Palmer, P.L.

1985-01-01

The paper concerns the origin of large scale cosmic structure. The evolution of density perturbations, the nonlinear regime (Zel'dovich's solution and others), the Gott and Rees clustering hierarchy, the spectrum of condensations, and biassed galaxy formation, are all discussed. (UK)
Multigrid Reduction in Time for Nonlinear Parabolic Problems

Energy Technology Data Exchange (ETDEWEB)

Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Univ. of Colorado, Boulder, CO (United States); O'Neill, B. [Univ. of Colorado, Boulder, CO (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2016-01-04

The need for parallel-in-time is being driven by changes in computer architectures, where future speed-ups will be available through greater concurrency, but not faster clock speeds, which are stagnant. This leads to a bottleneck for sequential time marching schemes, because they lack parallelism in the time dimension. Multigrid Reduction in Time (MGRIT) is an iterative procedure that allows for temporal parallelism by utilizing multigrid reduction techniques and a multilevel hierarchy of coarse time grids. MGRIT has been shown to be effective for linear problems, with speedups of up to 50 times. The goal of this work is the efficient solution of nonlinear problems with MGRIT, where efficient is defined as achieving similar performance when compared to a corresponding linear problem. As our benchmark, we use the p-Laplacian, where p = 4 corresponds to a well-known nonlinear diffusion equation and p = 2 corresponds to our benchmark linear diffusion problem. When considering linear problems and implicit methods, the use of optimal spatial solvers such as spatial multigrid implies that the cost of one time step evaluation is fixed across temporal levels, which have a large variation in time step sizes. This is not the case for nonlinear problems, where the work required increases dramatically on coarser time grids, where relatively large time steps lead to worse conditioned nonlinear solves and increased nonlinear iteration counts per time step evaluation. This is the key difficulty explored by this paper. We show that by using a variety of strategies, most importantly, spatial coarsening and an alternate initial guess to the nonlinear time-step solver, we can reduce the work per time step evaluation over all temporal levels to a range similar to that of the corresponding linear problem. This allows for parallel scaling behavior comparable to the corresponding linear problem.
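For orientation, one common way to write the p-Laplacian diffusion problem referred to above is sketched here. The symbols u (solution) and b (source term) are assumptions, since the abstract does not state the equation or its boundary conditions.

```latex
u_t - \nabla \cdot \left( |\nabla u|^{p-2} \, \nabla u \right) = b(x,t),
\qquad
p = 2:\; u_t - \Delta u = b,
\qquad
p = 4:\; u_t - \nabla \cdot \left( |\nabla u|^{2} \, \nabla u \right) = b .
```

With p = 2 the diffusion coefficient is constant and the problem is linear; with p = 4 the coefficient depends on the gradient of u, so each implicit time step requires a nonlinear solve.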
Weak-periodic stochastic resonance in a parallel array of static nonlinearities.

Directory of Open Access Journals (Sweden)

Yumei Ma

This paper studies the output-input signal-to-noise ratio (SNR) gain of an uncoupled parallel array of static, yet arbitrary, nonlinear elements for transmitting a weak periodic signal in additive white noise. In the small-signal limit, an explicit expression for the SNR gain is derived. It serves to prove that the SNR gain is always a monotonically increasing function of the array size for any given nonlinearity and noisy environment. It also determines the SNR gain maximized by the locally optimal nonlinearity as the upper bound of the SNR gain achieved by an array of static nonlinear elements. With locally optimal nonlinearity, it is demonstrated that stochastic resonance cannot occur, i.e. adding internal noise into the array never improves the SNR gain. However, in an array of suboptimal but easily implemented threshold nonlinearities, we show the feasibility of situations where stochastic resonance occurs, and also the possibility of the SNR gain exceeding unity for a wide range of input noise distributions.
Cosmic Shear With ACS Pure Parallels

Science.gov (United States)

Rhodes, Jason

2002-07-01

Small distortions in the shapes of background galaxies by foreground mass provide a powerful method of directly measuring the amount and distribution of dark matter. Several groups have recently detected this weak lensing by large-scale structure, also called cosmic shear. The high resolution and sensitivity of HST/ACS provide a unique opportunity to measure cosmic shear accurately on small scales. Using 260 parallel orbits in Sloan i (F775W), we will measure for the first time the cosmic shear variance on scales [...] Omega_m^0.5, with signal-to-noise (s/n) 20, and the mass density Omega_m with s/n = 4. These measurements will be made at small angular scales where non-linear effects dominate the power spectrum, providing a test of the gravitational instability paradigm for structure formation. Measurements on these scales are not possible from the ground, because of the systematic effects induced by PSF smearing from seeing. Having many independent lines of sight reduces the uncertainty due to cosmic variance, making parallel observations ideal.
Optical technologies for data communication in large parallel systems

International Nuclear Information System (INIS)

Ritter, M B; Vlasov, Y; Kash, J A; Benner, A

2011-01-01

Large, parallel systems have greatly aided scientific computation and data collection, but performance scaling now relies on chip and system-level parallelism. This has happened because power density limits have caused processor frequency growth to stagnate, driving the new multi-core architecture paradigm, which would seem to provide generations of performance increases as transistors scale. However, this paradigm will be constrained by electrical I/O bandwidth limits; first off the processor card, then off the processor module itself. We will present best-estimates of these limits, then show how optical technologies can help provide more bandwidth to allow continued system scaling. We will describe the current status of optical transceiver technology which is already being used to exceed off-board electrical bandwidth limits, then present work on silicon nanophotonic transceivers and 3D integration technologies which, taken together, promise to allow further increases in off-module and off-card bandwidth. Finally, we will show estimated limits of nanophotonic links and discuss breakthroughs that are needed for further progress, and will speculate on whether we will reach Exascale-class machine performance at affordable powers.
Optical technologies for data communication in large parallel systems

Energy Technology Data Exchange (ETDEWEB)

Ritter, M B; Vlasov, Y; Kash, J A [IBM T.J. Watson Research Center, Yorktown Heights, NY (United States); Benner, A, E-mail: mritter@us.ibm.com [IBM Poughkeepsie, Poughkeepsie, NY (United States)

2011-01-15

Large, parallel systems have greatly aided scientific computation and data collection, but performance scaling now relies on chip and system-level parallelism. This has happened because power density limits have caused processor frequency growth to stagnate, driving the new multi-core architecture paradigm, which would seem to provide generations of performance increases as transistors scale. However, this paradigm will be constrained by electrical I/O bandwidth limits; first off the processor card, then off the processor module itself. We will present best-estimates of these limits, then show how optical technologies can help provide more bandwidth to allow continued system scaling. We will describe the current status of optical transceiver technology which is already being used to exceed off-board electrical bandwidth limits, then present work on silicon nanophotonic transceivers and 3D integration technologies which, taken together, promise to allow further increases in off-module and off-card bandwidth. Finally, we will show estimated limits of nanophotonic links and discuss breakthroughs that are needed for further progress, and will speculate on whether we will reach Exascale-class machine performance at affordable powers.
Lagrangian space consistency relation for large scale structure

International Nuclear Information System (INIS)

Horn, Bart; Hui, Lam; Xiao, Xiao

2015-01-01

Consistency relations, which relate the squeezed limit of an (N+1)-point correlation function to an N-point function, are non-perturbative symmetry statements that hold even if the associated high momentum modes are deep in the nonlinear regime and astrophysically complex. Recently, Kehagias and Riotto, and Peloso and Pietroni, discovered a consistency relation applicable to large scale structure. We show that this can be recast into a simple physical statement in Lagrangian space: that the squeezed correlation function (suitably normalized) vanishes. This holds regardless of whether the correlation observables are at the same time or not, and regardless of whether multiple-streaming is present. The simplicity of this statement suggests that an analytic understanding of large scale structure in the nonlinear regime may be particularly promising in Lagrangian space.
Regional-scale calculation of the LS factor using parallel processing

Science.gov (United States)

Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

2015-05-01

With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. Depending on whether data dependence exists, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the characteristics of the algorithms, including a decomposition method for maintaining the integrity of the results, an optimized workflow for reducing the time taken to export unnecessary intermediate data, and a buffer-communication-computation strategy for improving communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
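The point about decomposing the data while keeping the results intact can be sketched for a local algorithm such as slope. The sketch below is an illustration under stated assumptions, not the paper's implementation: it uses a simple central-difference slope with unit cell size, splits the elevation grid into row blocks, and gives each block one extra buffer (halo) row from its neighbours. The blocks are processed in a serial loop; in an MPI program each block would live on a different process and the buffer rows would arrive by message passing. All names and the sample grid are hypothetical.

```python
def slope_block(block, has_top_halo, has_bottom_halo):
    """Slope magnitude for a block; halo rows are read but produce no output."""
    n_rows, n_cols = len(block), len(block[0])
    first = 1 if has_top_halo else 0
    last = n_rows - 1 if has_bottom_halo else n_rows
    out = []
    for i in range(first, last):
        up, down = max(i - 1, 0), min(i + 1, n_rows - 1)
        row = []
        for j in range(n_cols):
            left, right = max(j - 1, 0), min(j + 1, n_cols - 1)
            dzdy = (block[down][j] - block[up][j]) / (down - up)
            dzdx = (block[i][right] - block[i][left]) / (right - left)
            row.append((dzdx * dzdx + dzdy * dzdy) ** 0.5)
        out.append(row)
    return out

def slope_decomposed(dem, n_blocks):
    """Split the grid into row blocks, each padded with one buffer row per side."""
    n_rows = len(dem)
    bounds = [k * n_rows // n_blocks for k in range(n_blocks + 1)]
    result = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        top, bottom = lo > 0, hi < n_rows
        start = lo - 1 if top else lo
        stop = hi + 1 if bottom else hi
        result.extend(slope_block(dem[start:stop], top, bottom))
    return result

# Hypothetical 9 x 6 elevation grid.
dem = [[(3 * i + j * j) % 7 + 0.1 * i * j for j in range(6)] for i in range(9)]
serial = slope_block(dem, False, False)
blocked = slope_decomposed(dem, 3)
print(blocked == serial)
```

Because every output cell only reads its immediate neighbours, one buffer row per side is enough for the blocked result to match the serial one. Global algorithms such as flow accumulation depend on cells arbitrarily far away, which is why the abstract treats them separately.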
Photorealistic large-scale urban city model reconstruction.

Science.gov (United States)

Poullis, Charalambos; You, Suya

2009-01-01

The rapid and efficient creation of virtual environments has become a crucial part of virtual reality applications. In particular, civil and defense applications often require and employ detailed models of operations areas for training, simulations of different scenarios, planning for natural or man-made events, monitoring, surveillance, games, and films. A realistic representation of the large-scale environments is therefore imperative for the success of such applications since it increases the immersive experience of its users and helps reduce the difference between physical and virtual reality. However, creating such large-scale virtual environments remains time-consuming manual work. In this work, we propose a novel method for the rapid reconstruction of photorealistic large-scale virtual environments. First, a novel, extendible, parameterized geometric primitive is presented for the automatic identification and reconstruction of building structures. In addition, buildings with complex roofs containing complex linear and nonlinear surfaces are reconstructed interactively using a linear polygonal and a nonlinear primitive, respectively. Second, we present a rendering pipeline for the composition of photorealistic textures, which, unlike existing techniques, can recover missing or occluded texture information by integrating information captured from different optical sensors (ground, aerial, and satellite).
Partial fourier and parallel MR image reconstruction with integrated gradient nonlinearity correction.

Science.gov (United States)

Tao, Shengzhen; Trzasko, Joshua D; Shu, Yunhong; Weavers, Paul T; Huston, John; Gray, Erin M; Bernstein, Matt A

2016-06-01

To describe how integrated gradient nonlinearity (GNL) correction can be used within noniterative partial Fourier (homodyne) and parallel (SENSE and GRAPPA) MR image reconstruction strategies, and demonstrate that performing GNL correction during, rather than after, these routines mitigates the image blurring and resolution loss caused by postreconstruction image domain based GNL correction. Starting from partial Fourier and parallel magnetic resonance imaging signal models that explicitly account for GNL, noniterative image reconstruction strategies for each accelerated acquisition technique are derived under the same core mathematical assumptions as their standard counterparts. A series of phantom and in vivo experiments on retrospectively undersampled data were performed to investigate the spatial resolution benefit of integrated GNL correction over conventional postreconstruction correction. Phantom and in vivo results demonstrate that the integrated GNL correction reduces the image blurring introduced by the conventional GNL correction, while still correcting GNL-induced coarse-scale geometrical distortion. Images generated from undersampled data using the proposed integrated GNL strategies offer superior depiction of fine image detail, for example, phantom resolution inserts and anatomical tissue boundaries. Noniterative partial Fourier and parallel imaging reconstruction methods with integrated GNL correction reduce the resolution loss that occurs during conventional postreconstruction GNL correction while preserving the computational efficiency of standard reconstruction techniques. Magn Reson Med 75:2534-2544, 2016. © 2015 Wiley Periodicals, Inc.
Three-Dimensional Induced Polarization Parallel Inversion Using Nonlinear Conjugate Gradients Method

Directory of Open Access Journals (Sweden)

Huan Ma

2015-01-01

Four kinds of induced polarization (IP) array methods (surface, borehole-surface, surface-borehole, and borehole-borehole) are widely used in resource exploration. However, because of the large number of sources, the inversion takes a long time to complete. In the paper, a new parallel algorithm is described which uses message passing interface (MPI) and graphics processing unit (GPU) to accelerate 3D inversion of these four methods. The forward finite difference equation is solved by an ILU0 preconditioner and the conjugate gradient (CG) solver. The inverse problem is solved by nonlinear conjugate gradients (NLCG) iteration, which is used to calculate one forward and two “pseudo-forward” modelings and update the direction, space, and model in turn. Because each source is independent in forward and “pseudo-forward” modelings, multiprocess modes are opened by calling the MPI library. The iterative matrix solver within CULA is called in each process. Some tables and synthetic data examples illustrate that this parallel inversion algorithm is effective. Furthermore, we demonstrate that the joint inversion of surface and borehole data produces resistivity and chargeability results that are superior to those obtained from inversions of individual surface data.
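The general shape of a nonlinear conjugate gradient iteration (compute a gradient, update the search direction, choose a step, update the model) can be sketched as follows. This is a generic Polak-Ribiere variant with a backtracking line search applied to a hypothetical two-parameter misfit function; it is an assumption-laden illustration, not the paper's inversion scheme, and it contains no forward or "pseudo-forward" IP modeling, no MPI, and no GPU code.

```python
def nlcg(f, grad, x0, tol=1e-8, max_iter=500):
    """Polak-Ribiere+ nonlinear CG with a backtracking (Armijo) line search."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for it in range(max_iter):
        gnorm2 = sum(gi * gi for gi in g)
        if gnorm2 ** 0.5 < tol:
            return x, it
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0:  # not a descent direction: restart with steepest descent
            d = [-gi for gi in g]
            slope = -gnorm2
        step, fx = 1.0, f(x)
        # Halve the step until the sufficient-decrease condition holds.
        while f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        x = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad(x)
        beta = max(0.0, sum(gn * (gn - gi) for gn, gi in zip(g_new, g)) / gnorm2)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x, max_iter

# Hypothetical misfit with its minimum at m = (1, -2).
def misfit(m):
    return (m[0] - 1.0) ** 2 + (m[0] - 1.0) ** 4 + 10.0 * (m[1] + 2.0) ** 2

def misfit_grad(m):
    return [2.0 * (m[0] - 1.0) + 4.0 * (m[0] - 1.0) ** 3, 20.0 * (m[1] + 2.0)]

model, iterations = nlcg(misfit, misfit_grad, [0.0, 0.0])
print(model, iterations)
```

In an inversion setting the expensive part is evaluating the misfit and its gradient, which is where the per-source parallelism described in the abstract would apply.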
Determination of the onset nonlinearity hydrodynamic characteristics at two-phase flow in parallel vertical channels

International Nuclear Information System (INIS)

Jovic, V.; Afgan, N.; Jovic, L.; Spasojevic, D.

1993-01-01

The paper presents results of the experimental and theoretical analyses of linear and nonlinear characteristics of adiabatic two-phase water-air flow in vertical parallel channels. Changes in the character of the flow regime and the conditions for the transition from linear to nonlinear dynamic characteristics were defined. (author)
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

KAUST Repository

Frohne, Jörg; Heister, Timo; Bangerth, Wolfgang

2015-01-01

© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

KAUST Repository

Frohne, Jörg

2015-08-06

© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
Large-scale sequential quadratic programming algorithms

Energy Technology Data Exchange (ETDEWEB)

Eldersveld, S.K.

1992-09-01

The problem addressed is the general nonlinear programming problem: finding a local minimizer for a nonlinear function subject to a mixture of nonlinear equality and inequality constraints. The methods studied are in the class of sequential quadratic programming (SQP) algorithms, which have previously proved successful for problems of moderate size. Our goal is to devise an SQP algorithm that is applicable to large-scale optimization problems, using sparse data structures and storing less curvature information but maintaining the property of superlinear convergence. The main features are: 1. The use of a quasi-Newton approximation to the reduced Hessian of the Lagrangian function. Only an estimate of the reduced Hessian matrix is required by our algorithm. The impact of not having available the full Hessian approximation is studied and alternative estimates are constructed. 2. The use of a transformation matrix Q. This allows the QP gradient to be computed easily when only the reduced Hessian approximation is maintained. 3. The use of a reduced-gradient form of the basis for the null space of the working set. This choice of basis is more practical than an orthogonal null-space basis for large-scale problems. The continuity condition for this choice is proven. 4. The use of incomplete solutions of quadratic programming subproblems. Certain iterates generated by an active-set method for the QP subproblem are used in place of the QP minimizer to define the search direction for the nonlinear problem. An implementation of the new algorithm has been obtained by modifying the code MINOS. Results and comparisons with MINOS and NPSOL are given for the new algorithm on a set of 92 test problems.
Parallel computing in plasma physics: Nonlinear instabilities

International Nuclear Information System (INIS)

Pohn, E.; Kamelander, G.; Shoucri, M.

2000-01-01

A Vlasov-Poisson system is used for studying the time evolution of the charge separation at a spatially one-dimensional as well as a two-dimensional plasma edge. Ions are advanced in time using the Vlasov equation. The whole three-dimensional velocity space is considered, leading to very time-consuming four- and five-dimensional fully kinetic simulations, respectively. In the 1D simulations electrons are assumed to behave adiabatically, i.e. they are Boltzmann-distributed, leading to a nonlinear Poisson equation. In the 2D simulations a gyro-kinetic approximation is used for the electrons. The plasma is assumed to be initially neutral. The simulations are performed on an equidistant grid. A constant time-step is used for advancing the density-distribution function in time. The time-evolution of the distribution function is performed using a splitting scheme. Each dimension (x, y, υ_x, υ_y, υ_z) of the phase-space is advanced in time separately. The value of the distribution function for the next time is calculated from the value of an (in general) interstitial point at the present time (fractional shift). One-dimensional cubic-spline interpolation is used for calculating the interstitial function values. After the fractional shifts are performed for each dimension of the phase-space, a whole time-step for advancing the distribution function is finished. Afterwards the charge density is calculated, the Poisson equation is solved and the electric field is calculated before the next time-step is performed. The fractional shift method sketched above was parallelized for p processors as follows. Considering first the shifts in y-direction, a proper parallelization strategy is to split the grid into p disjoint υ_z-slices, which are sub-grids, each containing a different 1/p-th part of the υ_z range but the whole range of all other dimensions. Each processor is responsible for performing the y-shifts on a different slice, which can be done in parallel without any communication between
Newton-Krylov-BDDC solvers for nonlinear cardiac mechanics

KAUST Repository

Pavarino, L.F.; Scacchi, S.; Zampini, Stefano

2015-01-01

The aim of this work is to design and study a Balancing Domain Decomposition by Constraints (BDDC) solver for the nonlinear elasticity system modeling the mechanical deformation of cardiac tissue. The contraction–relaxation process in the myocardium is induced by the generation and spread of the bioelectrical excitation throughout the tissue and it is mathematically described by the coupling of cardiac electro-mechanical models consisting of systems of partial and ordinary differential equations. In this study, the discretization of the electro-mechanical models is performed by Q1 finite elements in space and semi-implicit finite difference schemes in time, leading to the solution of a large-scale linear system for the bioelectrical potentials and a nonlinear system for the mechanical deformation at each time step of the simulation. The parallel mechanical solver proposed in this paper consists in solving the nonlinear system with a Newton-Krylov-BDDC method, based on the parallel solution of local mechanical problems and a coarse problem for the so-called primal unknowns. Three-dimensional parallel numerical tests on different machines show that the proposed parallel solver is scalable in the number of subdomains, quasi-optimal in the ratio of subdomain to mesh sizes, and robust with respect to tissue anisotropy.
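The Newton-Krylov structure named in the abstract above (an outer Newton iteration whose linear systems are solved by an inner Krylov method) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy residual F(u) = A u + u^3 - b with a 1D Laplacian stencil stands in for the nonlinear elasticity system, plain unpreconditioned conjugate gradients stands in for the Krylov solver, and no BDDC preconditioner is implemented. It is not the authors' solver.

```python
import math

def residual(u, b):
    """Toy nonlinear residual F(u) = A u + u^3 - b, A = tridiag(-1, 2, -1)."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * u[i] - left - right + u[i] ** 3 - b[i])
    return out

def jacobian_apply(u, v):
    """Matrix-free product J(u) v with J(u) = A + 3 diag(u^2), which is SPD."""
    n = len(u)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * v[i] - left - right + 3.0 * u[i] ** 2 * v[i])
    return out

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def conjugate_gradient(apply_op, rhs, rel_tol=1e-10, max_iter=200):
    """Unpreconditioned CG; a preconditioner (e.g. BDDC) would be applied to r here."""
    x = [0.0] * len(rhs)
    r = list(rhs)
    p = list(r)
    rr = dot(r, r)
    rhs_norm = math.sqrt(rr)
    for _ in range(max_iter):
        if math.sqrt(rr) <= rel_tol * rhs_norm:
            break
        ap = apply_op(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def newton_krylov(b, tol=1e-10, max_newton=50):
    """Outer Newton loop; each step solves J(u) delta = -F(u) with the inner CG."""
    u = [0.0] * len(b)
    for iteration in range(max_newton):
        f = residual(u, b)
        if math.sqrt(dot(f, f)) < tol:
            return u, iteration
        delta = conjugate_gradient(lambda v: jacobian_apply(u, v), [-fi for fi in f])
        u = [ui + di for ui, di in zip(u, delta)]
    return u, max_newton

rhs = [1.0] * 8
solution, iterations = newton_krylov(rhs)
final_residual = residual(solution, rhs)
final_norm = math.sqrt(dot(final_residual, final_residual))
```

In a domain decomposition setting, the preconditioner application inside the Krylov loop is where the parallel local subdomain solves and the coarse problem would enter; the sketch leaves that step out on purpose.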
Neurite, a finite difference large scale parallel program for the simulation of electrical signal propagation in neurites under mechanical loading.

Directory of Open Access Journals (Sweden)

Julián A García-Grajales

With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite (explicit and implicit) were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large-scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon

Nonlinear stability of supersonic jets

Science.gov (United States)

Tiwari, S. N. (Principal Investigator); Bhat, T. R. S. (Principal Investigator)

1996-01-01

The stability calculations made for a shock-free supersonic jet using the model based on parabolized stability equations are presented. In this analysis the large-scale structures, which play a dominant role in the mixing as well as the noise radiated, are modeled as instability waves. This model takes into consideration non-parallel flow effects and also nonlinear interaction of the instability waves. The stability calculations have been performed for different frequencies and mode numbers over a range of jet operating temperatures. Comparisons are made, where appropriate, with the solutions to Rayleigh's equation (linear, inviscid analysis with the assumption of parallel flow). The comparison of the solutions obtained using the two approaches shows very good agreement.
Parallelization and implementation of approximate root isolation for nonlinear system by Monte Carlo

Science.gov (United States)

Khosravi, Ebrahim

1998-12-01

This dissertation addresses the fundamental problem of isolating the real roots of nonlinear systems of equations using the Monte Carlo method published by Bush Jones. The algorithm requires only function values and can be applied readily to complicated systems of transcendental functions. The implementation of this sequential algorithm provides scientists with the means to utilize function analysis in mathematics or other fields of science. The algorithm, however, is so computationally intensive that it is limited to a very small set of variables, which makes it infeasible for large systems of equations. In addition, a computational technique was needed for investigating a methodology for preventing the algorithm from converging to the same root along different paths of computation. The research provides techniques for improving the efficiency and correctness of the algorithm. The sequential algorithm for this technique was corrected and a parallel algorithm is presented. This parallel method has been formally analyzed and is compared with other known methods of root isolation. The effectiveness, efficiency, and enhanced overall performance of the parallel program in comparison to sequential processing are discussed. The message passing model was used for this parallel processing, and it is presented and implemented on the Intel/860 MIMD architecture. The parallel processing proposed in this research has been implemented in an ongoing high energy physics experiment: the algorithm has been used to track neutrinos in a Super-K detector. This experiment is located in Japan, and data can be processed on-line or off-line, locally or remotely.
Cosmic Shear With ACS Pure Parallels. Targeted Portion.

Science.gov (United States)

Rhodes, Jason

2002-07-01

Small distortions in the shapes of background galaxies by foreground mass provide a powerful method of directly measuring the amount and distribution of dark matter. Several groups have recently detected this weak lensing by large-scale structure, also called cosmic shear. The high resolution and sensitivity of HST/ACS provide a unique opportunity to measure cosmic shear accurately on small scales. Using 260 parallel orbits in Sloan i {F775W} we will measure for the first time: the cosmic shear variance on scales Omega_m^0.5, with signal-to-noise {s/n} 20, and the mass density Omega_m with s/n=4. They will be done at small angular scales where non-linear effects dominate the power spectrum, providing a test of the gravitational instability paradigm for structure formation. Measurements on these scales are not possible from the ground, because of the systematic effects induced by PSF smearing from seeing. Having many independent lines of sight reduces the uncertainty due to cosmic variance, making parallel observations ideal.
Speedup predictions on large scientific parallel programs

International Nuclear Information System (INIS)

Williams, E.; Bobrowicz, F.

1985-01-01

How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N ≥ 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
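To put the reported figure of 14.4 on 16 processors in context, the short sketch below computes the parallel efficiency and the serial fraction that Amdahl's law would imply for that measurement. Amdahl's law is brought in here as an assumption for illustration; the abstract does not say which speedup model the authors used, and the function names are hypothetical.

```python
def amdahl_speedup(serial_fraction, processors):
    """Amdahl's law: speedup for a given serial fraction and processor count."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

def implied_serial_fraction(speedup, processors):
    """Invert Amdahl's law to get the serial fraction implied by a speedup."""
    return (processors / speedup - 1.0) / (processors - 1.0)

reported_speedup = 14.4   # figure quoted in the abstract
processors = 16           # figure quoted in the abstract

efficiency = reported_speedup / processors
serial_fraction = implied_serial_fraction(reported_speedup, processors)

print(f"parallel efficiency:     {efficiency:.3f}")
print(f"implied serial fraction: {serial_fraction:.5f}")
```

Under this model, a serial fraction below one percent is already enough to cap the speedup at 14.4 on 16 processors, which shows how sensitive such predictions are to small sequential sections.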
Subgrid-scale models for large-eddy simulation of rotating turbulent channel flows

Science.gov (United States)

Silvis, Maurits H.; Bae, Hyunji Jane; Trias, F. Xavier; Abkar, Mahdi; Moin, Parviz; Verstappen, Roel

2017-11-01

We aim to design subgrid-scale models for large-eddy simulation of rotating turbulent flows. Rotating turbulent flows form a challenging test case for large-eddy simulation due to the presence of the Coriolis force. The Coriolis force conserves the total kinetic energy while transporting it from small to large scales of motion, leading to the formation of large-scale anisotropic flow structures. The Coriolis force may also cause partial flow laminarization and the occurrence of turbulent bursts. Many subgrid-scale models for large-eddy simulation are, however, primarily designed to parametrize the dissipative nature of turbulent flows, ignoring the specific characteristics of transport processes. We, therefore, propose a new subgrid-scale model that, in addition to the usual dissipative eddy viscosity term, contains a nondissipative nonlinear model term designed to capture transport processes, such as those due to rotation. We show that the addition of this nonlinear model term leads to improved predictions of the energy spectra of rotating homogeneous isotropic turbulence as well as of the Reynolds stress anisotropy in spanwise-rotating plane-channel flows. This work is financed by the Netherlands Organisation for Scientific Research (NWO) under Project Number 613.001.212.
Novel probabilistic and distributed algorithms for guidance, control, and nonlinear estimation of large-scale multi-agent systems

Science.gov (United States)

Bandyopadhyay, Saptarshi

guidance algorithms using results from numerical simulations and closed-loop hardware experiments on multiple quadrotors. In the second part of this dissertation, we present two novel discrete-time algorithms for distributed estimation, which track a single target using a network of heterogeneous sensing agents. In the Distributed Bayesian Filtering (DBF) algorithm, the sensing agents combine their normalized likelihood functions using the logarithmic opinion pool and the discrete-time dynamic average consensus algorithm. Each agent's estimated likelihood function converges to an error ball centered on the joint likelihood function of the centralized multi-sensor Bayesian filtering algorithm. Using a new proof technique, the convergence, stability, and robustness properties of the DBF algorithm are rigorously characterized. The explicit bounds on the time step of the robust DBF algorithm are shown to depend on the time-scale of the target dynamics. Furthermore, the DBF algorithm for linear-Gaussian models can be cast into a modified form of the Kalman information filter. In the Bayesian Consensus Filtering (BCF) algorithm, the agents combine their estimated posterior pdfs multiple times within each time step using the logarithmic opinion pool scheme. Thus, each agent's consensual pdf minimizes the sum of Kullback-Leibler divergences with the local posterior pdfs. The performance and robustness properties of these algorithms are validated using numerical simulations. In the third part of this dissertation, we present an attitude control strategy and a new nonlinear tracking controller for a spacecraft carrying a large object, such as an asteroid or a boulder. If the captured object is larger or comparable in size to the spacecraft and has significant modeling uncertainties, conventional nonlinear control laws that use exact feed-forward cancellation are not suitable because they exhibit a large resultant disturbance torque. The proposed nonlinear tracking control law guarantees
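The logarithmic opinion pool mentioned in the second part of this abstract can be sketched on a small discrete example. The three local posteriors, the equal weights, and the function names below are assumptions chosen for illustration; the sketch shows only the pooling step, not the consensus iterations or the filters described in the dissertation.

```python
import math

def log_opinion_pool(pdfs, weights):
    """Weighted geometric mean of discrete pdfs, renormalized to sum to one.

    Assumes every pdf is strictly positive on the shared support.
    """
    n = len(pdfs[0])
    log_pool = [sum(w * math.log(p[k]) for p, w in zip(pdfs, weights)) for k in range(n)]
    shift = max(log_pool)  # subtract the maximum before exponentiating
    unnormalized = [math.exp(value - shift) for value in log_pool]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]

def kl_divergence(q, p):
    """Kullback-Leibler divergence KL(q || p) for discrete pdfs."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0.0)

def weighted_kl_sum(q, pdfs, weights):
    """Objective minimized by the pool: sum_i w_i KL(q || p_i)."""
    return sum(w * kl_divergence(q, p) for p, w in zip(pdfs, weights))

# Hypothetical local posteriors of three agents over three states.
local_posteriors = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]
weights = [1.0 / 3.0] * 3

pooled = log_opinion_pool(local_posteriors, weights)
arithmetic_mean = [sum(p[k] for p in local_posteriors) / 3.0 for k in range(3)]

pooled_cost = weighted_kl_sum(pooled, local_posteriors, weights)
mean_cost = weighted_kl_sum(arithmetic_mean, local_posteriors, weights)
```

Comparing `pooled_cost` with `mean_cost` illustrates the property stated in the abstract: among candidate pdfs, the pooled one gives the smallest weighted sum of Kullback-Leibler divergences to the local posteriors.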
A Parallel Solver for Large-Scale Markov Chains

Czech Academy of Sciences Publication Activity Database

Benzi, M.; Tůma, Miroslav

2002-01-01

Vol. 41 (2002), pp. 135-153. ISSN 0168-9274. R&D Projects: GA AV ČR IAA2030801; GA ČR GA101/00/1035. Keywords: parallel preconditioning; iterative methods; discrete Markov chains; generalized inverses; singular matrices; graph partitioning; AINV; Bi-CGSTAB. Subject RIV: BA - General Mathematics. Impact factor: 0.504 (2002)
Limitations and tradeoffs in synchronization of large-scale networks with uncertain links

Science.gov (United States)

Diwadkar, Amit; Vaidya, Umesh

2016-01-01

The synchronization of nonlinear systems connected over large-scale networks has gained popularity in a variety of applications, such as power grids, sensor networks, and biology. Stochastic uncertainty in the interconnections is a ubiquitous phenomenon observed in these physical and biological networks. We provide a size-independent network sufficient condition for the synchronization of scalar nonlinear systems with stochastic linear interactions over large-scale networks. This sufficient condition, expressed in terms of nonlinear dynamics, the Laplacian eigenvalues of the nominal interconnections, and the variance and location of the stochastic uncertainty, allows us to define a synchronization margin. We provide an analytical characterization of important trade-offs between the internal nonlinear dynamics, network topology, and uncertainty in synchronization. For nearest neighbour networks, the existence of an optimal number of neighbours with a maximum synchronization margin is demonstrated. An analytical formula for the optimal gain that produces the maximum synchronization margin allows us to compare the synchronization properties of various complex network topologies. PMID:27067994
Modeling of fatigue crack induced nonlinear ultrasonics using a highly parallelized explicit local interaction simulation approach

Science.gov (United States)

Shen, Yanfeng; Cesnik, Carlos E. S.

2016-04-01

This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables highly parallelized supercomputing on powerful graphics cards. Both the explicit contact formulation and the parallel feature facilitate LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulation based on the penalty method is introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
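The combination of a penalty contact law with Coulomb stick-slip friction described above can be sketched for a single contact point. This is a generic textbook form under stated assumptions, not the LISA implementation: the function name, the scalar gap and trial tangential force, and the numerical values are all hypothetical.

```python
import math

def contact_forces(gap, tangential_trial, contact_stiffness, friction_coefficient):
    """Return (normal_force, tangential_force, state) at one contact point.

    gap < 0 means the crack faces interpenetrate; the penalty method then
    applies a normal force proportional to the penetration depth.
    tangential_trial is the shear force that would hold the faces together.
    """
    if gap >= 0.0:
        # Crack faces are separated: no contact forces act.
        return 0.0, 0.0, "open"
    normal_force = contact_stiffness * (-gap)
    friction_limit = friction_coefficient * normal_force
    if abs(tangential_trial) <= friction_limit:
        # Trial shear is within the Coulomb limit: the faces stick.
        return normal_force, tangential_trial, "stick"
    # Trial shear exceeds the limit: the faces slip at the Coulomb bound.
    return normal_force, math.copysign(friction_limit, tangential_trial), "slip"

open_case = contact_forces(1e-6, 100.0, 1e9, 0.3)
stick_case = contact_forces(-1e-6, 100.0, 1e9, 0.3)
slip_case = contact_forces(-1e-6, -500.0, 1e9, 0.3)
```

The contact stiffness is the parameter the abstract refers to when it mentions a guideline for its proper choice: a larger value reduces penetration but stiffens the system, which matters for the stability of an explicit time integrator.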
On the Renormalization of the Effective Field Theory of Large Scale Structures

OpenAIRE

Pajer, Enrico; Zaldarriaga, Matias

2013-01-01

Standard perturbation theory (SPT) for large-scale matter inhomogeneities is unsatisfactory for at least three reasons: there is no clear expansion parameter since the density contrast is not small on all scales; it does not fully account for deviations at large scales from a perfect pressureless fluid induced by short-scale non-linearities; for generic initial conditions, loop corrections are UV-divergent, making predictions cutoff dependent and hence unphysical. The Effective Field Theory o...
Identification of low order models for large scale processes

NARCIS (Netherlands)

Wattamwar, S.K.

2010-01-01

Many industrial chemical processes are complex, multi-phase and large scale in nature. These processes are characterized by various nonlinear physicochemical effects and fluid flows. Such processes often show coexistence of fast and slow dynamics during their time evolution. The increasing demand
Understanding uncertainties in non-linear population trajectories: a Bayesian semi-parametric hierarchical approach to large-scale surveys of coral cover.

Directory of Open Access Journals (Sweden)

Julie Vercelloni

Recently, attempts to improve decision making in species management have focussed on uncertainties associated with modelling temporal fluctuations in populations. Reducing model uncertainty is challenging; while larger samples improve estimation of species trajectories and reduce statistical errors, they typically amplify variability in observed trajectories. In particular, traditional modelling approaches aimed at estimating population trajectories usually do not account well for nonlinearities and uncertainties associated with multi-scale observations characteristic of large spatio-temporal surveys. We present a Bayesian semi-parametric hierarchical model for simultaneously quantifying uncertainties associated with model structure and parameters, and scale-specific variability over time. We estimate uncertainty across a four-tiered spatial hierarchy of coral cover from the Great Barrier Reef. Coral variability is well described; however, our results show that, in the absence of additional model specifications, conclusions regarding coral trajectories become highly uncertain when considering multiple reefs, suggesting that management should focus more at the scale of individual reefs. The approach presented facilitates the description and estimation of population trajectories and associated uncertainties when variability cannot be attributed to specific causes and origins. We argue that our model can unlock value contained in large-scale datasets, provide guidance for understanding sources of uncertainty, and support better informed decision making.
Nonlinear Model-Based Predictive Control applied to Large Scale Cryogenic Facilities

CERN Document Server

Blanco Vinuela, Enrique; de Prada Moraga, Cesar

2001-01-01

The thesis addresses the study, analysis, development, and finally the real implementation of an advanced control system for the 1.8 K Cooling Loop of the LHC (Large Hadron Collider) accelerator. The LHC is the next accelerator being built at CERN (European Organization for Nuclear Research); it will use superconducting magnets operating below a temperature of 1.9 K along a circumference of 27 kilometers. The temperature of these magnets is a control parameter with strict operating constraints. The first control implementations applied a procedure that included linear identification, modelling and regulation using a linear predictive controller. This greatly improved the overall performance of the plant with respect to a classical PID regulator, but the nature of the cryogenic processes pointed to the need for a more adequate technique, such as a nonlinear methodology. This thesis is a first step towards developing a global regulation strategy for the overall control of the LHC cells when they operate simultaneously...
Accelerating large-scale protein structure alignments with graphics processing units

Directory of Open Access Journals (Sweden)

Pang Bin

2012-02-01

Background: Large-scale protein structure alignment, an indispensable tool for structural bioinformatics, poses a tremendous challenge to computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings: We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions: ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues arising from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using the massively parallel computing power of GPUs.
Interior Point Methods for Large-Scale Nonlinear Programming

Czech Academy of Sciences Publication Activity Database

Lukšan, Ladislav; Matonoha, Ctirad; Vlček, Jan

2005-01-01

Vol. 20, No. 4-5 (2005), pp. 569-582. ISSN 1055-6788. R&D Projects: GA AV ČR IAA1030405. Institutional research plan: CEZ:AV0Z10300504. Keywords: nonlinear programming; interior point methods; KKT systems; indefinite preconditioners; filter methods; algorithms. Subject RIV: BA - General Mathematics. Impact factor: 0.477 (2005)
Mathematical models of non-linear phenomena, processes and systems: from molecular scale to planetary atmosphere

CERN Document Server

2013-01-01

This book consists of twenty-seven chapters, which can be divided into three large categories: articles focused on the mathematical treatment of non-linear problems, including the methodologies, algorithms and properties of analytical and numerical solutions to particular non-linear problems; theoretical and computational studies dedicated to the physics and chemistry of non-linear micro- and nano-scale systems, including molecular clusters, nano-particles and nano-composites; and papers focused on non-linear processes in medico-biological systems, including mathematical models of ferments, amino acids, blood fluids and polynucleic chains.
Newton Methods for Large Scale Problems in Machine Learning

Science.gov (United States)

Hansen, Samantha Leigh

2014-01-01

The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions with applications in machine learning. Chapter 1 introduces the overarching ideas in the thesis. Chapters 2 and 3 are geared towards supervised machine learning applications that involve minimizing a sum of loss…
Hybrid parallel strategy for the simulation of fast transient accidental situations at reactor scale

International Nuclear Information System (INIS)

Faucher, V.; Galon, P.; Beccantini, A.; Crouzet, F.; Debaud, F.; Gautier, T.

2015-01-01

Highlights:
• Reference accidental situations for current and future reactors are considered.
• They require the modeling of complex fluid–structure systems at full reactor scale.
• EPX software computes the non-linear transient solution with explicit time stepping.
• Focus on the parallel hybrid solver specific to the proposed coupled equations.
Abstract: This contribution is dedicated to the latest methodological developments implemented in the fast transient dynamics software EUROPLEXUS (EPX) to simulate the mechanical response of fully coupled fluid–structure systems to accidental situations to be considered at reactor scale, among which the Loss of Coolant Accident, the Core Disruptive Accident and the Hydrogen Explosion. Time integration is explicit and the search for reference solutions within the safety framework prevents any simplification and approximations in the coupled algorithm: for instance, all kinematic constraints are dealt with using Lagrange Multipliers, yielding a complex flow chart when non-permanent constraints such as unilateral contact or immersed fluid–structure boundaries are considered. The parallel acceleration of the solution process is then achieved through a hybrid approach, based on a weighted domain decomposition for distributed memory computing and the use of the KAAPI library for self-balanced shared memory processing inside subdomains.
Plasma turbulence driven by transversely large-scale standing shear Alfvén waves

International Nuclear Information System (INIS)

Singh, Nagendra; Rao, Sathyanarayan

2012-01-01

Using two-dimensional particle-in-cell simulations, we study generation of turbulence consisting of transversely small-scale dispersive Alfvén and electrostatic waves when plasma is driven by a large-scale standing shear Alfvén wave (LS-SAW). The standing wave is set up by reflecting a propagating LS-SAW. The ponderomotive force of the standing wave generates transversely large-scale density modifications consisting of density cavities and enhancements. The drifts of the charged particles driven by the ponderomotive force and those directly caused by the fields of the standing LS-SAW generate non-thermal features in the plasma. Parametric instabilities driven by the inherent plasma nonlinearities associated with the LS-SAW in combination with the non-thermal features generate small-scale electromagnetic and electrostatic waves, yielding a broad frequency spectrum ranging from below the source frequency of the LS-SAW to ion cyclotron and lower hybrid frequencies and beyond. The power spectrum of the turbulence has peaks at distinct perpendicular wave numbers ($k_\perp$) lying in the range $d_e^{-1}$ to $6d_e^{-1}$, $d_e$ being the electron inertial length, suggesting non-local parametric decay from small to large $k_\perp$. The turbulence spectrum encompassing both electromagnetic and electrostatic fluctuations is also broadband in parallel wave number ($k_\parallel$). In a standing-wave supported density cavity, the ratio of the perpendicular electric to magnetic field amplitude is $R(k_\perp) = |E_\perp(k_\perp)|/|B_\perp(k_\perp)| \ll V_A$ for $k_\perp d_e$ [...] $V_A$ is the Alfvén velocity. The characteristic features of the broadband plasma turbulence are compared with those available from satellite observations in space plasmas.
Research on precision grinding technology of large scale and ultra thin optics

Science.gov (United States)

Zhou, Lian; Wei, Qiancai; Li, Jie; Chen, Xianhua; Zhang, Qinghua

2018-03-01

The flatness and parallelism errors of large-scale, ultra-thin optics have an important influence on the subsequent polishing efficiency and accuracy. To realize high-precision grinding of these ductile elements, a low-deformation vacuum chuck was first designed and used to clamp the optics with high supporting rigidity over the full aperture. The optics were then ground flat under vacuum adsorption. After machining, the vacuum system was turned off, and the form error of the optics was measured on-machine with a displacement sensor after elastic recovery. The flatness was then brought to convergence with high accuracy by compensation machining, whose trajectories were integrated with the measurement result. To obtain high parallelism, the optics were turned over and compensation-ground using the form error of the vacuum chuck. Finally, a grinding experiment was performed on large-scale, ultra-thin fused silica optics with an aperture of 430 mm × 430 mm × 10 mm. The best P-V flatness of the optics was below 3 μm, and the parallelism was below 3″. This machining technique has been applied in batch grinding of large-scale, ultra-thin optics.

Leveraging human oversight and intervention in large-scale parallel processing of open-source data

Science.gov (United States)

Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.

2015-05-01

The popularity of cloud computing along with the increased availability of cheap storage have led to the necessity of elaboration and transformation of large volumes of open-source data, all in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is done in chain. Second, it is often needed to minimize the amount of reprocessing in order to optimize the usage of resources due to limited availability. In order to improve on these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.
Stability and Control of Large-Scale Dynamical Systems A Vector Dissipative Systems Approach

CERN Document Server

Haddad, Wassim M

2011-01-01

Modern complex large-scale dynamical systems exist in virtually every aspect of science and engineering, and are associated with a wide variety of physical, technological, environmental, and social phenomena, including aerospace, power, communications, and network systems, to name just a few. This book develops a general stability analysis and control design framework for nonlinear large-scale interconnected dynamical systems, and presents the most complete treatment on vector Lyapunov function methods, vector dissipativity theory, and decentralized control architectures. Large-scale dynami
Non-linear scaling of a musculoskeletal model of the lower limb using statistical shape models.

Science.gov (United States)

Nolte, Daniel; Tsang, Chui Kit; Zhang, Kai Yu; Ding, Ziyun; Kedgley, Angela E; Bull, Anthony M J

2016-10-03

Accurate muscle geometry for musculoskeletal models is important to enable accurate subject-specific simulations. Commonly, linear scaling is used to obtain individualised muscle geometry. More advanced methods include non-linear scaling using segmented bone surfaces and manual or semi-automatic digitisation of muscle paths from medical images. In this study, a new scaling method combining non-linear scaling with reconstructions of bone surfaces using statistical shape modelling is presented. Statistical Shape Models (SSMs) of femur and tibia/fibula were used to reconstruct bone surfaces of nine subjects. Reference models were created by morphing manually digitised muscle paths to mean shapes of the SSMs using non-linear transformations and inter-subject variability was calculated. Subject-specific models of muscle attachment and via points were created from three reference models. The accuracy was evaluated by calculating the differences between the scaled and manually digitised models. The points defining the muscle paths showed large inter-subject variability at the thigh and shank, up to 26 mm; this was found to limit the accuracy of all studied scaling methods. Errors for the subject-specific muscle point reconstructions of the thigh could be decreased by 9% to 20% by using the non-linear scaling compared to a typical linear scaling method. We conclude that the proposed non-linear scaling method is more accurate than linear scaling methods. Thus, when combined with the ability to reconstruct bone surfaces from incomplete or scattered geometry data using statistical shape models, our proposed method is an alternative to linear scaling methods. Copyright © 2016 The Author. Published by Elsevier Ltd. All rights reserved.
Faster Parallel Traversal of Scale Free Graphs at Extreme Scale with Vertex Delegates

KAUST Repository

Pearce, Roger

2014-11-01

© 2014 IEEE. At extreme scale, irregularities in the structure of scale-free graphs such as social network graphs limit our ability to analyze these important and growing datasets. A key challenge is the presence of high-degree vertices (hubs), that leads to parallel workload and storage imbalances. The imbalances occur because existing partitioning techniques are not able to effectively partition high-degree vertices. We present techniques to distribute storage, computation, and communication of hubs for extreme scale graphs in distributed memory supercomputers. To balance the hub processing workload, we distribute hub data structures and related computation among a set of delegates. The delegates coordinate using highly optimized, yet portable, asynchronous broadcast and reduction operations. We demonstrate scalability of our new algorithmic technique using Breadth-First Search (BFS), Single Source Shortest Path (SSSP), K-Core Decomposition, and Page-Rank on synthetically generated scale-free graphs. Our results show excellent scalability on large scale-free graphs up to 131K cores of the IBM BG/P, and outperform the best known Graph500 performance on BG/P Intrepid by 15%
Faster Parallel Traversal of Scale Free Graphs at Extreme Scale with Vertex Delegates

KAUST Repository

Pearce, Roger; Gokhale, Maya; Amato, Nancy M.

2014-01-01

© 2014 IEEE. At extreme scale, irregularities in the structure of scale-free graphs such as social network graphs limit our ability to analyze these important and growing datasets. A key challenge is the presence of high-degree vertices (hubs), that leads to parallel workload and storage imbalances. The imbalances occur because existing partitioning techniques are not able to effectively partition high-degree vertices. We present techniques to distribute storage, computation, and communication of hubs for extreme scale graphs in distributed memory supercomputers. To balance the hub processing workload, we distribute hub data structures and related computation among a set of delegates. The delegates coordinate using highly optimized, yet portable, asynchronous broadcast and reduction operations. We demonstrate scalability of our new algorithmic technique using Breadth-First Search (BFS), Single Source Shortest Path (SSSP), K-Core Decomposition, and Page-Rank on synthetically generated scale-free graphs. Our results show excellent scalability on large scale-free graphs up to 131K cores of the IBM BG/P, and outperform the best known Graph500 performance on BG/P Intrepid by 15%
Large Top-Quark Mass and Nonlinear Representation of Flavor Symmetry

International Nuclear Information System (INIS)

Feldmann, Thorsten; Mannel, Thomas

2008-01-01

We consider an effective theory (ET) approach to flavor-violating processes beyond the standard model, where the breaking of flavor symmetry is described by spurion fields whose low-energy vacuum expectation values are identified with the standard model Yukawa couplings. Insisting on canonical mass dimensions for the spurion fields, the large top-quark Yukawa coupling also implies a large expectation value for the associated spurion, which breaks part of the flavor symmetry already at the UV scale Λ of the ET. Below that scale, flavor symmetry in the ET is represented in a nonlinear way by introducing Goldstone modes for the partly broken flavor symmetry and spurion fields transforming under the residual symmetry. As a result, the dominance of certain flavor structures in rare quark decays can be understood in terms of the 1/Λ expansion in the ET
Algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations with the use of parallel computations

Energy Technology Data Exchange (ETDEWEB)

Moryakov, A. V., E-mail: sailor@orc.ru [National Research Centre Kurchatov Institute (Russian Federation)

2016-12-15

An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computations on supercomputers employing the MPI (Message Passing Interface) standard for the data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is characterized by simplicity and the possibility to solve nonlinear problems with a correction of the operator in accordance with the solution obtained in the previous iterative process.
Nonlinear and parallel algorithms for finite element discretizations of the incompressible Navier-Stokes equations

Science.gov (United States)

Arteaga, Santiago Egido

1998-12-01

The steady-state Navier-Stokes equations are of considerable interest because they are used to model numerous common physical phenomena. The applications encountered in practice often involve small viscosities and complicated domain geometries, and they result in challenging problems in spite of the vast attention that has been dedicated to them. In this thesis we examine methods for computing the numerical solution of the primitive variable formulation of the incompressible equations on distributed memory parallel computers. We use the Galerkin method to discretize the differential equations, although most results are stated so that they apply also to stabilized methods. We also reformulate some classical results in a single framework and discuss some issues frequently dismissed in the literature, such as the implementation of pressure space basis and non-homogeneous boundary values. We consider three nonlinear methods: Newton's method, Oseen's (or Picard) iteration, and sequences of Stokes problems. All these iterative nonlinear methods require solving a linear system at every step. Newton's method has quadratic convergence while that of the others is only linear; however, we obtain theoretical bounds showing that Oseen's iteration is more robust, and we confirm it experimentally. In addition, although Oseen's iteration usually requires more iterations than Newton's method, the linear systems it generates tend to be simpler and its overall costs (in CPU time) are lower. The Stokes problems result in linear systems which are easier to solve, but its convergence is much slower, so that it is competitive only for large viscosities. Inexact versions of these methods are studied, and we explain why the best timings are obtained using relatively modest error tolerances in solving the corresponding linear systems. We also present a new damping optimization strategy based on the quadratic nature of the Navier-Stokes equations, which improves the robustness of all the
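The contrast between linear (Picard/Oseen-type) and quadratic (Newton) convergence mentioned in this abstract can be illustrated on a toy problem. The sketch below is not the thesis's solver: it assumes a scalar stand-in equation nu*u + u*u = f, whose quadratic term loosely mimics the convective nonlinearity, and the function names `picard` and `newton` are hypothetical. Picard freezes the nonlinear coefficient at the previous iterate; Newton linearizes the full residual.

```python
def residual(nu, f, u):
    """Residual of the toy quadratic equation nu*u + u*u = f."""
    return nu * u + u * u - f


def picard(nu, f, u0=0.0, tol=1e-10, max_iter=200):
    """Picard-type iteration: freeze the coefficient (nu + u_k), solve the linear equation."""
    u = u0
    for k in range(1, max_iter + 1):
        u = f / (nu + u)  # linear solve with the lagged coefficient
        if abs(residual(nu, f, u)) < tol:
            return u, k
    raise RuntimeError("Picard iteration did not converge")


def newton(nu, f, u0=0.0, tol=1e-10, max_iter=200):
    """Newton iteration: linearize the full residual, derivative is nu + 2*u."""
    u = u0
    for k in range(1, max_iter + 1):
        u -= residual(nu, f, u) / (nu + 2.0 * u)
        if abs(residual(nu, f, u)) < tol:
            return u, k
    raise RuntimeError("Newton iteration did not converge")


# With nu = 1 and f = 2 the positive root is u = 1.
u_picard, steps_picard = picard(1.0, 2.0)
u_newton, steps_newton = newton(1.0, 2.0)
print(f"Picard: u = {u_picard:.12f} after {steps_picard} iterations")
print(f"Newton: u = {u_newton:.12f} after {steps_newton} iterations")
```

On this toy problem Newton needs far fewer iterations than Picard. The abstract's point is that iteration count alone does not decide total cost, since each Picard step may involve a simpler linear system.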
Computing the universe: how large-scale simulations illuminate galaxies and dark energy

Science.gov (United States)

O'Shea, Brian

2015-04-01

High-performance and large-scale computing is absolutely essential to understanding astronomical objects such as stars, galaxies, and the cosmic web. This is because these are structures that operate on physical, temporal, and energy scales that cannot be reasonably approximated in the laboratory, and whose complexity and nonlinearity often defy analytic modeling. In this talk, I show how the growth of computing platforms over time has facilitated our understanding of astrophysical and cosmological phenomena, focusing primarily on galaxies and large-scale structure in the Universe.
Computational challenges of large-scale, long-time, first-principles molecular dynamics

International Nuclear Information System (INIS)

Kent, P R C

2008-01-01

Plane wave density functional calculations have traditionally been able to use the largest available supercomputing resources. We analyze the scalability of modern projector-augmented wave implementations to identify the challenges in performing molecular dynamics calculations of large systems containing many thousands of electrons. Benchmark calculations on the Cray XT4 demonstrate that global linear-algebra operations are the primary reason for limited parallel scalability. Plane-wave related operations can be made sufficiently scalable. Improving parallel linear-algebra performance is an essential step to reaching longer timescales in future large-scale molecular dynamics calculations
Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization

KAUST Repository

Gunnels, John; Lee, Jon; Margulies, Susan

2010-01-01

We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.
Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization

KAUST Repository

Gunnels, John

2010-06-01

We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.
Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

KAUST Repository

Quintin, Jean-Noel

2013-10-01

Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

KAUST Repository

Quintin, Jean-Noel; Hasanov, Khalid; Lastovetsky, Alexey

2013-01-01

Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
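As background for the SUMMA algorithm named in this abstract, the following is a minimal serial sketch of its data flow, under stated assumptions: the processor grid is simulated with ordinary loops, no real message passing takes place, and the names `block_ranges`, `summa`, and `matmul_reference` are hypothetical. It shows only the baseline SUMMA idea (a sequence of panel broadcasts followed by local rank-b updates on a possibly nonsquare grid), not the two-level hierarchy that HSUMMA adds.

```python
def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous, nearly equal index ranges."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def summa(A, B, grid_rows, grid_cols, panel=1):
    """Simulate SUMMA on a grid_rows x grid_cols processor grid.

    Processor (i, j) owns the block of C given by row block i and column
    block j. In each step, a column panel of A is (conceptually) broadcast
    along processor rows and a row panel of B along processor columns;
    every processor then updates its own block of C.
    """
    m, inner, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    row_blocks = block_ranges(m, grid_rows)
    col_blocks = block_ranges(n, grid_cols)
    for k0 in range(0, inner, panel):
        ks = range(k0, min(k0 + panel, inner))  # current panel of the inner dimension
        for rows in row_blocks:                 # processor row i
            for cols in col_blocks:             # processor column j
                for r in rows:                  # local update of block C_ij
                    for c in cols:
                        C[r][c] += sum(A[r][k] * B[k][c] for k in ks)
    return C


def matmul_reference(A, B):
    """Plain triple-loop product used to check the simulated result."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]           # 4 x 3
B = [[1, 0, 2, 1, 3], [0, 1, 1, 2, 0], [4, 1, 0, 1, 2]]       # 3 x 5
C = summa(A, B, grid_rows=2, grid_cols=3, panel=2)            # nonsquare 2 x 3 grid
print(C == matmul_reference(A, B))
```

The 2 x 3 grid in the usage lines reflects the property the abstract highlights: unlike Cannon's algorithm, SUMMA does not need a square number of processors.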
Building a parallel file system simulator

International Nuclear Information System (INIS)

Molina-Estolano, E; Maltzahn, C; Brandt, S A; Bent, J

2009-01-01

Parallel file systems are gaining in popularity in high-end computing centers as well as commercial data centers. High-end computing systems are expected to scale exponentially and to pose new challenges to their storage scalability in terms of cost and power. To address these challenges scientists and file system designers will need a thorough understanding of the design space of parallel file systems. Yet there exist few systematic studies of parallel file system behavior at petabyte- and exabyte scale. An important reason is the significant cost of getting access to large-scale hardware to test parallel file systems. To contribute to this understanding we are building a parallel file system simulator that can simulate parallel file systems at very large scale. Our goal is to simulate petabyte-scale parallel file systems on a small cluster or even a single machine in reasonable time and fidelity. With this simulator, file system experts will be able to tune existing file systems for specific workloads, scientists and file system deployment engineers will be able to better communicate workload requirements, file system designers and researchers will be able to try out design alternatives and innovations at scale, and instructors will be able to study very large-scale parallel file system behavior in the class room. In this paper we describe our approach and provide preliminary results that are encouraging both in terms of fidelity and simulation scalability.
Flexible non-linear predictive models for large-scale wind turbine diagnostics

DEFF Research Database (Denmark)

Bach-Andersen, Martin; Rømer-Odgaard, Bo; Winther, Ole

2017-01-01

We demonstrate how flexible non-linear models can provide accurate and robust predictions on turbine component temperature sensor data using data-driven principles and only a minimum of system modeling. The merits of different model architectures are evaluated using data from a large set of turbines operating under diverse conditions. We then go on to test the predictive models in a diagnostic setting, where the output of the models is used to detect mechanical faults in rotor bearings. Using retrospective data from 22 actual rotor bearing failures, the fault detection performance of the models is quantified using a structured framework that provides the metrics required for evaluating the performance in a fleet-wide monitoring setup. It is demonstrated that faults are identified with high accuracy up to 45 days before a warning from the hard-threshold warning system.
Large Scale Simulations of the Euler Equations on GPU Clusters

KAUST Repository

Liebmann, Manfred; Douglas, Craig C.; Haase, Gundolf; Horváth, Zoltán

2010-01-01

The paper investigates the scalability of a parallel Euler solver, using the Vijayasundaram method, on a GPU cluster with 32 Nvidia Geforce GTX 295 boards. The aim of this research is to enable large scale fluid dynamics simulations with up to one
Parallel processors and nonlinear structural dynamics algorithms and software

Science.gov (United States)

Belytschko, Ted

1989-01-01

A nonlinear structural dynamics finite element program was developed to run on a shared memory multiprocessor with pipeline processors. The program, WHAMS, was used as a framework for this work. The program employs explicit time integration and has the capability to handle both the nonlinear material behavior and large displacement response of 3-D structures. The elasto-plastic material model uses an isotropic strain hardening law which is input as a piecewise linear function. Geometric nonlinearities are handled by a corotational formulation in which a coordinate system is embedded at the integration point of each element. Currently, the program has an element library consisting of a beam element based on Euler-Bernoulli theory and triangular and quadrilateral plate elements based on Mindlin theory.
Nonlinear Elastodynamic Behaviour Analysis of High-Speed Spatial Parallel Coordinate Measuring Machines

Directory of Open Access Journals (Sweden)

Xiulong Chen

2012-10-01

In order to study the elastodynamic behaviour of 4-UPS-UPU (4-universal joints-prismatic pairs-spherical joints / universal joints-prismatic pairs-universal joints) high-speed spatial parallel coordinate measuring machines (PCMMs), the nonlinear time-varying dynamics model, which comprehensively considers geometric nonlinearity and the rigid-flexible coupling effect, is derived by using Lagrange equations and finite element methods. Based on the Newmark method, the kinematics output response of 4-UPS-UPU PCMMs is illustrated through numerical simulation. The simulation results show that the flexibility of the links has a significant impact on the system's dynamic response. This research can provide an important theoretical basis for the optimization design and vibration control of 4-UPS-UPU PCMMs.
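For orientation on the Newmark method named in this abstract, here is a minimal sketch under strong simplifying assumptions: a linear single-degree-of-freedom oscillator m*x'' + c*x' + k*x = f(t) rather than the paper's nonlinear, time-varying multi-body model. The function name `newmark_sdof` and all parameter values are hypothetical; beta = 1/4 and gamma = 1/2 select the common average-acceleration variant.

```python
import math


def newmark_sdof(m, c, k, force, x0, v0, dt, steps, beta=0.25, gamma=0.5):
    """Integrate m*x'' + c*x' + k*x = force(t) with the Newmark-beta scheme.

    Returns the displacement and velocity histories, including the initial state.
    """
    x, v = x0, v0
    a = (force(0.0) - c * v - k * x) / m          # initial acceleration from the ODE
    xs, vs = [x], [v]
    m_eff = m + gamma * dt * c + beta * dt * dt * k  # effective mass of the implicit step
    for n in range(1, steps + 1):
        t = n * dt
        # Predictors built from known quantities at the previous step.
        x_pred = x + dt * v + dt * dt * (0.5 - beta) * a
        v_pred = v + dt * (1.0 - gamma) * a
        # Solve the equation of motion at the new time for the new acceleration.
        a = (force(t) - c * v_pred - k * x_pred) / m_eff
        # Correctors.
        x = x_pred + beta * dt * dt * a
        v = v_pred + gamma * dt * a
        xs.append(x)
        vs.append(v)
    return xs, vs


# Undamped free vibration with a period of 1 time unit (assumed test case).
omega = 2.0 * math.pi
xs, vs = newmark_sdof(m=1.0, c=0.0, k=omega ** 2, force=lambda t: 0.0,
                      x0=1.0, v0=0.0, dt=0.001, steps=1000)
print(f"displacement after one period: {xs[-1]:.8f}")
```

In a nonlinear setting such as the one the abstract describes, the single division by the effective mass would typically be replaced by an iterative solve at each time step; that part is not shown here.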
Time-sliced perturbation theory for large scale structure I: general formalism

Energy Technology Data Exchange (ETDEWEB)

Blas, Diego; Garny, Mathias; Sibiryakov, Sergey [Theory Division, CERN, CH-1211 Genève 23 (Switzerland); Ivanov, Mikhail M., E-mail: diego.blas@cern.ch, E-mail: mathias.garny@cern.ch, E-mail: mikhail.ivanov@cern.ch, E-mail: sergey.sibiryakov@cern.ch [FSB/ITP/LPPC, École Polytechnique Fédérale de Lausanne, CH-1015, Lausanne (Switzerland)

2016-07-01

We present a new analytic approach to describe large scale structure formation in the mildly non-linear regime. The central object of the method is the time-dependent probability distribution function generating correlators of the cosmological observables at a given moment of time. Expanding the distribution function around the Gaussian weight we formulate a perturbative technique to calculate non-linear corrections to cosmological correlators, similar to the diagrammatic expansion in a three-dimensional Euclidean quantum field theory, with time playing the role of an external parameter. For the physically relevant case of cold dark matter in an Einstein-de Sitter universe, the time evolution of the distribution function can be found exactly and is encapsulated by a time-dependent coupling constant controlling the perturbative expansion. We show that all building blocks of the expansion are free from spurious infrared enhanced contributions that plague the standard cosmological perturbation theory. This paves the way towards the systematic resummation of infrared effects in large scale structure formation. We also argue that the approach proposed here provides a natural framework to account for the influence of short-scale dynamics on larger scales along the lines of effective field theory.

Final Report: Migration Mechanisms for Large-scale Parallel Applications

Energy Technology Data Exchange (ETDEWEB)

Jason Nieh

2009-10-30

Process migration is the ability to transfer a process from one machine to another. It is a useful facility in distributed computing environments, especially as computing devices become more pervasive and Internet access becomes more ubiquitous. The potential benefits of process migration, among others, are fault resilience by migrating processes off of faulty hosts, data access locality by migrating processes closer to the data, better system response time by migrating processes closer to users, dynamic load balancing by migrating processes to less loaded hosts, and improved service availability and administration by migrating processes before host maintenance so that applications can continue to run with minimal downtime. Although process migration provides substantial potential benefits and many approaches have been considered, achieving transparent process migration functionality has been difficult in practice. To address this problem, our work has designed, implemented, and evaluated new and powerful transparent process checkpoint-restart and migration mechanisms for desktop, server, and parallel applications that operate across heterogeneous cluster and mobile computing environments. A key aspect of this work has been to introduce lightweight operating system virtualization to provide processes with private, virtual namespaces that decouple and isolate processes from dependencies on the host operating system instance. This decoupling enables processes to be transparently checkpointed and migrated without modifying, recompiling, or relinking applications or the operating system. Building on this lightweight operating system virtualization approach, we have developed novel technologies that enable (1) coordinated, consistent checkpoint-restart and migration of multiple processes, (2) fast checkpointing of process and file system state to enable restart of multiple parallel execution environments and time travel, (3) process migration across heterogeneous
Towards a Gravity Dual for the Large Scale Structure of the Universe

CERN Document Server

Kehagias, A.

2016-01-01

The dynamics of the large-scale structure of the universe enjoys at all scales, even in the highly non-linear regime, a Lifshitz symmetry during the matter-dominated period. In this paper we propose a general class of six-dimensional spacetimes which could be a gravity dual to the four-dimensional large-scale structure of the universe. In this set-up, the Lifshitz symmetry manifests itself as an isometry in the bulk and our universe is a four-dimensional brane moving in such six-dimensional bulk. After finding the correspondence between the bulk and the brane dynamical Lifshitz exponents, we find the intriguing result that the preferred value of the dynamical Lifshitz exponent of our observed universe, at both linear and non-linear scales, corresponds to a fixed point of the RGE flow of the dynamical Lifshitz exponent in the dual system where the symmetry is enhanced to the Schrodinger group containing a non-relativistic conformal symmetry. We also investigate the RGE flow between fixed points of the Lifshitz...
Performance characteristics of hybrid MPI/OpenMP implementations of NAS parallel benchmarks SP and BT on large-scale multicore supercomputers

KAUST Repository

Wu, Xingfu; Taylor, Valerie

2011-01-01

The NAS Parallel Benchmarks (NPB) are well-known applications with the fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node and MPI can be used with the communication between nodes. In this paper, we use SP and BT benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore supercomputers. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76%, and the hybrid BT outperforms the MPI BT by up to 8.58% on up to 10,000 cores on BlueGene/P at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. We also use performance tools and MPI trace libraries available on these supercomputers to further investigate the performance characteristics of the hybrid SP and BT.
Performance characteristics of hybrid MPI/OpenMP implementations of NAS parallel benchmarks SP and BT on large-scale multicore supercomputers

KAUST Repository

Wu, Xingfu

2011-03-29

The NAS Parallel Benchmarks (NPB) are well-known applications with the fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node and MPI can be used with the communication between nodes. In this paper, we use SP and BT benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore supercomputers. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76%, and the hybrid BT outperforms the MPI BT by up to 8.58% on up to 10,000 cores on BlueGene/P at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. We also use performance tools and MPI trace libraries available on these supercomputers to further investigate the performance characteristics of the hybrid SP and BT.
Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

Energy Technology Data Exchange (ETDEWEB)

Sreepathi, Sarat [ORNL; Kumar, Jitendra [ORNL; Mills, Richard T. [Argonne National Laboratory; Hoffman, Forrest M. [ORNL; Sripathi, Vamsi [Intel Corporation; Hargrove, William Walter [United States Department of Agriculture (USDA), United States Forest Service (USFS)

2017-09-01

A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offer unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
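The k-means analysis that this abstract builds on can be sketched serially as follows. This is a plain Lloyd-style iteration in pure Python, offered only as an illustration of the underlying kernel: it is not the MSTC code, it contains no MPI, CUDA, or OpenACC, and the function names and sample points are hypothetical. The assignment step (nearest centroid per point) is the part that parallel implementations typically distribute across processes or GPU threads.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied initial centroids (max_iter >= 1).

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid unchanged
        if updated == centroids:     # converged: assignments can no longer change
            break
        centroids = updated
    return centroids, labels


# Two well-separated groups of 2-D points (made-up data for illustration).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
```

Passing the initial centroids explicitly keeps the sketch deterministic; production codes usually choose them by random sampling or a seeding heuristic.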
Breaking Computational Barriers: Real-time Analysis and Optimization with Large-scale Nonlinear Models via Model Reduction

Energy Technology Data Exchange (ETDEWEB)

Carlberg, Kevin Thomas [Sandia National Lab. (SNL-CA), Livermore, CA (United States). Quantitative Modeling and Analysis; Drohmann, Martin [Sandia National Lab. (SNL-CA), Livermore, CA (United States). Quantitative Modeling and Analysis; Tuminaro, Raymond S. [Sandia National Lab. (SNL-CA), Livermore, CA (United States). Computational Mathematics; Boggs, Paul T. [Sandia National Lab. (SNL-CA), Livermore, CA (United States). Quantitative Modeling and Analysis; Ray, Jaideep [Sandia National Lab. (SNL-CA), Livermore, CA (United States). Quantitative Modeling and Analysis; van Bloemen Waanders, Bart Gustaaf [Sandia National Lab. (SNL-CA), Livermore, CA (United States). Optimization and Uncertainty Estimation

2014-10-01

Model reduction for dynamical systems is a promising approach for reducing the computational cost of large-scale physics-based simulations to enable high-fidelity models to be used in many-query (e.g., Bayesian inference) and near-real-time (e.g., fast-turnaround simulation) contexts. While model reduction works well for specialized problems such as linear time-invariant systems, it is much more difficult to obtain accurate, stable, and efficient reduced-order models (ROMs) for systems with general nonlinearities. This report describes several advances that enable nonlinear reduced-order models (ROMs) to be deployed in a variety of time-critical settings. First, we present an error bound for the Gauss-Newton with Approximated Tensors (GNAT) nonlinear model reduction technique. This bound allows the state-space error for the GNAT method to be quantified when applied with the backward Euler time-integration scheme. Second, we present a methodology for preserving classical Lagrangian structure in nonlinear model reduction. This technique guarantees that important properties--such as energy conservation and symplectic time-evolution maps--are preserved when performing model reduction for models described by a Lagrangian formalism (e.g., molecular dynamics, structural dynamics). Third, we present a novel technique for decreasing the temporal complexity--defined as the number of Newton-like iterations performed over the course of the simulation--by exploiting time-domain data. Fourth, we describe a novel method for refining projection-based reduced-order models a posteriori using a goal-oriented framework similar to mesh-adaptive h-refinement in finite elements. The technique allows the ROM to generate arbitrarily accurate solutions, thereby providing the ROM with a 'failsafe' mechanism in the event of insufficient training data. Finally, we present the reduced-order model error surrogate (ROMES) method for statistically quantifying reduced-order
A Nonlinear Multiobjective Bilevel Model for Minimum Cost Network Flow Problem in a Large-Scale Construction Project

Directory of Open Access Journals (Sweden)

Jiuping Xu

2012-01-01

The aim of this study is to deal with a minimum cost network flow problem (MCNFP) in a large-scale construction project using a nonlinear multiobjective bilevel model with birandom variables. The main target of the upper level is to minimize both direct and transportation time costs. The target of the lower level is to minimize transportation costs. After an analysis of the birandom variables, an expectation multiobjective bilevel programming model with chance constraints is formulated to incorporate decision makers’ preferences. To solve the identified special conditions, an equivalent crisp model is proposed with an additional multiobjective bilevel particle swarm optimization (MOBLPSO) developed to solve the model. The Shuibuya Hydropower Project is used as a real-world example to verify the proposed approach. Results and analysis are presented to highlight the performances of the MOBLPSO, which is very effective and efficient compared to a genetic algorithm and a simulated annealing algorithm.
Microstructure and nonlinear signatures of yielding in a heterogeneous colloidal gel under large amplitude oscillatory shear

Energy Technology Data Exchange (ETDEWEB)

Kim, Juntae; Helgeson, Matthew E., E-mail: helgeson@engineering.ucsb.edu [Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, California 93106 (United States); Merger, Dimitri; Wilhelm, Manfred [Institute for Chemical Technology and Polymer Chemistry, Karlsruhe Institute of Technology, 76131 Karlsruhe (Germany)

2014-09-01

We investigate yielding in a colloidal gel that forms a heterogeneous structure, consisting of a two-phase bicontinuous network of colloid-rich domains of fractal clusters and colloid-poor domains. Combining large amplitude oscillatory shear measurements with simultaneous small and ultra-small angle neutron scattering (rheo-SANS/USANS), we characterize both the nonlinear mechanical processes and strain amplitude-dependent microstructure underlying yielding. We observe a broad, three-stage yielding process that evolves over an order of magnitude in strain amplitude between the onset of nonlinearity and flow. Analyzing the intracycle response as a sequence of physical processes reveals a transition from elastic straining to elastoplastic thinning (which dominates in region I) and eventually yielding (which evolves through region II) and flow (which saturates in region III), and allows quantification of instantaneous nonlinear parameters associated with yielding. These measures exhibit significant strain rate amplitude dependence above a characteristic frequency, which we argue is governed by poroelastic effects. Correlating these results with time-averaged rheo-USANS measurements reveals that the material passes through a cascade of structural breakdown from large to progressively smaller length scales. In region I, compression of the fractal domains leads to the formation of large voids. In regions II and III, cluster-cluster correlations become increasingly homogeneous, suggesting breakage and eventually depercolation of intercluster bonds at the yield point. All significant structural changes occur on the micron-scale, suggesting that large-scale rearrangements of hundreds or thousands of particles, rather than the homogeneous rearrangement of particle-particle bonds, dominate the initial yielding of heterogeneous colloidal gels.
LARGE-SCALE STRUCTURE OF THE UNIVERSE AS A COSMIC STANDARD RULER

International Nuclear Information System (INIS)

Park, Changbom; Kim, Young-Rae

2010-01-01

We propose to use the large-scale structure (LSS) of the universe as a cosmic standard ruler. This is possible because the pattern of large-scale distribution of matter is scale-dependent and does not change in comoving space during the linear-regime evolution of structure. By examining the pattern of LSS in several redshift intervals it is possible to reconstruct the expansion history of the universe, and thus to measure the cosmological parameters governing the expansion of the universe. The features of the large-scale matter distribution that can be used as standard rulers include the topology of LSS and the overall shapes of the power spectrum and correlation function. The genus, being an intrinsic topology measure, is insensitive to systematic effects such as the nonlinear gravitational evolution, galaxy biasing, and redshift-space distortion, and thus is an ideal cosmic ruler when galaxies in redshift space are used to trace the initial matter distribution. The genus remains unchanged as far as the rank order of density is conserved, which is true for linear and weakly nonlinear gravitational evolution, monotonic galaxy biasing, and mild redshift-space distortions. The expansion history of the universe can be constrained by comparing the theoretically predicted genus corresponding to an adopted set of cosmological parameters with the observed genus measured by using the redshift-comoving distance relation of the same cosmological model.
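For orientation, the theoretically predicted genus mentioned above has a well-known closed form when the density field is Gaussian. The expression below is a sketch of that standard result, not a formula quoted from this record; the symbols g, \nu, A, and \langle k^2 \rangle are notation assumed here, with \nu the density threshold in units of the standard deviation and \langle k^2 \rangle the second moment of the smoothed power spectrum.

```latex
g(\nu) = A \,\bigl(1 - \nu^{2}\bigr)\, \exp\!\left(-\frac{\nu^{2}}{2}\right),
\qquad
A = \frac{1}{(2\pi)^{2}} \left(\frac{\langle k^{2} \rangle}{3}\right)^{3/2}
```

Because the amplitude A depends only on the shape of the power spectrum at the smoothing scale, a wrong redshift-distance relation changes the effective scale being probed and shifts the measured amplitude away from the prediction.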
Linear and Nonlinear Theories of Cosmic Ray Transport

International Nuclear Information System (INIS)

Shalchi, A.

2005-01-01

The transport of charged cosmic rays in plasma wave turbulence is a modern and interesting field of research. We are mainly interested in spatial diffusion parallel and perpendicular to a large-scale magnetic field. During the last decades quasilinear theory was the standard tool for the calculation of diffusion coefficients. Through comparison with numerical simulations we found several cases where quasilinear theory is invalid. One could define three major problems of transport theory. I will demonstrate that new nonlinear theories which were proposed recently can solve at least some of these problems.
Parallel real-time visualization system for large-scale simulation. Application to WSPEEDI

International Nuclear Information System (INIS)

Muramatsu, Kazuhiro; Otani, Takayuki; Kitabata, Hideyuki; Matsumoto, Hideki; Takei, Toshifumi; Doi, Shun

2000-01-01

The real-time visualization system, PATRAS (PArallel TRAcking Steering system) has been developed on parallel computing servers. The system performs almost all of the visualization tasks on a parallel computing server, and uses image data compression technique for efficient communication between the server and the client terminal. Therefore, the system realizes high performance concurrent visualization in an internet computing environment. The experience in applying PATRAS to WSPEEDI (Worldwide version of System for Prediction Environmental Emergency Dose Information) is reported. The application of PATRAS to WSPEEDI enables users to understand behaviours of radioactive tracers from different release points easily and quickly. (author)
The Hamburg large scale geostrophic ocean general circulation model. Cycle 1

International Nuclear Information System (INIS)

Maier-Reimer, E.; Mikolajewicz, U.

1992-02-01

The rationale for the Large Scale Geostrophic ocean circulation model (LSG-OGCM) is based on the observations that for a large scale ocean circulation model designed for climate studies, the relevant characteristic spatial scales are large compared with the internal Rossby radius throughout most of the ocean, while the characteristic time scales are large compared with the periods of gravity modes and barotropic Rossby wave modes. In the present version of the model, the fast modes have been filtered out by a conventional technique of integrating the full primitive equations, including all terms except the nonlinear advection of momentum, by an implicit time integration method. The free surface is also treated prognostically, without invoking a rigid lid approximation. The numerical scheme is unconditionally stable and has the additional advantage that it can be applied uniformly to the entire globe, including the equatorial and coastal current regions. (orig.)
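The stability claim can be illustrated on a single fast mode. The sketch below is not the LSG scheme itself: it assumes a hypothetical scalar test equation dy/dt = i*omega*y standing in for a gravity mode, and compares the per-step amplification of explicit (forward) and implicit (backward) Euler. The function name and the parameter values are illustrative assumptions.

```python
def amplification_factors(omega, dt):
    """Per-step growth factors |y_{n+1} / y_n| for dy/dt = i*omega*y."""
    z = 1j * omega * dt
    forward_euler = abs(1 + z)         # explicit: sqrt(1 + (omega*dt)^2), always > 1
    backward_euler = abs(1 / (1 - z))  # implicit: 1/sqrt(1 + (omega*dt)^2), always < 1
    return forward_euler, backward_euler


if __name__ == "__main__":
    for dt in (0.01, 1.0, 100.0):
        fe, be = amplification_factors(omega=10.0, dt=dt)
        print(f"dt={dt:>6}: explicit {fe:.3e}, implicit {be:.3e}")
```

In this toy setting the implicit factor stays below one for every step size, and modes with large omega*dt are damped most strongly, which is the sense in which an implicit step can both remain stable and suppress fast modes.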
Parallel multiple instance learning for extremely large histopathology image analysis.

Science.gov (United States)

Xu, Yan; Li, Yeshu; Shen, Zhengyang; Wu, Ziwei; Gao, Teng; Fan, Yubo; Lai, Maode; Chang, Eric I-Chao

2017-08-03

Histopathology images are critical for medical diagnosis, e.g., cancer and its treatment. A standard histopathology slice can be easily scanned at a high resolution of, say, 200,000×200,000 pixels. These high resolution images can make most existing image processing tools infeasible or less effective when operated on a single machine with limited memory, disk space and computing power. In this paper, we propose an algorithm tackling this new emerging "big data" problem utilizing parallel computing on High-Performance-Computing (HPC) clusters. Experimental results on a large-scale data set (1318 images at a scale of 10 billion pixels each) demonstrate the efficiency and effectiveness of the proposed algorithm for low-latency real-time applications. The proposed framework provides an effective and efficient system for extremely large histopathology image analysis. It is based on the multiple instance learning formulation for weakly-supervised learning for image classification, segmentation and clustering. When a max-margin concept is adopted for different clusters, we obtain further improvement in clustering performance.
Nonlinear continuum mechanics and large inelastic deformations

CERN Document Server

Dimitrienko, Yuriy I

2010-01-01

This book provides a rigorous axiomatic approach to continuum mechanics under large deformation. In addition to the classical nonlinear continuum mechanics - kinematics, fundamental laws, the theory of functions having jump discontinuities across singular surfaces, etc. - the book presents the theory of co-rotational derivatives, dynamic deformation compatibility equations, and the principles of material indifference and symmetry, all in systematized form. The focus of the book is a new approach to the formulation of the constitutive equations for elastic and inelastic continua under large deformation. This new approach is based on using energetic and quasi-energetic couples of stress and deformation tensors. This approach leads to a unified treatment of large, anisotropic elastic, viscoelastic, and plastic deformations. The author analyses classical problems, including some involving nonlinear wave propagation, using different models for continua under large deformation, and shows how different models lead t...
KINETIC ALFVÉN WAVE GENERATION BY LARGE-SCALE PHASE MIXING

International Nuclear Information System (INIS)

Vásconez, C. L.; Pucci, F.; Valentini, F.; Servidio, S.; Malara, F.; Matthaeus, W. H.

2015-01-01

One view of the solar wind turbulence is that the observed highly anisotropic fluctuations at spatial scales near the proton inertial length d_p may be considered as kinetic Alfvén waves (KAWs). In the present paper, we show how phase mixing of large-scale parallel-propagating Alfvén waves is an efficient mechanism for the production of KAWs at wavelengths close to d_p and at a large propagation angle with respect to the magnetic field. Magnetohydrodynamic (MHD), Hall magnetohydrodynamic (HMHD), and hybrid Vlasov–Maxwell (HVM) simulations modeling the propagation of Alfvén waves in inhomogeneous plasmas are performed. In the linear regime, the role of dispersive effects is singled out by comparing MHD and HMHD results. Fluctuations produced by phase mixing are identified as KAWs through a comparison of polarization of magnetic fluctuations and wave-group velocity with analytical linear predictions. In the nonlinear regime, a comparison of HMHD and HVM simulations allows us to point out the role of kinetic effects in shaping the proton-distribution function. We observe the generation of temperature anisotropy with respect to the local magnetic field and the production of field-aligned beams. The regions where the proton-distribution function highly departs from thermal equilibrium are located inside the shear layers, where the KAWs are excited, suggesting that the distortions of the proton distribution are driven by a resonant interaction of protons with KAW fluctuations. Our results are relevant in configurations where magnetic-field inhomogeneities are present, as, for example, in the solar corona, where the presence of Alfvén waves has been ascertained.
KINETIC ALFVÉN WAVE GENERATION BY LARGE-SCALE PHASE MIXING

Energy Technology Data Exchange (ETDEWEB)

Vásconez, C. L.; Pucci, F.; Valentini, F.; Servidio, S.; Malara, F. [Dipartimento di Fisica, Università della Calabria, I-87036, Rende (CS) (Italy); Matthaeus, W. H. [Department of Physics and Astronomy, University of Delaware, DE 19716 (United States)

2015-12-10

One view of the solar wind turbulence is that the observed highly anisotropic fluctuations at spatial scales near the proton inertial length d_p may be considered as kinetic Alfvén waves (KAWs). In the present paper, we show how phase mixing of large-scale parallel-propagating Alfvén waves is an efficient mechanism for the production of KAWs at wavelengths close to d_p and at a large propagation angle with respect to the magnetic field. Magnetohydrodynamic (MHD), Hall magnetohydrodynamic (HMHD), and hybrid Vlasov–Maxwell (HVM) simulations modeling the propagation of Alfvén waves in inhomogeneous plasmas are performed. In the linear regime, the role of dispersive effects is singled out by comparing MHD and HMHD results. Fluctuations produced by phase mixing are identified as KAWs through a comparison of polarization of magnetic fluctuations and wave-group velocity with analytical linear predictions. In the nonlinear regime, a comparison of HMHD and HVM simulations allows us to point out the role of kinetic effects in shaping the proton-distribution function. We observe the generation of temperature anisotropy with respect to the local magnetic field and the production of field-aligned beams. The regions where the proton-distribution function highly departs from thermal equilibrium are located inside the shear layers, where the KAWs are excited, suggesting that the distortions of the proton distribution are driven by a resonant interaction of protons with KAW fluctuations. Our results are relevant in configurations where magnetic-field inhomogeneities are present, as, for example, in the solar corona, where the presence of Alfvén waves has been ascertained.
Parallel Scaling Characteristics of Selected NERSC User Project Codes

Energy Technology Data Exchange (ETDEWEB)

Skinner, David; Verdier, Francesca; Anand, Harsh; Carter, Jonathan; Durst, Mark; Gerber, Richard

2005-03-05

This report documents parallel scaling characteristics of NERSC user project codes between Fiscal Year 2003 and the first half of Fiscal Year 2004 (Oct 2002-March 2004). The codes analyzed cover 60% of all the CPU hours delivered during that time frame on seaborg, a 6080 CPU IBM SP and the largest parallel computer at NERSC. The scale in terms of concurrency and problem size of the workload is analyzed. Drawing on batch queue logs, performance data and feedback from researchers we detail the motivations, benefits, and challenges of implementing highly parallel scientific codes on current NERSC High Performance Computing systems. An evaluation and outlook of the NERSC workload for Allocation Year 2005 is presented.
Universal scaling and nonlinearity of aggregate price impact in financial markets

Science.gov (United States)

Patzelt, Felix; Bouchaud, Jean-Philippe

2018-01-01

How and why stock prices move is a centuries-old question still not answered conclusively. More recently, attention shifted to higher frequencies, where trades are processed piecewise across different time scales. Here we reveal that price impact has a universal nonlinear shape for trades aggregated on any intraday scale. Its shape varies little across instruments, but drastically different master curves are obtained for order-volume and -sign impact. The scaling is largely determined by the relevant Hurst exponents. We further show that extreme order-flow imbalance is not associated with large returns. To the contrary, it is observed when the price is pinned to a particular level. Prices move only when there is sufficient balance in the local order flow. In fact, the probability that a trade changes the midprice falls to zero with increasing (absolute) order-sign bias along an arc-shaped curve for all intraday scales. Our findings challenge the widespread assumption of linear aggregate impact. They imply that market dynamics on all intraday time scales are shaped by correlations and bilateral adaptation in the flows of liquidity provision and taking.
Multi Scale Finite Element Analyses By Using SEM-EBSD Crystallographic Modeling and Parallel Computing

International Nuclear Information System (INIS)

Nakamachi, Eiji

2005-01-01

A crystallographic homogenization procedure is introduced to the conventional static-explicit and dynamic-explicit finite element formulation to develop a multi-scale (double-scale) analysis code to predict the plastic strain induced texture evolution, yield loci and formability of sheet metal. The double-scale structure consists of a crystal aggregation (the micro-structure) and a macroscopic elastic plastic continuum. First, we measure crystal morphologies using SEM-EBSD apparatus, and define a unit cell of the micro-structure, which satisfies the periodicity condition in the real scale of the polycrystal. Next, this crystallographic homogenization FE code is applied to 3N pure-iron and 'Benchmark' aluminum A6022 polycrystal sheets. It reveals that the initial crystal orientation distribution (the texture) strongly affects the plastic strain induced texture, the anisotropic hardening evolution and the sheet deformation. Since the multi-scale finite element analysis requires a large computation time, a parallel computing technique using a PC cluster is developed for quick calculation. In this parallelization scheme, a dynamic workload balancing technique is introduced for quick and efficient calculations.
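The record does not say how the dynamic workload balancing is implemented. As a minimal sketch of the general idea only, the code below assumes a hypothetical list of per-unit-cell costs and compares a static split into equal-sized contiguous blocks with a dynamic scheme in which whichever worker becomes idle first takes the next task from a shared queue. All names and numbers are illustrative assumptions.

```python
import heapq


def static_makespan(costs, workers):
    """Wall time when tasks are split into contiguous blocks of equal count."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))


def dynamic_makespan(costs, workers):
    """Wall time when the earliest-idle worker always takes the next task."""
    finish_times = [0.0] * workers
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)  # worker that becomes idle first
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


if __name__ == "__main__":
    # Hypothetical costs: a few expensive unit cells followed by cheap ones.
    costs = [8, 7, 6, 5, 1, 1, 1, 1]
    print("static :", static_makespan(costs, workers=2))
    print("dynamic:", dynamic_makespan(costs, workers=2))
```

When task costs are uneven, as they can be if some unit cells need more iterations than others, the dynamic scheme tends to keep all workers busy until the end, whereas the static split leaves one worker idle while the other finishes the expensive block.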
Comparative efficiencies of three parallel algorithms for nonlinear ...

Indian Academy of Sciences (India)

R. Narasimhan (Krishtel eMaging) 1461 1996 Oct 15 13:05:22

This algorithm is better suited for large size problems on coarse ... and reliable time integration algorithms for solving the second-order dynamic equilibrium equations that arise due ... Programming models required to take advantage of the parallel and distributed ..... In addition, MPI added the concept of a 'virtual topology'.

A massively parallel GPU-accelerated model for analysis of fully nonlinear free surface waves

DEFF Research Database (Denmark)

Engsig-Karup, Allan Peter; Madsen, Morten G.; Glimberg, Stefan Lemvig

2011-01-01

-storage flexible-order accurate finite difference method that is known to be efficient and scalable on a CPU core (single thread). To achieve parallel performance of the relatively complex numerical model, we investigate a new trend in high-performance computing where many-core GPUs are utilized as high......-throughput co-processors to the CPU. We describe and demonstrate how this approach makes it possible to do fast desktop computations for large nonlinear wave problems in numerical wave tanks (NWTs) with close to 50/100 million total grid points in double/single precision with 4 GB global device memory...... available. A new code base has been developed in C++ and compute unified device architecture C and is found to improve the runtime by more than an order of magnitude in double precision arithmetic for the same accuracy over an existing CPU (single thread) Fortran 90 code when executed on a single modern GPU...
Large Scale GW Calculations on the Cori System

Science.gov (United States)

Deslippe, Jack; Del Ben, Mauro; da Jornada, Felipe; Canning, Andrew; Louie, Steven

The NERSC Cori system, powered by 9000+ Intel Xeon-Phi processors, represents one of the largest HPC systems for open-science in the United States and the world. We discuss the optimization of the GW methodology for this system, including both node level and system-scale optimizations. We highlight multiple large scale (thousands of atoms) case studies and discuss both absolute application performance and comparison to calculations on more traditional HPC architectures. We find that the GW method is particularly well suited for many-core architectures due to the ability to exploit a large amount of parallelism across many layers of the system. This work was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division, as part of the Computational Materials Sciences Program.
Large-Scale Multi-Resolution Representations for Accurate Interactive Image and Volume Operations

KAUST Repository

Sicat, Ronell B.

2015-11-25

The resolutions of acquired image and volume data are ever increasing. However, the resolutions of commodity display devices remain limited. This leads to an increasing gap between data and display resolutions. To bridge this gap, the standard approach is to employ output-sensitive operations on multi-resolution data representations. Output-sensitive operations facilitate interactive applications since their required computations are proportional only to the size of the data that is visible, i.e., the output, and not the full size of the input. Multi-resolution representations, such as image mipmaps, and volume octrees, are crucial in providing these operations direct access to any subset of the data at any resolution corresponding to the output. Despite its widespread use, this standard approach has some shortcomings in three important application areas, namely non-linear image operations, multi-resolution volume rendering, and large-scale image exploration. This dissertation presents new multi-resolution representations for large-scale images and volumes that address these shortcomings. Standard multi-resolution representations require low-pass pre-filtering for anti-aliasing. However, linear pre-filters do not commute with non-linear operations. This becomes problematic when applying non-linear operations directly to any coarse resolution levels in standard representations. Particularly, this leads to inaccurate output when applying non-linear image operations, e.g., color mapping and detail-aware filters, to multi-resolution images. Similarly, in multi-resolution volume rendering, this leads to inconsistency artifacts which manifest as erroneous differences in rendering outputs across resolution levels. To address these issues, we introduce the sparse pdf maps and sparse pdf volumes representations for large-scale images and volumes, respectively. These representations sparsely encode continuous probability density functions (pdfs) of multi-resolution pixel
A Decentralized Multivariable Robust Adaptive Voltage and Speed Regulator for Large-Scale Power Systems

Science.gov (United States)

Okou, Francis A.; Akhrif, Ouassima; Dessaint, Louis A.; Bouchard, Derrick

2013-05-01

This paper introduces a decentralized multivariable robust adaptive voltage and frequency regulator to ensure the stability of large-scale interconnected generators. Interconnection parameters (i.e. load, line and transformer parameters) are assumed to be unknown. The proposed design approach requires the reformulation of conventional power system models into a multivariable model with generator terminal voltages as state variables, and excitation and turbine valve inputs as control signals. This model, while suitable for the application of modern control methods, introduces problems with regard to current design techniques for large-scale systems. Interconnection terms, which are treated as perturbations, do not meet the common matching condition assumption. A new adaptive method for a certain class of large-scale systems is therefore introduced that does not require the matching condition. The proposed controller consists of nonlinear inputs that cancel some nonlinearities of the model. Auxiliary controls with linear and nonlinear components are used to stabilize the system. They compensate for unknown parameters of the model by updating both the nonlinear component gains and excitation parameters. The adaptation algorithms involve the sigma-modification approach for auxiliary control gains, and the projection approach for excitation parameters to prevent estimation drift. The computation of the matrix-gain of the controller linear component requires the resolution of an algebraic Riccati equation and helps to solve the perturbation-mismatching problem. A realistic power system is used to assess the proposed controller performance. The results show that both stability and transient performance are considerably improved following a severe contingency.
Large scale electrolysers

International Nuclear Information System (INIS)

B Bello; M Junker

2006-01-01

Hydrogen production by water electrolysis represents nearly 4 % of the world hydrogen production. Future development of hydrogen vehicles will require large quantities of hydrogen. Installation of large scale hydrogen production plants will be needed. In this context, development of low cost large scale electrolysers that could use 'clean power' seems necessary. ALPHEA HYDROGEN, a European network and center of expertise on hydrogen and fuel cells, performed a study for its members in 2005 to evaluate the potential of large scale electrolysers to produce hydrogen in the future. The different electrolysis technologies were compared. Then, a state-of-the-art review of the electrolysis modules currently available was made. A review of the large scale electrolysis plants that have been installed in the world was also carried out. The main projects related to large scale electrolysis were also listed. The economics of large scale electrolysers were discussed. The influence of energy prices on the hydrogen production cost by large scale electrolysis was evaluated. (authors)
Reconstructing Information in Large-Scale Structure via Logarithmic Mapping

Science.gov (United States)

Szapudi, Istvan

We propose to develop a new method to extract information from large-scale structure data combining two-point statistics and non-linear transformations; before, this information was available only with substantially more complex higher-order statistical methods. Initially, most of the cosmological information in large-scale structure lies in two-point statistics. With non-linear evolution, some of that useful information leaks into higher-order statistics. The PI and group have shown in a series of theoretical investigations how that leakage occurs, and explained the Fisher information plateau at smaller scales. This plateau means that even as more modes are added to the measurement of the power spectrum, the total cumulative information (loosely speaking the inverse errorbar) is not increasing. Recently we have shown in Neyrinck et al. (2009, 2010) that a logarithmic (and a related Gaussianization or Box-Cox) transformation on the non-linear Dark Matter or galaxy field reconstructs a surprisingly large fraction of this missing Fisher information of the initial conditions. This was predicted by the earlier wave mechanical formulation of gravitational dynamics by Szapudi & Kaiser (2003). The present proposal is focused on working out the theoretical underpinning of the method to a point that it can be used in practice to analyze data. In particular, one needs to deal with the usual real-life issues of galaxy surveys, such as complex geometry, discrete sampling (Poisson or sub-Poisson noise), bias (linear, or non-linear, deterministic, or stochastic), redshift distortions, projection effects for 2D samples, and the effects of photometric redshift errors. We will develop methods for weak lensing and Sunyaev-Zeldovich power spectra as well, the latter specifically targeting Planck. In addition, we plan to investigate the question of residual higher-order information after the non-linear mapping, and possible applications for cosmology. Our aim will be to work out
The relationship between small-scale and large-scale ionospheric electron density irregularities generated by powerful HF electromagnetic waves at high latitudes

Directory of Open Access Journals (Sweden)

E. D. Tereshchenko

2006-11-01

Satellite radio beacons were used in June 2001 to probe the ionosphere modified by a radio beam produced by the EISCAT high-power, high-frequency (HF) transmitter located near Tromsø (Norway). Amplitude scintillations and variations of the phase of 150- and 400-MHz signals from Russian navigational satellites passing over the modified region were observed at three receiver sites. In several papers it has been stressed that in the polar ionosphere the thermal self-focusing on striations during ionospheric modification is the main mechanism resulting in the formation of large-scale (hundreds of meters to kilometers) nonlinear structures aligned along the geomagnetic field (magnetic zenith effect). It has also been claimed that the maximum effects caused by small-scale (tens of meters) irregularities detected in satellite signals are also observed in the direction parallel to the magnetic field. Contrary to those studies, the present paper shows that the maximum in amplitude scintillations does not correspond strictly to the magnetic zenith direction because high latitude drifts typically cause a considerable anisotropy of small-scale irregularities in a plane perpendicular to the geomagnetic field resulting in a deviation of the amplitude-scintillation peak relative to the minimum angle between the line-of-sight to the satellite and direction of the geomagnetic field lines. The variance of the logarithmic relative amplitude fluctuations is considered here, which is a useful quantity in such studies. The experimental values of the variance are compared with model calculations and good agreement has been found. It is also shown from the experimental data that in most of the satellite passes a variance maximum occurs at a minimum in the phase fluctuations indicating that the artificial excitation of large-scale irregularities is minimum when the excitation of small-scale irregularities is maximum.
Computational chaos in massively parallel neural networks

Science.gov (United States)

Barhen, Jacob; Gulati, Sandeep

1989-01-01

A fundamental issue which directly impacts the scalability of current theoretical neural network models to massively parallel embodiments, in both software as well as hardware, is the inherent and unavoidable concurrent asynchronicity of emerging fine-grained computational ensembles and the possible emergence of chaotic manifestations. Previous analyses attributed dynamical instability to the topology of the interconnection matrix, to parasitic components or to propagation delays. However, researchers have observed the existence of emergent computational chaos in a concurrently asynchronous framework, independent of the network topology. The researchers present a methodology enabling the effective asynchronous operation of large-scale neural networks. Necessary and sufficient conditions guaranteeing concurrent asynchronous convergence are established in terms of contracting operators. Lyapunov exponents are computed formally to characterize the underlying nonlinear dynamics. Simulation results are presented to illustrate network convergence to the correct results, even in the presence of large delays.
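The link between contracting operators and asynchronous convergence can be illustrated with a toy example. The sketch below is not the authors' network or methodology: it assumes a hypothetical affine map x -> A x + b whose matrix has maximum absolute row sum below one (a contraction in the max norm), and updates one randomly chosen component at a time while reading each input from a possibly stale copy of the state. The function name, the delay model, and the numbers are assumptions made for illustration.

```python
import random


def async_fixed_point(A, b, updates=400, max_delay=5, seed=0):
    """Asynchronously iterate x_i <- sum_j A[i][j] * x_j(stale) + b[i]."""
    rng = random.Random(seed)
    n = len(b)
    history = [[0.0] * n]  # most recent states, newest last
    for _ in range(updates):
        i = rng.randrange(n)  # only one component is updated per step
        total = b[i]
        for j in range(n):
            # Each input may be read from an out-of-date copy of the state.
            delay = rng.randint(0, min(max_delay, len(history) - 1))
            total += A[i][j] * history[-1 - delay][j]
        new_state = list(history[-1])
        new_state[i] = total
        history.append(new_state)
        if len(history) > max_delay + 1:
            history.pop(0)  # keep only as much history as delays can reach
    return history[-1]


if __name__ == "__main__":
    A = [[0.2, -0.3], [0.1, 0.4]]  # max absolute row sum is 0.5
    b = [1.0, 2.0]
    print(async_fixed_point(A, b))  # exact fixed point is (0, 10/3)
```

With bounded delays and a max-norm contraction, the iterates in this toy problem settle on the same fixed point that a synchronous iteration would reach; removing the contraction property (for example, scaling A so a row sum exceeds one) removes that guarantee.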
Parallel computing works!

CERN Document Server

Fox, Geoffrey C; Messina, Guiseppe C

2014-01-01

A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop
Generation and saturation of large-scale flows in flute turbulence

International Nuclear Information System (INIS)

Sandberg, I.; Isliker, H.; Pavlenko, V. P.; Hizanidis, K.; Vlahos, L.

2005-01-01

The excitation and suppression of large-scale anisotropic modes during the temporal evolution of a magnetic-curvature-driven electrostatic flute instability are numerically investigated. The formation of streamerlike structures is attributed to the linear development of the instability while the subsequent excitation of the zonal modes is the result of the nonlinear coupling between linearly grown flute modes. When the amplitudes of the zonal modes become of the same order as that of the streamer modes, the flute instabilities get suppressed and poloidal (zonal) flows dominate. In the saturated state that follows, the dominant large-scale modes of the potential and the density are self-organized in different ways, depending on the value of the ion temperature
Scaling of chaos in strongly nonlinear lattices.

Science.gov (United States)

Mulansky, Mario

2014-06-01

Although it is now understood that chaos in complex classical systems is the foundation of thermodynamic behavior, the detailed relations between the microscopic properties of the chaotic dynamics and the macroscopic thermodynamic observations still remain mostly in the dark. In this work, we numerically analyze the probability of chaos in strongly nonlinear Hamiltonian systems and find different scaling properties depending on the nonlinear structure of the model. We argue that these different scaling laws of chaos have definite consequences for the macroscopic diffusive behavior, as chaos is the microscopic mechanism of diffusion. This is compared with previous results on chaotic diffusion [M. Mulansky and A. Pikovsky, New J. Phys. 15, 053015 (2013)], and a relation between microscopic chaos and macroscopic diffusion is established.
Nonlinear Dynamics of Carbon Nanotubes Under Large Electrostatic Force

KAUST Repository

Xu, Tiantian

2015-06-01

Because of the inherent nonlinearities involved in the behavior of CNTs when excited by electrostatic forces, modeling and simulating their behavior is challenging. The complicated form of the electrostatic force describing the interaction of their cylindrical shape, forming upper electrodes, with lower electrodes poses serious computational challenges. This presents an obstacle to applying several nonlinear dynamics tools typically used to analyze the behavior of complicated nonlinear systems undergoing large motion, such as shooting, continuation, and integrity analysis techniques. This work presents an attempt to resolve this issue. We present an investigation of the nonlinear dynamics of carbon nanotubes when actuated by large electrostatic forces. We study expanding the complicated form of the electrostatic force into a sufficient number of terms of the Taylor series. Then, we utilize this form along with an Euler-Bernoulli beam model to study for the first time the dynamic behavior of CNTs when excited by large electrostatic force. The geometric nonlinearity and the nonlinear electrostatic force are considered. An efficient reduced-order model (ROM) based on the Galerkin method is developed and utilized to simulate the static and dynamic responses of the CNTs. Several results are generated demonstrating softening and hardening behavior of the CNTs near their primary and secondary resonances. The effects of the DC and AC voltage loads on the behavior have been studied. The impacts of the initial slack level and CNT diameter are also demonstrated.
A parallel orbital-updating based plane-wave basis method for electronic structure calculations

International Nuclear Information System (INIS)

Pan, Yan; Dai, Xiaoying; Gironcoli, Stefano de; Gong, Xin-Gao; Rignanese, Gian-Marco; Zhou, Aihui

2017-01-01

Highlights: • Propose three parallel orbital-updating based plane-wave basis methods for electronic structure calculations. • These new methods avoid generating large scale eigenvalue problems and thereby reduce the computational cost. • These new methods allow for two-level parallelization, which is particularly interesting for large scale parallelization. • Numerical experiments show that these new methods are reliable and efficient for large scale calculations on modern supercomputers. - Abstract: Motivated by the recently proposed parallel orbital-updating approach in the real space method, we propose a parallel orbital-updating based plane-wave basis method for electronic structure calculations, for solving the corresponding eigenvalue problems. In addition, we propose two new modified parallel orbital-updating methods. Compared to the traditional plane-wave methods, our methods allow for two-level parallelization, which is particularly interesting for large scale parallelization. Numerical experiments show that these new methods are more reliable and efficient for large scale calculations on modern supercomputers.
Large-scale hydrogen production using nuclear reactors

Energy Technology Data Exchange (ETDEWEB)

Ryland, D.; Stolberg, L.; Kettner, A.; Gnanapragasam, N.; Suppiah, S. [Atomic Energy of Canada Limited, Chalk River, ON (Canada)

2014-07-01

For many years, Atomic Energy of Canada Limited (AECL) has been studying the feasibility of using nuclear reactors, such as the Supercritical Water-cooled Reactor, as an energy source for large scale hydrogen production processes such as High Temperature Steam Electrolysis and the Copper-Chlorine thermochemical cycle. Recent progress includes the augmentation of AECL's experimental capabilities by the construction of experimental systems to test high temperature steam electrolysis button cells at ambient pressure and temperatures up to 850 °C and CuCl/HCl electrolysis cells at pressures up to 7 bar and temperatures up to 100 °C. In parallel, detailed models of solid oxide electrolysis cells and the CuCl/HCl electrolysis cell are being refined and validated using experimental data. Process models are also under development to assess options for economic integration of these hydrogen production processes with nuclear reactors. Options for large-scale energy storage, including hydrogen storage, are also under study. (author)
PALNS - A software framework for parallel large neighborhood search

DEFF Research Database (Denmark)

Røpke, Stefan

2009-01-01

This paper proposes a simple, parallel, portable software framework for the metaheuristic named large neighborhood search (LNS). The aim is to provide a framework where the user has to set up a few data structures and implement a few functions and then the framework provides a metaheuristic where ... parallelization "comes for free". We apply the parallel LNS heuristic to two different problems: the traveling salesman problem with pickup and delivery (TSPPD) and the capacitated vehicle routing problem (CVRP).
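The destroy-and-repair loop at the heart of LNS can be sketched as follows. This is a minimal serial illustration on a plain traveling salesman instance, not the PALNS framework or its API, and it does not cover the TSPPD or CVRP variants; the function names (`destroy`, `repair`, `lns`) and the parameters `k`, `iterations` and `seed` are assumptions made for the example.

```python
import math
import random

def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def destroy(tour, k, rng):
    """Remove k randomly chosen cities from the tour."""
    removed = rng.sample(tour, k)
    kept = [c for c in tour if c not in removed]
    return kept, removed

def repair(partial, removed, pts):
    """Greedily reinsert each removed city at its cheapest position."""
    tour = list(partial)
    for c in removed:
        best_pos = min(range(len(tour) + 1),
                       key=lambda p: tour_length(tour[:p] + [c] + tour[p:], pts))
        tour.insert(best_pos, c)
    return tour

def lns(pts, iterations=200, k=3, seed=0):
    """Keep a candidate only if it shortens the best tour found so far."""
    rng = random.Random(seed)
    best = list(range(len(pts)))
    best_len = tour_length(best, pts)
    for _ in range(iterations):
        partial, removed = destroy(best, k, rng)
        cand = repair(partial, removed, pts)
        cand_len = tour_length(cand, pts)
        if cand_len < best_len:
            best, best_len = cand, cand_len
    return best, best_len

# Hypothetical instance: 12 random points in the unit square.
_rng = random.Random(1)
points = [(_rng.random(), _rng.random()) for _ in range(12)]
initial_len = tour_length(list(range(12)), points)
best_tour, best_len = lns(points)
```

In a framework of the kind the abstract describes, the user would presumably supply functions playing the roles of `destroy` and `repair`, while the framework owns the outer loop and its parallelization.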
Large scale simulations of lattice QCD thermodynamics on Columbia Parallel Supercomputers

International Nuclear Information System (INIS)

Ohta, Shigemi

1989-01-01

The Columbia Parallel Supercomputer project aims at the construction of a parallel processing, multi-gigaflop computer optimized for numerical simulations of lattice QCD. The project has three stages: a 16-node, 1/4 GF machine completed in April 1985, a 64-node, 1 GF machine completed in August 1987, and a 256-node, 16 GF machine now under construction. The machines all share a common architecture: a two-dimensional torus formed from a rectangular array of N1 x N2 independent and identical processors. A processor is capable of operating in a multi-instruction multi-data mode, except for periods of synchronous interprocessor communication with its four nearest neighbors. Here the thermodynamics simulations on the two working machines are reported. (orig./HSI)
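The nearest-neighbor structure of a two-dimensional torus can be made concrete with a short sketch. The function name `torus_neighbors` and the 16 x 16 layout used in the example are assumptions for illustration; the abstract does not state how the nodes of any of the machines are arranged.

```python
def torus_neighbors(row, col, n1, n2):
    """Return the four nearest neighbors of (row, col) on an n1 x n2 torus."""
    return [
        ((row - 1) % n1, col),  # north, wrapping from row 0 to row n1 - 1
        ((row + 1) % n1, col),  # south, wrapping from row n1 - 1 to row 0
        (row, (col - 1) % n2),  # west, wrapping from column 0 to column n2 - 1
        (row, (col + 1) % n2),  # east, wrapping from column n2 - 1 to column 0
    ]

# Hypothetical layout: 256 nodes arranged as a 16 x 16 array.
corner = torus_neighbors(0, 0, 16, 16)
```

Because of the wraparound, every processor has exactly four neighbors, including those on the edges of the rectangular array.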
Using Agent Base Models to Optimize Large Scale Network for Large System Inventories

Science.gov (United States)

Shameldin, Ramez Ahmed; Bowling, Shannon R.

2010-01-01

The aim of this paper is to use Agent Base Models (ABM) to optimize large scale network handling capabilities for large system inventories and to implement strategies for the purpose of reducing capital expenses. The models used in this paper use either computational algorithms or procedure implementations developed in Matlab to simulate agent based models in a principal programming language and mathematical theory using clusters; these clusters serve as a high performance computing platform to run the program in parallel. In both cases, a model is defined as a compilation of a set of structures and processes assumed to underlie the behavior of a network system.
Massively Parallel and Scalable Implicit Time Integration Algorithms for Structural Dynamics

Science.gov (United States)

Farhat, Charbel

1997-01-01

Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because of the following additional facts: (a) explicit schemes are easier to parallelize than implicit ones, and (b) explicit schemes induce short range interprocessor communications that are relatively inexpensive, while the factorization methods used in most implicit schemes induce long range interprocessor communications that often ruin the sought-after speed-up. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet be offset by the speed of the currently available parallel hardware. Therefore, it is essential to develop efficient alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating the low-frequency dynamics of aerospace structures.
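The cost of the time step restriction can be made explicit with a short worked estimate. It assumes the central difference scheme as a representative explicit integrator (the abstract does not name one) and an undamped linear system; the symbols $\omega_{\max}$, $\omega_{\text{low}}$, $h$ and $c$ are introduced here for illustration.

```latex
% Stability limit of the central difference scheme, with \omega_{\max}
% the highest natural frequency of the discretized structure:
\Delta t \le \Delta t_{\text{crit}} = \frac{2}{\omega_{\max}}
% For an element of characteristic size h and wave speed c this is,
% up to a constant of order one, the Courant condition:
\Delta t \lesssim \frac{h}{c}
% Minimum number of explicit steps needed to resolve one period
% T = 2\pi / \omega_{\text{low}} of a low-frequency mode:
N_{\text{steps}} \ge \frac{T}{\Delta t_{\text{crit}}}
  = \frac{2\pi/\omega_{\text{low}}}{2/\omega_{\max}}
  = \pi \, \frac{\omega_{\max}}{\omega_{\text{low}}}
```

The step count thus grows with the ratio of the highest to the lowest frequency of interest, which is why an unconditionally stable implicit scheme, whose step is chosen for accuracy alone, can be more economical for low-frequency response.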
Cosmological streaming velocities and large-scale density maxima

International Nuclear Information System (INIS)

Peacock, J.A.; Lumsden, S.L.; Heavens, A.F.

1987-01-01

The statistical testing of models for galaxy formation against the observed peculiar velocities on 10-100 Mpc scales is considered. If it is assumed that observers are likely to be sited near maxima in the primordial field of density perturbations, then the observed filtered velocity field will be biased to low values by comparison with a point selected at random. This helps to explain how the peculiar velocities (relative to the microwave background) of the local supercluster and the Rubin-Ford shell can be so similar in magnitude. Using this assumption to predict peculiar velocities on two scales, we test models with large-scale damping (i.e. adiabatic perturbations). Allowed models have a damping length close to the Rubin-Ford scale and are mildly non-linear. Both purely baryonic universes and universes dominated by massive neutrinos can account for the observed velocities, provided 0.1 ≤ Ω ≤ 1. (author)

Calculations on nonlinear optical properties for large systems: the elongation method

CERN Document Server

Gu, Feng Long; Springborg, Michael; Kirtman, Bernard

2014-01-01

For design purposes one needs to relate the structure of proposed materials to their NLO (nonlinear optical) and other properties, which is a situation where theoretical approaches can be very helpful in providing suggestions for candidate systems that subsequently can be synthesized and studied experimentally. This brief describes the quantum-mechanical treatment of the response to one or more external oscillating electric fields for molecular and macroscopic, crystalline systems. To calculate NLO properties of large systems, a linear scaling generalized elongation method for the efficient and accurate calculation is introduced. The reader should be aware that this treatment is particularly feasible for complicated three-dimensional and/or delocalized systems that are intractable when applied to conventional or other linear scaling methods.
Enabling parallel simulation of large-scale HPC network systems

International Nuclear Information System (INIS)

Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; Carns, Philip

2016-01-01

Here, with the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems—in particular, networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations
Parameter Scaling in Non-Linear Microwave Tomography

DEFF Research Database (Denmark)

Jensen, Peter Damsgaard; Rubæk, Tonny; Talcoth, Oskar

2012-01-01

Non-linear microwave tomographic imaging of the breast is a challenging computational problem. The breast is heterogeneous and contains several high-contrast and lossy regions, resulting in large differences in the measured signal levels. This implies that special care must be taken when the imaging problem is formulated. Under such conditions, microwave imaging systems will most often be considerably more sensitive to changes in the electromagnetic properties in certain regions of the breast. The result is that the parameters might not be reconstructed correctly in the less sensitive regions ... introduced as a measure of the sensitivity. The scaling of the parameters is shown to improve performance of the microwave imaging system when applied to reconstruction of images from 2-D simulated data and measurement data.
Vacuum Large Current Parallel Transfer Numerical Analysis

Directory of Open Access Journals (Sweden)

Enyuan Dong

2014-01-01

The stable operation and reliable breaking of large generator currents is a difficult problem in power systems. It can be solved by parallel interrupters and a proper timing sequence with phase-control technology, in which the breaker control strategy is decided by the timing of both the first-opening phase and the second-opening phase. A precise transfer-current model can provide the proper timing sequence for breaking the generator circuit breaker. By analysis of transfer-current experiments and data, the real vacuum arc resistance and a precise corrected model of the large transfer-current process are obtained in this paper. The transfer time calculated by the corrected transfer-current model is very close to the actual transfer time. It can provide guidance for planning a proper timing sequence and breaking the vacuum generator circuit breaker with parallel interrupters.
Performance Characteristics of Hybrid MPI/OpenMP Implementations of NAS Parallel Benchmarks SP and BT on Large-Scale Multicore Clusters

KAUST Repository

Wu, X.; Taylor, V.

2011-01-01

The NAS Parallel Benchmarks (NPB) are well-known applications with fixed algorithms for evaluating parallel systems and tools. Multicore clusters provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node, and MPI can be used with the communication between nodes. In this paper, we use Scalar Pentadiagonal (SP) and Block Tridiagonal (BT) benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore clusters, Intrepid (BlueGene/P) at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76 %, and the hybrid BT outperforms the MPI BT by up to 8.58 % on up to 10 000 cores on Intrepid and Jaguar. We also use performance tools and MPI trace libraries available on these clusters to further investigate the performance characteristics of the hybrid SP and BT. © 2011 The Author. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.
Development of design technology on thermal-hydraulic performance in tight-lattice rod bundle. 4. Large paralleled simulation by the advanced two-fluid model code

International Nuclear Information System (INIS)

Misawa, Takeharu; Yoshida, Hiroyuki; Akimoto, Hajime

2008-01-01

At the Japan Atomic Energy Agency (JAEA), the Innovative Water Reactor for Flexible Fuel Cycle (FLWR) has been developed. For the thermal design of FLWR, it is necessary to develop an analytical method to predict boiling transition in FLWR. JAEA has been developing the three-dimensional two-fluid model analysis code ACE-3D, which adopts a boundary-fitted coordinate system to simulate flow in channels of complex shape. In this paper, as part of the development of ACE-3D for rod bundle analysis, the introduction of parallelization to ACE-3D and assessments of ACE-3D are presented. In the analysis of a large-scale domain such as a rod bundle, even a two-fluid model incurs a large computational cost, which exceeds the memory limit of a single CPU. Therefore, parallelization was introduced to ACE-3D to divide the data for large-scale domain analysis among a large number of CPUs, and it is confirmed that analysis of a large-scale domain such as a rod bundle can be performed by parallel computation while maintaining parallel performance even with a large number of CPUs. ACE-3D adopts two-phase flow models, some of which depend on channel geometry. Therefore, analyses in domains that simulate an individual subchannel and a 37-rod bundle are performed and compared with experiments. It is confirmed that the results of both analyses using ACE-3D agree qualitatively with past experimental results. (author)
Scaling Optimization of the SIESTA MHD Code

Science.gov (United States)

Seal, Sudip; Hirshman, Steven; Perumalla, Kalyan

2013-10-01

SIESTA is a parallel three-dimensional plasma equilibrium code capable of resolving magnetic islands at high spatial resolutions for toroidal plasmas. Originally designed to exploit small-scale parallelism, SIESTA has now been scaled to execute efficiently over several thousands of processors P. This scaling improvement was accomplished with minimal intrusion to the execution flow of the original version. First, the efficiency of the iterative solutions was improved by integrating the parallel tridiagonal block solver code BCYCLIC. Krylov-space generation in GMRES was then accelerated using a customized parallel matrix-vector multiplication algorithm. Novel parallel Hessian generation algorithms were integrated and memory access latencies were dramatically reduced through loop nest optimizations and data layout rearrangement. These optimizations sped up equilibria calculations by factors of 30-50. It is possible to compute solutions with granularity N/P near unity on extremely fine radial meshes (N > 1024 points). Grid separation in SIESTA, which manifests itself primarily in the resonant components of the pressure far from rational surfaces, is strongly suppressed by finer meshes. Large problem sizes of up to 300 K simultaneous non-linear coupled equations have been solved on the NERSC supercomputers. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.
The nonlinear Galerkin method: A multi-scale method applied to the simulation of homogeneous turbulent flows

Science.gov (United States)

Debussche, A.; Dubois, T.; Temam, R.

1993-01-01

Using results of Direct Numerical Simulation (DNS) in the case of two-dimensional homogeneous isotropic flows, the behavior of the small and large scales of Kolmogorov-like flows at moderate Reynolds numbers is first analyzed in detail. Several estimates on the time variations of the small eddies and the nonlinear interaction terms were derived; those terms play the role of the Reynolds stress tensor in the case of LES. Since the time step of a numerical scheme is determined as a function of the energy-containing eddies of the flow, the variations of the small scales and of the nonlinear interaction terms over one iteration can become negligible by comparison with the accuracy of the computation. Based on this remark, a multilevel scheme which treats the small and the large eddies differently was proposed. Using mathematical developments, estimates of all the parameters involved in the algorithm, which then becomes a completely self-adaptive procedure, were derived. Finally, realistic simulations of (Kolmogorov-like) flows over several eddy-turnover times were performed. The results are analyzed in detail and a parametric study of the nonlinear Galerkin method is performed.
Final report LDRD project 105816 : model reduction of large dynamic systems with localized nonlinearities.

Energy Technology Data Exchange (ETDEWEB)

Lehoucq, Richard B.; Segalman, Daniel Joseph; Hetmaniuk, Ulrich L. (University of Washington, Seattle, WA); Dohrmann, Clark R.

2009-10-01

Advanced computing hardware and software written to exploit massively parallel architectures greatly facilitate the computation of extremely large problems. On the other hand, these tools, though enabling higher fidelity models, have often resulted in much longer run-times and turn-around-times in providing answers to engineering problems. The impediments include smaller elements and consequently smaller time steps, much larger systems of equations to solve, and the inclusion of nonlinearities that had been ignored in days when lower fidelity models were the norm. The research effort reported focuses on accelerating the analysis process for structural dynamics through combinations of model reduction and mitigation of some factors that lead to over-meshing.
Xyce parallel electronic simulator : users' guide.

Energy Technology Data Exchange (ETDEWEB)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

2011-05-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is
Nonlinear Force-free Field Extrapolation of a Coronal Magnetic Flux Rope Supporting a Large-scale Solar Filament from a Photospheric Vector Magnetogram

Science.gov (United States)

Jiang, Chaowei; Wu, S. T.; Feng, Xueshang; Hu, Qiang

2014-05-01

Solar filaments are commonly thought to be supported in magnetic dips, in particular, in those of magnetic flux ropes (FRs). In this Letter, based on the observed photospheric vector magnetogram, we implement a nonlinear force-free field (NLFFF) extrapolation of a coronal magnetic FR that supports a large-scale intermediate filament between an active region and a weak polarity region. This result is a first, in the sense that current NLFFF extrapolations including the presence of FRs are limited to relatively small-scale filaments that are close to sunspots and along main polarity inversion lines (PILs) with strong transverse field and magnetic shear, and the existence of an FR is usually predictable. In contrast, the present filament lies along the weak-field region (photospheric field strength barbs very well, which strongly supports the FR-dip model for filaments. The filament is stably sustained because the FR is weakly twisted and strongly confined by the overlying closed arcades.
Parallel Implementation of the Multi-Dimensional Spectral Code SPECT3D on large 3D grids.

Science.gov (United States)

Golovkin, Igor E.; Macfarlane, Joseph J.; Woodruff, Pamela R.; Pereyra, Nicolas A.

2006-10-01

The multi-dimensional collisional-radiative, spectral analysis code SPECT3D can be used to study radiation from complex plasmas. SPECT3D can generate instantaneous and time-gated images and spectra, space-resolved and streaked spectra, which makes it a valuable tool for post-processing hydrodynamics calculations and direct comparison between simulations and experimental data. On large three dimensional grids, transporting radiation along lines of sight (LOS) requires substantial memory and CPU resources. Currently, the parallel option in SPECT3D is based on parallelization over photon frequencies and allows for a nearly linear speed-up for a variety of problems. In addition, we are introducing a new parallel mechanism that will greatly reduce memory requirements. In the new implementation, spatial domain decomposition will be utilized allowing transport along a LOS to be performed only on the mesh cells the LOS crosses. The ability to operate on a fraction of the grid is crucial for post-processing the results of large-scale three-dimensional hydrodynamics simulations. We will present a parallel implementation of the code and provide a scalability study performed on a Linux cluster.
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

Science.gov (United States)

Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

2016-07-19

Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
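For readers unfamiliar with the kernel being parallelized, the following is a minimal serial sketch of Smith-Waterman local alignment scoring, together with the GCUPS (giga cell updates per second) metric. It is not the authors' implementation: it assumes a linear gap penalty and simple match/mismatch scores rather than a protein substitution matrix, and the names `smith_waterman_score` and `gcups` are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def gcups(cell_updates, seconds):
    """Giga cell updates per second; one cell is one (i, j) matrix entry."""
    return cell_updates / (seconds * 1e9)

score = smith_waterman_score("ACGT", "TTACGTTT")
```

Each query/database pair is scored independently, which is what makes the database scan amenable to the cluster-level and thread-level parallelism described in the abstract; the vector level works within or across the matrix computations themselves.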
On the interaction of small-scale linear waves with nonlinear solitary waves

Science.gov (United States)

Xu, Chengzhu; Stastna, Marek

2017-04-01

In the study of environmental and geophysical fluid flows, linear wave theory is well developed and its application has been considered for phenomena of various length and time scales. However, due to the nonlinear nature of fluid flows, in many cases results predicted by linear theory do not agree with observations. One such case is internal wave dynamics. While small-amplitude wave motion may be approximated by linear theory, large amplitude waves tend to be solitary-like. In some cases, when the wave is highly nonlinear, even weakly nonlinear theories fail to predict the wave properties correctly. We study the interaction of small-scale linear waves with nonlinear solitary waves using highly accurate pseudo spectral simulations that begin with a fully nonlinear solitary wave and a train of small-amplitude waves initialized from linear waves. The solitary wave then interacts with the linear waves through either an overtaking collision or a head-on collision. During the collision, there is a net energy transfer from the linear wave train to the solitary wave, resulting in an increase in the kinetic energy carried by the solitary wave and a phase shift of the solitary wave with respect to a freely propagating solitary wave. At the same time the linear waves are greatly reduced in amplitude. The percentage of energy transferred depends primarily on the wavelength of the linear waves. We found that after one full collision cycle, the longest waves may retain as much as 90% of the kinetic energy they had initially, while the shortest waves lose almost all of their initial energy. We also found that a head-on collision is more efficient in destroying the linear waves than an overtaking collision. On the other hand, the initial amplitude of the linear waves has very little impact on the percentage of energy that can be transferred to the solitary wave. Because of the nonlinearity of the solitary wave, these results provide us some insight into wave-mean flow
A parallel form of the Gudjonsson Suggestibility Scale.

Science.gov (United States)

Gudjonsson, G H

1987-09-01

The purpose of this study is twofold: (1) to present a parallel form of the Gudjonsson Suggestibility Scale (GSS, Form 1); (2) to study test-retest reliabilities of interrogative suggestibility. Three groups of subjects were administered the two suggestibility scales in a counterbalanced order. Group 1 (28 normal subjects) and Group 2 (32 'forensic' patients) completed both scales within the same testing session, whereas Group 3 (30 'forensic' patients) completed the two scales between one week and eight months apart. All the correlations were highly significant, giving support for high 'temporal consistency' of interrogative suggestibility.
Constructing sites on a large scale

DEFF Research Database (Denmark)

Braae, Ellen Marie; Tietjen, Anne

2011-01-01

Since the 1990s, the regional scale has regained importance in urban and landscape design. In parallel, the focus in design tasks has shifted from master plans for urban extension to strategic urban transformation projects. A prominent example of a contemporary spatial development approach...... for setting the design brief in a large scale urban landscape in Norway, the Jaeren region around the city of Stavanger. In this paper, we first outline the methodological challenges and then present and discuss the proposed method based on our teaching experiences. On this basis, we discuss aspects...... is the IBA Emscher Park in the Ruhr area in Germany. Over a 10 years period (1988-1998), more than a 100 local transformation projects contributed to the transformation from an industrial to a post-industrial region. The current paradigm of planning by projects reinforces the role of the design disciplines...
Large scale parallel FEM computations of far/near stress field changes in rocks

Czech Academy of Sciences Publication Activity Database

Blaheta, Radim; Byczanski, Petr; Jakl, Ondřej; Kohut, Roman; Kolcun, Alexej; Krečmer, Karel; Starý, Jiří

2006-01-01

Roč. 22, č. 4 (2006), s. 449-459 ISSN 0167-739X R&D Projects: GA ČR(CZ) GA105/02/0492; GA AV ČR(CZ) 1ET400300415 Institutional research plan: CEZ:AV0Z30860518 Keywords : large scale finite element analysis Subject RIV: BA - General Mathematics Impact factor: 0.722, year: 2006
High performance parallel I/O

CERN Document Server

Prabhat

2014-01-01

Gain Critical Insight into the Parallel I/O Ecosystem. Parallel I/O is an integral component of modern high performance computing (HPC), especially in storing and processing very large datasets to facilitate scientific discovery. Revealing the state of the art in this field, High Performance Parallel I/O draws on insights from leading practitioners, researchers, software architects, developers, and scientists who shed light on the parallel I/O ecosystem. The first part of the book explains how large-scale HPC facilities scope, configure, and operate systems, with an emphasis on choices of I/O har
Nonlinear interaction of a parallel-flow relativistic electron beam with a plasma

International Nuclear Information System (INIS)

Jungwirth, K.; Koerbel, S.; Simon, P.; Vrba, P.

1975-01-01

Nonlinear evolution of single-mode high-frequency instabilities ($\omega \approx k_{\parallel} v_b$) excited by a parallel-flow high-current relativistic electron beam in a magnetized plasma is investigated. Fairly general dimensionless equations are derived. They describe both the temporal and the spatial evolution of amplitude and phase of the fundamental wave. Numerically, the special case of excitation of the linearly most unstable mode is solved in detail assuming that the wave energy dissipation is negligible. Then the strength of interaction and the relativistic properties of the beam are fully respected by a single parameter $\lambda$. The value of $\lambda$ ensuring the optimum efficiency of the wave excitation as well as the efficiency of the self-acceleration of some beam electrons at higher values of $\lambda > 1$ are determined in the case of a fully compensated relativistic beam. Finally, the effect of the return current dissipation is also included (phenomenologically) into the theoretical model, its role for the beam-plasma interaction being checked numerically. (J.U.)

Large-scale pool fires

Directory of Open Access Journals (Sweden)

Steinhaus Thomas

2007-01-01

A review of research into the burning behavior of large pool fires and fuel spill fires is presented. The features which distinguish such fires from smaller pool fires are mainly associated with the fire dynamics at low source Froude numbers and the radiative interaction with the fire source. In hydrocarbon fires, higher soot levels at increased diameters result in radiation blockage effects around the perimeter of large fire plumes; this yields lower emissive powers and a drastic reduction in the radiative loss fraction; whilst there are simplifying factors with these phenomena, arising from the fact that soot yield can saturate, there are other complications deriving from the intermittency of the behavior, with luminous regions of efficient combustion appearing randomly in the outer surface of the fire according to the turbulent fluctuations in the fire plume. Knowledge of the fluid flow instabilities, which lead to the formation of large eddies, is also key to understanding the behavior of large-scale fires. Here modeling tools can be effectively exploited in order to investigate the fluid flow phenomena, including RANS- and LES-based computational fluid dynamics codes. The latter are well-suited to representation of the turbulent motions, but a number of challenges remain with their practical application. Massively-parallel computational resources are likely to be necessary in order to be able to adequately address the complex coupled phenomena to the level of detail that is necessary.
Generation of weakly nonlinear nonhydrostatic internal tides over large topography: a multi-modal approach

Directory of Open Access Journals (Sweden)

R. Maugé

2008-03-01

A set of evolution equations is derived for the modal coefficients in a weakly nonlinear nonhydrostatic internal-tide generation problem. The equations allow for the presence of large-amplitude topography, e.g. a continental slope, which is formally assumed to have a length scale much larger than that of the internal tide. However, comparison with results from more sophisticated numerical models shows that this restriction can in practice be relaxed. It is shown that a topographically induced coupling between modes occurs that is distinct from nonlinear coupling. Nonlinear effects include the generation of higher harmonics by reflection from boundaries, i.e. steeper tidal beams at frequencies that are multiples of the basic tidal frequency. With a seasonal thermocline included, the model is capable of reproducing the phenomenon of local generation of internal solitary waves by a tidal beam impinging on the seasonal thermocline.
A Topology Visualization Early Warning Distribution Algorithm for Large-Scale Network Security Incidents

Directory of Open Access Journals (Sweden)

Hui He

2013-01-01

Research on early warning systems for large-scale network security incidents is of great significance. Such a system can improve the network system’s emergency response capabilities, alleviate the cyber attacks’ damage, and strengthen the system’s counterattack ability. A comprehensive early warning system is presented in this paper, which combines active measurement and anomaly detection. The key visualization algorithm and technology of the system are mainly discussed. The large-scale network system’s plane visualization is realized based on a divide-and-conquer approach. First, the topology of the large-scale network is divided into some small-scale networks by the MLkP/CR algorithm. Second, the sub graph plane visualization algorithm is applied to each small-scale network. Finally, the small-scale networks’ topologies are combined into a topology based on the automatic distribution algorithm of force analysis. As the algorithm transforms the large-scale network topology plane visualization problem into a series of small-scale network topology plane visualization and distribution problems, it has higher parallelism and is able to handle the display of ultra-large-scale network topology.
GPU-based large-scale visualization

KAUST Repository

Hadwiger, Markus

2013-11-19

Recent advances in image and volume acquisition as well as computational advances in simulation have led to an explosion of the amount of data that must be visualized and analyzed. Modern techniques combine the parallel processing power of GPUs with out-of-core methods and data streaming to enable the interactive visualization of giga- and terabytes of image and volume data. A major enabler for interactivity is making both the computational and the visualization effort proportional to the amount of data that is actually visible on screen, decoupling it from the full data size. This leads to powerful display-aware multi-resolution techniques that enable the visualization of data of almost arbitrary size. The course consists of two major parts: An introductory part that progresses from fundamentals to modern techniques, and a more advanced part that discusses details of ray-guided volume rendering, novel data structures for display-aware visualization and processing, and the remote visualization of large online data collections. You will learn how to develop efficient GPU data structures and large-scale visualizations, implement out-of-core strategies and concepts such as virtual texturing that have only been employed recently, as well as how to use modern multi-resolution representations. These approaches reduce the GPU memory requirements of extremely large data to a working set size that fits into current GPUs. You will learn how to perform ray-casting of volume data of almost arbitrary size and how to render and process gigapixel images using scalable, display-aware techniques. We will describe custom virtual texturing architectures as well as recent hardware developments in this area. We will also describe client/server systems for distributed visualization, on-demand data processing and streaming, and remote visualization. We will describe implementations using OpenGL as well as CUDA, exploiting parallelism on GPUs combined with additional asynchronous
Tradeoffs between quality-of-control and quality-of-service in large-scale nonlinear networked control systems

NARCIS (Netherlands)

Borgers, D. P.; Geiselhart, R.; Heemels, W. P. M. H.

2017-01-01

In this paper we study input-to-state stability (ISS) of large-scale networked control systems (NCSs) in which sensors, controllers and actuators are connected via multiple (local) communication networks which operate asynchronously and independently of each other. We model the large-scale NCS as an
NR-code: Nonlinear reconstruction code

Science.gov (United States)

Yu, Yu; Pen, Ue-Li; Zhu, Hong-Ming

2018-04-01

NR-code applies nonlinear reconstruction to the dark matter density field in redshift space and solves for the nonlinear mapping from the initial Lagrangian positions to the final redshift space positions; this reverses the large-scale bulk flows and improves the precision measurement of the baryon acoustic oscillations (BAO) scale.
NONLINEAR FORCE-FREE FIELD EXTRAPOLATION OF A CORONAL MAGNETIC FLUX ROPE SUPPORTING A LARGE-SCALE SOLAR FILAMENT FROM A PHOTOSPHERIC VECTOR MAGNETOGRAM

Energy Technology Data Exchange (ETDEWEB)

Jiang, Chaowei; Wu, S. T.; Hu, Qiang [Center for Space Plasma and Aeronomic Research, The University of Alabama in Huntsville, Huntsville, AL 35899 (United States); Feng, Xueshang, E-mail: cwjiang@spaceweather.ac.cn, E-mail: wus@uah.edu, E-mail: qh0001@uah.edu, E-mail: fengx@spaceweather.ac.cn [SIGMA Weather Group, State Key Laboratory for Space Weather, Center for Space Science and Applied Research, Chinese Academy of Sciences, Beijing 100190 (China)

2014-05-10

Solar filaments are commonly thought to be supported in magnetic dips, in particular, in those of magnetic flux ropes (FRs). In this Letter, based on the observed photospheric vector magnetogram, we implement a nonlinear force-free field (NLFFF) extrapolation of a coronal magnetic FR that supports a large-scale intermediate filament between an active region and a weak polarity region. This result is a first, in the sense that current NLFFF extrapolations including the presence of FRs are limited to relatively small-scale filaments that are close to sunspots and along main polarity inversion lines (PILs) with strong transverse field and magnetic shear, and the existence of an FR is usually predictable. In contrast, the present filament lies along the weak-field region (photospheric field strength ≲ 100 G), where the PIL is very fragmented due to small parasitic polarities on both sides of the PIL and the transverse field has a low signal-to-noise ratio. Thus, extrapolating a large-scale FR in such a case represents a far more difficult challenge. We demonstrate that our CESE-MHD-NLFFF code is sufficient for the challenge. The numerically reproduced magnetic dips of the extrapolated FR match observations of the filament and its barbs very well, which strongly supports the FR-dip model for filaments. The filament is stably sustained because the FR is weakly twisted and strongly confined by the overlying closed arcades.
Parallelizing ATLAS Reconstruction and Simulation: Issues and Optimization Solutions for Scaling on Multi- and Many-CPU Platforms

International Nuclear Information System (INIS)

Leggett, C; Jackson, K; Tatarkhanov, M; Yao, Y; Binet, S; Levinthal, D

2011-01-01

Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Further, the cores themselves can run multiple threads with zero-overhead context switching, allowing low-level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non-uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the Atlas event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores, by means of event based parallelism, and final stage I/O synchronization. However, initial studies on 8- and 16-core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level, and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which due to its size, places huge burdens on the memory infrastructure of today's processors.
Nonlinear seismic analysis of a large sodium pump

International Nuclear Information System (INIS)

Huang, S.N.

1985-01-01

The bearings and seismic bumpers used in a large sodium pump of a typical breeder reactor plant may need to be characterized by nonlinear springs and gaps. Then, nonlinear seismic analysis utilizing the time-history method is an effective way to predict the pump behaviors during seismic events, especially at those bearing and seismic bumper areas. In this study, synthesized time histories were developed based on specified seismic response spectra. A nonlinear seismic analysis was then conducted and results were compared with those obtained by linear seismic analysis using the response spectrum method. In contrast to some previous nonlinear analysis trends, the bearing impact forces predicted by nonlinear analysis were higher than those obtained by the response spectrum method. This might be due to the larger gaps and stiffer bearing supports used in this specific pump. However, at locations distant from the impact source, the nonlinear seismic analysis predicted slightly lower responses than those obtained by linear seismic analysis. The seismically induced bearing impact forces were used to study the friction induced thermal stresses on the hydrostatic bearing and to predict the coastdown time of the pump. Results and discussions are presented.
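The idea of a spring-with-gap element driven through a time history can be illustrated with a single-degree-of-freedom sketch. Everything here is an assumption made for illustration: the function names `gap_spring_force` and `peak_impact_force`, the one-mass model, and the semi-implicit Euler integrator are hypothetical and do not reproduce the pump model or the method of the study.

```python
def gap_spring_force(x, gap, k):
    """Force from a bumper that engages only once the gap has closed.

    x   : relative displacement of the shaft (hypothetical 1-DOF model)
    gap : clearance on each side of the rest position
    k   : bumper stiffness once contact is made
    """
    if x > gap:
        return -k * (x - gap)
    if x < -gap:
        return -k * (x + gap)
    return 0.0  # inside the clearance there is no contact force


def peak_impact_force(accel, dt, mass, k_support, k_bumper, gap):
    """Step a mass on a linear support through a base-acceleration history
    and return the largest bumper (impact) force magnitude encountered."""
    x, v, peak = 0.0, 0.0, 0.0
    for a_ground in accel:
        f_bumper = gap_spring_force(x, gap, k_bumper)
        a = (-k_support * x + f_bumper) / mass - a_ground
        v += a * dt   # semi-implicit Euler: update velocity first,
        x += v * dt   # then use the new velocity for the displacement
        peak = max(peak, abs(f_bumper))
    return peak
```

Because the contact force switches on and off with displacement, superposition no longer holds, which is why a response spectrum method cannot represent such an element directly and a time-history integration is used instead.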
Impact of large-scale tides on cosmological distortions via redshift-space power spectrum

Science.gov (United States)

Akitsu, Kazuyuki; Takada, Masahiro

2018-03-01

Although large-scale perturbations beyond a finite-volume survey region are not direct observables, these affect measurements of clustering statistics of small-scale (subsurvey) perturbations in large-scale structure, compared with the ensemble average, via the mode-coupling effect. In this paper we show that a large-scale tide induced by scalar perturbations causes apparent anisotropic distortions in the redshift-space power spectrum of galaxies in a way depending on an alignment between the tide, wave vector of small-scale modes and line-of-sight direction. Using the perturbation theory of structure formation, we derive a response function of the redshift-space power spectrum to large-scale tide. We then investigate the impact of large-scale tide on estimation of cosmological distances and the redshift-space distortion parameter via the measured redshift-space power spectrum for a hypothetical large-volume survey, based on the Fisher matrix formalism. To do this, we treat the large-scale tide as a signal, rather than an additional source of the statistical errors, and show that a degradation in the parameter is restored if we can employ the prior on the rms amplitude expected for the standard cold dark matter (CDM) model. We also discuss whether the large-scale tide can be constrained at an accuracy better than the CDM prediction, if the effects up to a larger wave number in the nonlinear regime can be included.
Dark energy and modified gravity in the Effective Field Theory of Large-Scale Structure

Science.gov (United States)

Cusin, Giulia; Lewandowski, Matthew; Vernizzi, Filippo

2018-04-01

We develop an approach to compute observables beyond the linear regime of dark matter perturbations for general dark energy and modified gravity models. We do so by combining the Effective Field Theory of Dark Energy and Effective Field Theory of Large-Scale Structure approaches. In particular, we parametrize the linear and nonlinear effects of dark energy on dark matter clustering in terms of the Lagrangian terms introduced in a companion paper [1], focusing on Horndeski theories and assuming the quasi-static approximation. The Euler equation for dark matter is sourced, via the Newtonian potential, by new nonlinear vertices due to modified gravity and, as in the pure dark matter case, by the effects of short-scale physics in the form of the divergence of an effective stress tensor. The effective fluid introduces a counterterm in the solution to the matter continuity and Euler equations, which allows a controlled expansion of clustering statistics on mildly nonlinear scales. We use this setup to compute the one-loop dark-matter power spectrum.
Large-time asymptotic behaviour of solutions of non-linear Sobolev-type equations

International Nuclear Information System (INIS)

Kaikina, Elena I; Naumkin, Pavel I; Shishmarev, Il'ya A

2009-01-01

The large-time asymptotic behaviour of solutions of the Cauchy problem is investigated for a non-linear Sobolev-type equation with dissipation. For small initial data the approach taken is based on a detailed analysis of the Green's function of the linear problem and the use of the contraction mapping method. The case of large initial data is also closely considered. In the supercritical case the asymptotic formulae are quasi-linear. The asymptotic behaviour of solutions of a non-linear Sobolev-type equation with a critical non-linearity of the non-convective kind differs by a logarithmic correction term from the behaviour of solutions of the corresponding linear equation. For a critical convective non-linearity, as well as for a subcritical non-convective non-linearity it is proved that the leading term of the asymptotic expression for large times is a self-similar solution. For Sobolev equations with convective non-linearity the asymptotic behaviour of solutions in the subcritical case is the product of a rarefaction wave and a shock wave. Bibliography: 84 titles.
Finite-Time Stability of Large-Scale Systems with Interval Time-Varying Delay in Interconnection

Directory of Open Access Journals (Sweden)

T. La-inchua

2017-01-01

We investigate finite-time stability of a class of nonlinear large-scale systems with interval time-varying delays in interconnection. Time-delay functions are continuous but not necessarily differentiable. Based on Lyapunov stability theory and a new integral bounding technique, finite-time stability of large-scale systems with interval time-varying delays in interconnection is derived. The finite-time stability criteria are delay-dependent and are given in terms of linear matrix inequalities which can be solved by various available algorithms. Numerical examples are given to illustrate the effectiveness of the proposed method.
Scalable Nonlinear Solvers for Fully Implicit Coupled Nuclear Fuel Modeling. Final Report

International Nuclear Information System (INIS)

Cai, Xiao-Chuan; Yang, Chao; Pernice, Michael

2014-01-01

The focus of the project is on the development and customization of some highly scalable domain decomposition based preconditioning techniques for the numerical solution of nonlinear, coupled systems of partial differential equations (PDEs) arising from nuclear fuel simulations. These high-order PDEs represent multiple interacting physical fields (for example, heat conduction, oxygen transport, solid deformation), each of which is modeled by a certain type of Cahn-Hilliard and/or Allen-Cahn equation. Most existing approaches involve a careful splitting of the fields and the use of field-by-field iterations to obtain a solution of the coupled problem. Such approaches have many advantages such as ease of implementation since only single field solvers are needed, but also exhibit disadvantages. For example, certain nonlinear interactions between the fields may not be fully captured, and for unsteady problems, stable time integration schemes are difficult to design. In addition, when implemented on large scale parallel computers, the sequential nature of the field-by-field iterations substantially reduces the parallel efficiency. To overcome the disadvantages, fully coupled approaches have been investigated in order to obtain full physics simulations.
Large scale structure from viscous dark matter

CERN Document Server

Blas, Diego; Garny, Mathias; Tetradis, Nikolaos; Wiedemann, Urs Achim

2015-01-01

Cosmological perturbations of sufficiently long wavelength admit a fluid dynamic description. We consider modes with wavevectors below a scale $k_m$ for which the dynamics is only mildly non-linear. The leading effect of modes above that scale can be accounted for by effective non-equilibrium viscosity and pressure terms. For mildly non-linear scales, these mainly arise from momentum transport within the ideal and cold but inhomogeneous fluid, while momentum transport due to more microscopic degrees of freedom is suppressed. As a consequence, concrete expressions with no free parameters, except the matching scale $k_m$, can be derived from matching evolution equations to standard cosmological perturbation theory. Two-loop calculations of the matter power spectrum in the viscous theory lead to excellent agreement with $N$-body simulations up to scales $k = 0.2\,h/\mathrm{Mpc}$. The convergence properties in the ultraviolet are better than for standard perturbation theory and the results are robust with respect to varia...
Mirror dark matter and large scale structure

International Nuclear Information System (INIS)

Ignatiev, A.Yu.; Volkas, R.R.

2003-01-01

Mirror matter is a dark matter candidate. In this paper, we reexamine the linear regime of density perturbation growth in a universe containing mirror dark matter. Taking adiabatic scale-invariant perturbations as the input, we confirm that the resulting processed power spectrum is richer than for the more familiar cases of cold, warm and hot dark matter. The new features include a maximum at a certain scale $\lambda_{\max}$, collisional damping below a smaller characteristic scale $\lambda_S'$, with oscillatory perturbations between the two. These scales are functions of the fundamental parameters of the theory. In particular, they decrease for decreasing $x$, the ratio of the mirror plasma temperature to that of the ordinary. For $x \sim 0.2$, the scale $\lambda_{\max}$ becomes galactic. Mirror dark matter therefore leads to bottom-up large scale structure formation, similar to conventional cold dark matter, for $x \lesssim 0.2$. Indeed, the smaller the value of $x$, the closer mirror dark matter resembles standard cold dark matter during the linear regime. The differences pertain to scales smaller than $\lambda_S'$ in the linear regime, and generally in the nonlinear regime because mirror dark matter is chemically complex and to some extent dissipative. Lyman-α forest data and the early reionization epoch established by WMAP may hold the key to distinguishing mirror dark matter from WIMP-style cold dark matter.
Volterra representation enables modeling of complex synaptic nonlinear dynamics in large-scale simulations.

Science.gov (United States)

Hu, Eric Y; Bouteiller, Jean-Marie C; Song, Dong; Baudry, Michel; Berger, Theodore W

2015-01-01

Chemical synapses are comprised of a wide collection of intricate signaling pathways involving complex dynamics. These mechanisms are often reduced to simple spikes or exponential representations in order to enable computer simulations at higher spatial levels of complexity. However, these representations cannot capture important nonlinear dynamics found in synaptic transmission. Here, we propose an input-output (IO) synapse model capable of generating complex nonlinear dynamics while maintaining low computational complexity. This IO synapse model is an extension of a detailed mechanistic glutamatergic synapse model capable of capturing the input-output relationships of the mechanistic model using the Volterra functional power series. We demonstrate that the IO synapse model is able to successfully track the nonlinear dynamics of the synapse up to the third order with high accuracy. We also evaluate the accuracy of the IO synapse model at different input frequencies and compare its performance with that of kinetic models in compartmental neuron models. Our results demonstrate that the IO synapse model is capable of efficiently replicating complex nonlinear dynamics that were represented in the original mechanistic model and provide a method to replicate complex and diverse synaptic transmission within neuron network simulations.
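A discrete-time Volterra series of the kind this record refers to can be sketched as follows. The sketch is truncated at second order for brevity, whereas the abstract reports tracking up to third order; the function name `volterra2` and the kernel values used to exercise it are assumptions for illustration and are not the kernels estimated in the paper.

```python
def volterra2(x, k0, k1, k2):
    """Evaluate a second-order discrete Volterra series.

    y[n] = k0 + sum_i k1[i] * x[n-i] + sum_i sum_j k2[i][j] * x[n-i] * x[n-j]

    x  : input sequence (e.g. a binned presynaptic spike train)
    k0 : zeroth-order kernel (baseline output)
    k1 : first-order kernel, length M (linear memory)
    k2 : second-order kernel, M x M (pairwise interactions of past inputs)
    Samples before the start of x are taken as zero.
    """
    m = len(k1)
    y = []
    for n in range(len(x)):
        past = [x[n - i] if n - i >= 0 else 0.0 for i in range(m)]
        first = sum(k1[i] * past[i] for i in range(m))
        second = sum(k2[i][j] * past[i] * past[j]
                     for i in range(m) for j in range(m))
        y.append(k0 + first + second)
    return y
```

With `k1 = [1.0, 0.5]` and `k2 = [[0.25, 0.0], [0.0, 0.0]]`, an input of `[1.0, 0.0, 0.0]` gives `[1.25, 0.5, 0.0]`, while doubling the input gives `[3.0, 1.0, 0.0]` rather than twice the first output; that departure from scaling is the nonlinear contribution of the second-order kernel.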
Unstable ‘black branes’ from scaled membranes at large D

Energy Technology Data Exchange (ETDEWEB)

Dandekar, Yogesh; Mazumdar, Subhajit; Minwalla, Shiraz; Saha, Arunabha [Department of Theoretical Physics, Tata Institute of Fundamental Research,Homi Bhabha Road, Mumbai, 400005 (India)

2016-12-28

It has recently been demonstrated that the dynamics of black holes at large D can be recast as a set of non gravitational membrane equations. These membrane equations admit a simple static solution with shape $S^{D-p-2} \times \mathbb{R}^{p,1}$. In this note we study the equations for small fluctuations about this solution in a limit in which amplitude and length scale of the fluctuations are simultaneously scaled to zero as D is taken to infinity. We demonstrate that the resultant nonlinear equations, which capture the Gregory-Laflamme instability and its end point, exactly agree with the effective dynamical ‘black brane’ equations of Emparan, Suzuki and Tanabe. Our results thus identify the ‘black brane’ equations as a special limit of the membrane equations and so unify these approaches to large D black hole dynamics.
Large third-order nonlinearity of nonpolar A-plane GaN film at 800 nm determined by Z-scan technology

Science.gov (United States)

Zhang, Feng; Han, Xiangyun

2014-09-01

We report an investigation on the optical third-order nonlinear property of the nonpolar A-plane GaN film. The film sample with a thickness of ~2 μm was grown on an r-plane sapphire substrate by metal-organic chemical vapor deposition system. By performing the Z-scan method combined with a mode-locked femtosecond Ti:sapphire laser (800 nm, 50 fs), the optical nonlinearity of the nonpolar A-plane GaN film was measured with the electric vector E of the laser beam being polarized parallel (//) and perpendicular (⊥) to the c axis of the film. The results show that both the third-order nonlinear absorption coefficient β and the nonlinear refractive index n₂ of the sample film possess negative and large values, i.e. β// = -135 ± 29 cm/GW, n₂// = -(4.0 ± 0.3) × 10⁻³ cm²/GW and β⊥ = -234 ± 29 cm/GW, n₂⊥ = -(4.9 ± 0.4) × 10⁻³ cm²/GW, which are much larger than those of conventional C-plane GaN film, GaN bulk, and even the other oxide semiconductors.

Sub-grid-scale effects on short-wave instability in magnetized Hall-MHD plasma

International Nuclear Information System (INIS)

Miura, H.; Nakajima, N.

2010-11-01

Aiming to clarify the effects of short-wave modes on the nonlinear evolution and saturation of the ballooning instability in the Large Helical Device, fully three-dimensional simulations of the single-fluid MHD and the Hall MHD equations are carried out. A moderate parallel heat conductivity plays an important role in both kinds of simulations. In the single-fluid MHD simulations, the parallel heat conduction effectively suppresses short-wave ballooning modes, but the suppression turns out to be insufficient in comparison with an experimental result. In the Hall MHD simulations, the parallel heat conduction triggers a rapid growth of the parallel flow and enhances nonlinear couplings. A comparison between the single-fluid and the Hall MHD simulations reveals that the Hall MHD model does not necessarily improve the saturated pressure profile, and that we may need a further extension of the model. We also find by a comparison between two Hall MHD simulations with different numerical resolutions that sub-grid-scales of the Hall term should be modeled to mimic an inverse energy transfer in the wave number space. (author)
Using Python to Construct a Scalable Parallel Nonlinear Wave Solver

KAUST Repository

Mandli, Kyle

2011-01-01

Computational scientists seek to provide efficient, easy-to-use tools and frameworks that enable application scientists within a specific discipline to build and/or apply numerical models with up-to-date computing technologies that can be executed on all available computing systems. Although many tools could be useful for groups beyond a specific application, it is often difficult and time consuming to combine existing software, or to adapt it for a more general purpose. Python enables a high-level approach where a general framework can be supplemented with tools written for different fields and in different languages. This is particularly important when a large number of tools are necessary, as is the case for high performance scientific codes. This motivated our development of PetClaw, a scalable distributed-memory solver for time-dependent nonlinear wave propagation, as a case study for how Python can be used as a high-level framework leveraging a multitude of codes, efficient both in the reuse of code and programmer productivity. We present scaling results for computations on up to four racks of Shaheen, an IBM BlueGene/P supercomputer at King Abdullah University of Science and Technology. One particularly important issue that PetClaw has faced is the overhead associated with dynamic loading, which leads to catastrophic scaling. We use the walla library to solve the issue; it does so by supplanting high-cost filesystem calls with MPI operations at a low enough level that developers may avoid any changes to their codes.
Review of Dynamic Modeling and Simulation of Large Scale Belt Conveyor System

Science.gov (United States)

He, Qing; Li, Hong

The belt conveyor is one of the most important devices for transporting bulk-solid material over long distances. Dynamic analysis is the key to deciding whether a design is technically rational, safe and reliable in operation, and economically feasible. It is very important to study dynamic properties in order to improve efficiency and productivity and to guarantee safe, reliable and stable conveyor operation. The dynamic research on and applications of large-scale belt conveyors are discussed. The main research topics and the state of the art of dynamic research on belt conveyors are analyzed. The main future work focuses on dynamic analysis, modeling and simulation of the main components and the whole system, and on nonlinear modeling, simulation and vibration analysis of large-scale conveyor systems.
Some Nonlinear Dynamic Inequalities on Time Scales

Indian Academy of Sciences (India)

The aim of this paper is to investigate some nonlinear dynamic inequalities on time scales, which provide explicit bounds on unknown functions. The inequalities given here unify and extend some inequalities in (B G Pachpatte, On some new inequalities related to a certain inequality arising in the theory of differential ...
Large Scale Parallel DNA Detection by Two-Dimensional Solid-State Multipore Systems.

Science.gov (United States)

Athreya, Nagendra Bala Murali; Sarathy, Aditya; Leburton, Jean-Pierre

2018-04-23

We describe a scalable device design of a dense array of multiple nanopores made from nanoscale semiconductor materials to detect and identify translocations of many biomolecules in a massively parallel detection scheme. We use molecular dynamics coupled to nanoscale device simulations to illustrate the ability of this device setup to uniquely identify DNA parallel translocations. We show that the transverse sheet currents along membranes are immune to the crosstalk effects arising from simultaneous translocations of biomolecules through multiple pores, due to their ability to sense only the local potential changes. We also show that electronic sensing across the nanopore membrane offers a higher detection resolution compared to ionic current blocking technique in a multipore setup, irrespective of the irregularities that occur while fabricating the nanopores in a two-dimensional membrane.
On the renormalization of the effective field theory of large scale structures

International Nuclear Information System (INIS)

Pajer, Enrico; Zaldarriaga, Matias

2013-01-01

Standard perturbation theory (SPT) for large-scale matter inhomogeneities is unsatisfactory for at least three reasons: there is no clear expansion parameter since the density contrast is not small on all scales; it does not fully account for deviations at large scales from a perfect pressureless fluid induced by short-scale non-linearities; for generic initial conditions, loop corrections are UV-divergent, making predictions cutoff dependent and hence unphysical. The Effective Field Theory of Large Scale Structures successfully addresses all three issues. Here we focus on the third one and show explicitly that the terms induced by integrating out short scales, neglected in SPT, have exactly the right scale dependence to cancel all UV-divergences at one loop, and this should hold at all loops. A particularly clear example is an Einstein-de Sitter universe with no-scale initial conditions $P_{\mathrm{in}} \sim k^n$. After renormalizing the theory, we use self-similarity to derive a very simple result for the final power spectrum for any n, excluding two-loop corrections and higher. We show how the relative importance of different corrections depends on n. For n ∼ −1.5, relevant for our universe, pressure and dissipative corrections are more important than the two-loop corrections.
On the renormalization of the effective field theory of large scale structures

Energy Technology Data Exchange (ETDEWEB)

Pajer, Enrico [Department of Physics, Princeton University, Princeton, NJ 08544 (United States); Zaldarriaga, Matias, E-mail: enrico.pajer@gmail.com, E-mail: matiasz@ias.edu [Institute for Advanced Study, Princeton, NJ 08544 (United States)

2013-08-01

Standard perturbation theory (SPT) for large-scale matter inhomogeneities is unsatisfactory for at least three reasons: there is no clear expansion parameter since the density contrast is not small on all scales; it does not fully account for deviations at large scales from a perfect pressureless fluid induced by short-scale non-linearities; for generic initial conditions, loop corrections are UV-divergent, making predictions cutoff dependent and hence unphysical. The Effective Field Theory of Large Scale Structures successfully addresses all three issues. Here we focus on the third one and show explicitly that the terms induced by integrating out short scales, neglected in SPT, have exactly the right scale dependence to cancel all UV-divergences at one loop, and this should hold at all loops. A particularly clear example is an Einstein-de Sitter universe with no-scale initial conditions $P_{\mathrm{in}} \sim k^n$. After renormalizing the theory, we use self-similarity to derive a very simple result for the final power spectrum for any n, excluding two-loop corrections and higher. We show how the relative importance of different corrections depends on n. For n ∼ −1.5, relevant for our universe, pressure and dissipative corrections are more important than the two-loop corrections.
Time history nonlinear earthquake response analysis considering materials and geometrical nonlinearity

International Nuclear Information System (INIS)

Kobayashi, T.; Yoshikawa, K.; Takaoka, E.; Nakazawa, M.; Shikama, Y.

2002-01-01

A time history nonlinear earthquake response analysis method was proposed and applied to earthquake response prediction analysis for a Large Scale Seismic Test (LSST) Program in Hualien, Taiwan, in which a 1/4 scale model of a nuclear reactor containment structure was constructed on a sandy gravel layer. In the analysis, both strain-dependent material nonlinearity and geometrical nonlinearity due to base mat uplift were considered. The 'Lattice Model' was employed for the soil-structure interaction model. An earthquake record on the soil surface at the site was used as control motion, and deconvoluted to the input motion of the analysis model at GL-52 m with 300 Gal of maximum acceleration. The following two analyses were considered: (A) time history nonlinear and (B) equivalent linear; the advantage of the time history nonlinear earthquake response analysis method is discussed.
Detector correction in large container inspection systems

CERN Document Server

Kang Ke Jun; Chen Zhi Qiang

2002-01-01

In large container inspection systems, the image is constructed by parallel scanning with a one-dimensional detector array with a linac used as the X-ray source. The linear nonuniformity and nonlinearity of multiple detectors and the nonuniform intensity distribution of the X-ray sector beam result in horizontal striations in the scan image. This greatly impairs the image quality, so the image needs to be corrected. The correction parameters are determined experimentally by scaling the detector responses at multiple points with logarithmic interpolation of the results. The horizontal striations are eliminated by modifying the original image data with the correction parameters. This method has proven to be effective and applicable in large container inspection systems.
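One way to picture the multi-point correction described in this abstract is a per-detector lookup that interpolates between calibration points in the logarithmic domain. The sketch below is an assumption about how such a scheme could look, not the authors' procedure: the function names, the clamping outside the calibrated range, and the calibration numbers are all hypothetical.

```python
import math


def correct_reading(raw, calib_raw, calib_ref):
    """Map one detector's raw reading to a corrected value.

    calib_raw: this detector's readings at the calibration points (ascending).
    calib_ref: the reference response at the same calibration points.
    Interpolation is linear in log(raw) versus log(ref); readings outside the
    calibrated range are clamped to the nearest calibration point.
    """
    if raw <= calib_raw[0]:
        return calib_ref[0]
    if raw >= calib_raw[-1]:
        return calib_ref[-1]
    for i in range(len(calib_raw) - 1):
        lo, hi = calib_raw[i], calib_raw[i + 1]
        if lo <= raw <= hi:
            t = (math.log(raw) - math.log(lo)) / (math.log(hi) - math.log(lo))
            log_ref = (1 - t) * math.log(calib_ref[i]) + t * math.log(calib_ref[i + 1])
            return math.exp(log_ref)


def correct_image(image, calib_raw_per_detector, calib_ref):
    # Each image row is assumed to come from one detector of the 1D array,
    # which is why detector mismatch shows up as horizontal striations.
    return [[correct_reading(v, calib_raw_per_detector[d], calib_ref) for v in row]
            for d, row in enumerate(image)]


# Hypothetical calibration: detector 1 reads 20% low at every point.
calib_ref = [10.0, 100.0, 1000.0]
calib_raw = [[10.0, 100.0, 1000.0],
             [8.0, 80.0, 800.0]]
striped = [[50.0, 50.0],
           [40.0, 40.0]]
corrected = correct_image(striped, calib_raw, calib_ref)
```

In this toy case both rows of `corrected` come out at approximately 50, so the stripe caused by the weaker detector disappears.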
What can asymptotic expansions tell us about large-scale quasi-geostrophic anticyclonic vortices?

Directory of Open Access Journals (Sweden)

A. Stegner

1995-01-01

The problem of the large-scale quasi-geostrophic anticyclonic vortices is studied in the framework of the barotropic rotating shallow-water equations on the β-plane. A systematic approach based on multiple-scale asymptotic expansions is used, leading to a hierarchy of governing equations for the large-scale vortices depending on their characteristic size, velocity and free surface elevation. Among them are the Charney-Obukhov equation, the intermediate geostrophic model equation, the frontal dynamics equation and a new nonlinear quasi-geostrophic equation. We look for steady-drifting axisymmetric anticyclonic solutions and find them in a consistent way only in this last equation. These solutions are soliton-like in the sense that the effects of weak non-linearity and dispersion balance each other. The same regimes on the paraboloidal β-plane are studied, all giving a negative result as far as axisymmetric steady solutions are concerned, except for a strong elevation case where any circular profile is found to be steadily propagating within the accuracy of the approximation.
Developing a Massively Parallel Forward Projection Radiography Model for Large-Scale Industrial Applications

Energy Technology Data Exchange (ETDEWEB)

Bauerle, Matthew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2014-08-01

This project utilizes Graphics Processing Units (GPUs) to compute radiograph simulations for arbitrary objects. The generation of radiographs, also known as the forward projection imaging model, is computationally intensive and not widely utilized. The goal of this research is to develop a massively parallel algorithm that can compute forward projections for objects with a trillion voxels (3D pixels). To achieve this end, the data are divided into blocks that can each fit into GPU memory. The forward projected image is also divided into segments to allow for future parallelization and to avoid needless computations.
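The blocking idea in this abstract rests on the fact that line integrals add up across blocks, so each block can be projected separately and the partial images summed. The sketch below shows that property under strong simplifying assumptions that are not from the report: rays are parallel to the z axis, the volume is a small nested list, and the names `project_block`, `forward_project` and `block_depth` are illustrative only.

```python
def project_block(block, voxel_size):
    """Line integrals along z for one block stored as block[z][y][x]."""
    ny, nx = len(block[0]), len(block[0][0])
    image = [[0.0] * nx for _ in range(ny)]
    for slab in block:
        for y in range(ny):
            for x in range(nx):
                image[y][x] += slab[y][x] * voxel_size
    return image


def forward_project(volume, voxel_size, block_depth):
    """Accumulate partial projections of z-blocks of at most block_depth slices."""
    ny, nx = len(volume[0]), len(volume[0][0])
    total = [[0.0] * nx for _ in range(ny)]
    for z0 in range(0, len(volume), block_depth):
        # Only this block would need to reside in device memory at one time.
        partial = project_block(volume[z0:z0 + block_depth], voxel_size)
        for y in range(ny):
            for x in range(nx):
                total[y][x] += partial[y][x]
    return total


# Toy attenuation volume with 5 slices of 2 x 3 voxels.
volume = [[[float(z + y + x) for x in range(3)] for y in range(2)] for z in range(5)]
blocked = forward_project(volume, voxel_size=0.5, block_depth=2)
whole = forward_project(volume, voxel_size=0.5, block_depth=5)
```

Here `blocked` and `whole` hold the same image, which is the property that lets the block size be chosen to match available memory rather than the size of the object.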
Large-scale parallel uncontracted multireference-averaged quadratic coupled cluster: the ground state of the chromium dimer revisited.

Science.gov (United States)

Müller, Thomas

2009-11-12

The accurate prediction of the potential energy function of the $X\,^1\Sigma_g^+$ state of Cr$_2$ is a remarkable challenge; large differential electron correlation effects, significant scalar relativistic contributions, the need for large flexible basis sets containing g functions, the importance of semicore valence electron correlation, and its multireference nature pose considerable obstacles. So far, the only reasonably successful approaches were based on multireference perturbation theory (MRPT). Recently, there was some controversy in the literature about the role of error compensation and systematic defects of various MRPT implementations that cannot be easily overcome. A detailed basis set study of the potential energy function is presented, adopting a variational method. The method of choice for this electron-rich target with up to 28 correlated electrons is fully uncontracted multireference-averaged quadratic coupled cluster (MR-AQCC), which shares the flexibility of the multireference configuration interaction (MRCI) approach and is, in addition, approximately size-extensive (0.02 eV in error as compared to the MRCI value of 1.37 eV for two noninteracting chromium atoms). The best estimate for $D_e$ arrives at 1.48 eV and agrees well with the experimental data of 1.47 ± 0.056 eV. At the estimated CBS limit, the equilibrium bond distance (1.685 Å) and vibrational frequency (459 cm⁻¹) are in agreement with experiment (1.679 Å, 481 cm⁻¹). Large basis sets and reference configuration spaces invariably result in huge wave function expansions (here, up to 2.8 billion configuration state functions), and efficient parallel implementations of the method are crucial. Hence, relevant details on implementation and general performance of the parallel program code are discussed as well.
Nonlinear triple-point problems on time scales

Directory of Open Access Journals (Sweden)

Douglas R. Anderson

2004-04-01

We establish the existence of multiple positive solutions to the nonlinear second-order triple-point boundary-value problem on time scales, $$\displaylines{ u^{\Delta\nabla}(t)+h(t)f(t,u(t))=0, \cr u(a)=\alpha u(b)+\delta u^{\Delta}(a),\quad \beta u(c)+\gamma u^{\Delta}(c)=0 }$$ for $t\in[a,c]\subset\mathbb{T}$, where $\mathbb{T}$ is a time scale, $\beta, \gamma, \delta\ge 0$ with $\beta+\gamma>0$, $0
Non-parametric co-clustering of large scale sparse bipartite networks on the GPU

DEFF Research Database (Denmark)

Hansen, Toke Jansen; Mørup, Morten; Hansen, Lars Kai

2011-01-01

of row and column clusters from a hypothesis space of an infinite number of clusters. To reach large scale applications of co-clustering we exploit that parameter inference for co-clustering is well suited for parallel computing. We develop a generic GPU framework for efficient inference on large scale...... sparse bipartite networks and achieve a speedup of two orders of magnitude compared to estimation based on conventional CPUs. In terms of scalability we find for networks with more than 100 million links that reliable inference can be achieved in less than an hour on a single GPU. To efficiently manage...
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments

International Nuclear Information System (INIS)

Toshimitsu, Fujisawa; Genki, Yagawa

2003-01-01

In this paper, a FEM-based (finite element method) mesh free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. Local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of finite element calculation are based on nodes, and parallel computing is realized by dividing a system of equations by the row of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for nodes of finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for such kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments

Energy Technology Data Exchange (ETDEWEB)

Toshimitsu, Fujisawa [Tokyo Univ., Collaborative Research Center of Frontier Simulation Software for Industrial Science, Institute of Industrial Science (Japan); Genki, Yagawa [Tokyo Univ., Department of Quantum Engineering and Systems Science (Japan)

2003-07-01

In this paper, a FEM-based (finite element method) mesh free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. Local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of finite element calculation are based on nodes, and parallel computing is realized by dividing a system of equations by the row of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for nodes of finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for such kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)
Scaling versus asymptotic scaling in the non-linear σ-model in 2D. Continuum version

International Nuclear Information System (INIS)

Flyvbjerg, H.

1990-01-01

The two-point function of the O(N)-symmetric non-linear σ-model in two dimensions is large-N expanded and renormalized, neglecting terms of $O(1/N^2)$. At finite cut-off, universal, analytical expressions relate the magnetic susceptibility and the dressed mass to the bare coupling. Removing the cut-off, a similar relation gives the renormalized coupling as a function of the mass gap. In the weak-coupling limit these relations reproduce the results of renormalization group improved weak-coupling perturbation theory to two-loop order. The constant left unknown, when the renormalization group is integrated, is determined here. The approach to asymptotic scaling is studied for various values of N. (orig.)
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

Science.gov (United States)

Tam, Wing-Kin; Yang, Zhi

2018-05-01

Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
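The abstract mentions detecting peaks in parallel through a compact operation. A common way to build such a compaction is flag, scan, scatter: flag candidate samples independently, use an exclusive prefix sum to assign each flagged sample an output slot, then write the indices. The sketch below is a sequential Python stand-in for that pattern and makes no claim about how NPE implements it; the peak criterion and the names `exclusive_scan` and `detect_peaks` are assumptions.

```python
def exclusive_scan(flags):
    """Exclusive prefix sum; on a GPU this step would be a parallel scan."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out, running


def detect_peaks(signal, threshold):
    n = len(signal)
    # Step 1 (independent per sample): flag local maxima above the threshold.
    flags = [0] * n
    for i in range(1, n - 1):
        if (signal[i] > threshold
                and signal[i] > signal[i - 1]
                and signal[i] >= signal[i + 1]):
            flags[i] = 1
    # Step 2: the scan gives every flagged sample its slot in the output.
    slots, count = exclusive_scan(flags)
    # Step 3 (independent per sample): scatter flagged indices to their slots.
    peaks = [0] * count
    for i in range(n):
        if flags[i]:
            peaks[slots[i]] = i
    return peaks


signal = [0, 1, 5, 2, 0, 3, 7, 7, 1, 0, 4, 0]
peaks = detect_peaks(signal, threshold=2)  # -> [2, 6, 10]
```

Because steps 1 and 3 touch each sample independently and step 2 is a standard scan, the output array is dense and ordered without any thread having to wait for its neighbours to finish writing.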
OpenMP parallelization of a gridded SWAT (SWATG)

Science.gov (United States)

Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

2017-12-01

Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates grid modeling scheme with different spatial representations also presents such problems. The time-consuming problem affects applications of very high resolution large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling based on the HRU level. Such parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km² watershed with 1 CPU and a 15 thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management in addition to offering data fusion and model coupling ability.
The three-point function as a probe of models for large-scale structure

International Nuclear Information System (INIS)

Frieman, J.A.; Gaztanaga, E.

1993-01-01

The authors analyze the consequences of models of structure formation for higher-order (n-point) galaxy correlation functions in the mildly non-linear regime. Several variations of the standard Ω = 1 cold dark matter model with scale-invariant primordial perturbations have recently been introduced to obtain more power on large scales, $R_p \sim 20\,h^{-1}$ Mpc, e.g., low-matter-density (non-zero cosmological constant) models, 'tilted' primordial spectra, and scenarios with a mixture of cold and hot dark matter. They also include models with an effective scale-dependent bias, such as the cooperative galaxy formation scenario of Bower, et al. The authors show that higher-order (n-point) galaxy correlation functions can provide a useful test of such models and can discriminate between models with true large-scale power in the density field and those where the galaxy power arises from scale-dependent bias: a bias with rapid scale-dependence leads to a dramatic decrease of the hierarchical amplitudes $Q_J$ at large scales, $r \gtrsim R_p$. Current observational constraints on the three-point amplitudes $Q_3$ and $S_3$ can place limits on the bias parameter(s) and appear to disfavor, but not yet rule out, the hypothesis that scale-dependent bias is responsible for the extra power observed on large scales.

Large optical second-order nonlinearity of poled WO3-TeO2 glass.

Science.gov (United States)

Tanaka, K; Narazaki, A; Hirao, K

2000-02-15

Second-harmonic generation, one of the second-order nonlinear optical properties of thermally and electrically poled WO3-TeO2 glasses, has been examined. We poled glass samples with two thicknesses (0.60 and 0.86 mm) at various temperatures to explore the effects of external electric field strength and poling temperature on second-order nonlinearity. The dependence of second-harmonic intensity on the poling temperature is maximum at a specific poling temperature. A second-order nonlinear susceptibility of 2.1 pm/V was attained for the 0.60-mm-thick glass poled at 250 degrees C. This value is fairly large compared with those for poled silica and tellurite glasses reported thus far. We speculate that the large third-order nonlinear susceptibility of WO3-TeO2 glasses gives rise to the large second-order nonlinearity by means of a $\chi^{(2)} = 3\chi^{(3)} E_{\mathrm{dc}}$ process.
Evidence and effects of a wave-driven nonlinear current in the equatorial electrojet

Directory of Open Access Journals (Sweden)

M. Oppenheim

1997-07-01

Ionospheric two-stream waves and gradient-drift waves nonlinearly drive a large-scale (D.C.) current in the E-region ionosphere. This current flows parallel to, and with a comparable magnitude to, the fundamental Pedersen current. Evidence for the existence and magnitude of wave-driven currents derives from a theoretical understanding of E-region waves, supported by a series of nonlinear 2D simulations of two-stream waves and by data collected by rocket instruments in the equatorial electrojet. Wave-driven currents will modify the large-scale dynamics of the equatorial electrojet during highly active periods. A simple model shows how a wave-driven current appreciably reduces the horizontally flowing electron current of the electrojet. This reduction may account for the observation that type-I radar echoes almost always have a Doppler velocity close to the acoustic speed, and also for the rocket observation that electrojet regions containing gradient-drift waves do not appear also to contain horizontally propagating two-stream waves. Additionally, a simple model of a gradient-drift instability shows that wave-driven currents can cause nonsinusoidal electric fields similar to those measured in situ.
Relativistic effects on large amplitude nonlinear Langmuir waves in a two-fluid plasma

International Nuclear Information System (INIS)

Nejoh, Yasunori

1994-07-01

Large amplitude relativistic nonlinear Langmuir waves are analyzed by the pseudo-potential method. The existence conditions for nonlinear Langmuir waves are confirmed by considering relativistic high-speed electrons in a two-fluid plasma. The significant feature of this investigation is that the propagation of nonlinear Langmuir waves depends on the ratio of the electron streaming velocity to the velocity of light, the normalized potential and the ion mass to electron mass ratio. The constant energy is determined by the specific range of the relativistic effect. In the non-relativistic limit, large amplitude relativistic Langmuir waves do not exist. The present investigation predicts new findings of large amplitude nonlinear Langmuir waves in space plasma phenomena in which relativistic electrons are important. (author)
Large-scale micromagnetics simulations with dipolar interaction using all-to-all communications

Directory of Open Access Journals (Sweden)

Hiroshi Tsukahara

2016-05-01

We implement on our micromagnetics simulator low-complexity parallel fast-Fourier-transform algorithms, which reduce the number of all-to-all communications from six to two. Almost all the computation time of a micromagnetics simulation is taken up by the calculation of the magnetostatic field, which can be calculated using the fast Fourier transform method. The results show that the simulation time is decreased with good scalability, even if the micromagnetics simulation is performed using 8192 physical cores. This high parallelization effect enables large-scale micromagnetics simulation using over one billion to be performed. Because massively parallel computing is needed to simulate the magnetization dynamics of real permanent magnets composed of many micron-sized grains, it is expected that our simulator reveals how magnetization dynamics influences the coercivity of the permanent magnet.
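The FFT-based field calculation mentioned in this abstract relies on the convolution theorem: a field given by a kernel convolved with the magnetization can be obtained by multiplying zero-padded transforms. The sketch below shows only that serial, one-dimensional idea with a made-up kernel; it does not represent the paper's parallel FFT or its all-to-all communication scheme, and every name in it is illustrative.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def fft_convolve(kernel, source):
    """Linear convolution via zero-padded FFTs."""
    length = len(kernel) + len(source) - 1
    size = 1
    while size < length:
        size *= 2
    fk = fft(list(kernel) + [0.0] * (size - len(kernel)))
    fs = fft(list(source) + [0.0] * (size - len(source)))
    product = [x * y for x, y in zip(fk, fs)]
    result = fft(product, invert=True)
    return [v.real / size for v in result[:length]]


def direct_convolve(kernel, source):
    """Reference O(N^2) convolution used to check the FFT route."""
    out = [0.0] * (len(kernel) + len(source) - 1)
    for i, k in enumerate(kernel):
        for j, s in enumerate(source):
            out[i + j] += k * s
    return out


kernel = [0.5, 0.25, 0.125]            # hypothetical 1D interaction kernel
magnetization = [1.0, -1.0, 1.0, 1.0]  # hypothetical cell values
field_fft = fft_convolve(kernel, magnetization)
field_direct = direct_convolve(kernel, magnetization)
```

The two results agree to rounding error, while the FFT route scales as O(N log N) instead of O(N^2), which is why the transform dominates the cost and its communication pattern matters at scale.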
MOEA based design of decentralized controllers for LFC of interconnected power systems with nonlinearities, AC-DC parallel tie-lines and SMES units

International Nuclear Information System (INIS)

Ganapathy, S.; Velusami, S.

2010-01-01

A new design of Multi-Objective Evolutionary Algorithm based decentralized controllers for load-frequency control of interconnected power systems with Governor Dead Band and Generation Rate Constraint nonlinearities, AC-DC parallel tie-lines and Superconducting Magnetic Energy Storage (SMES) units, is proposed in this paper. The HVDC link is used as system interconnection in parallel with AC tie-line to effectively damp the frequency oscillations of AC system while the SMES unit provides bulk energy storage and release, thereby achieving combined benefits. The proposed controller satisfies two main objectives, namely, minimum Integral Squared Error of the system output and maximum closed-loop stability of the system. Simulation studies are conducted on a two area interconnected power system with nonlinearities, AC-DC tie-lines and SMES units. Results indicate that the proposed controller improves the transient responses and guarantees the closed-loop stability of the overall system even in the presence of system nonlinearities and with parameter changes.
Superposition of elliptic functions as solutions for a large number of nonlinear equations

International Nuclear Information System (INIS)

Khare, Avinash; Saxena, Avadh

2014-01-01

For a large number of nonlinear equations, both discrete and continuum, we demonstrate a kind of linear superposition. We show that whenever a nonlinear equation admits solutions in terms of both Jacobi elliptic functions cn(x, m) and dn(x, m) with modulus m, then it also admits solutions in terms of their sum as well as difference. We have checked this in the case of several nonlinear equations such as the nonlinear Schrödinger equation, MKdV, a mixed KdV-MKdV system, a mixed quadratic-cubic nonlinear Schrödinger equation, the Ablowitz-Ladik equation, the saturable nonlinear Schrödinger equation, $\lambda\phi^4$, the discrete MKdV as well as for several coupled field equations. Further, for a large number of nonlinear equations, we show that whenever a nonlinear equation admits a periodic solution in terms of $\mathrm{dn}^2(x, m)$, it also admits solutions in terms of $\mathrm{dn}^2(x, m) \pm \sqrt{m}\,\mathrm{cn}(x, m)\,\mathrm{dn}(x, m)$, even though cn(x, m)dn(x, m) is not a solution of these nonlinear equations. Finally, we also obtain superposed solutions of various forms for several coupled nonlinear equations.
Visual coherence for large-scale line-plot visualizations

KAUST Repository

Muigg, Philipp

2011-06-01

Displaying a large number of lines within a limited amount of screen space is a task that is common to many different classes of visualization techniques such as time-series visualizations, parallel coordinates, link-node diagrams, and phase-space diagrams. This paper addresses the challenging problems of cluttering and overdraw inherent to such visualizations. We generate a 2x2 tensor field during line rasterization that encodes the distribution of line orientations through each image pixel. Anisotropic diffusion of a noise texture is then used to generate a dense, coherent visualization of line orientation. In order to represent features of different scales, we employ a multi-resolution representation of the tensor field. The resulting technique can easily be applied to a wide variety of line-based visualizations. We demonstrate this for parallel coordinates, a time-series visualization, and a phase-space diagram. Furthermore, we demonstrate how to integrate a focus+context approach by incorporating a second tensor field. Our approach achieves interactive rendering performance for large data sets containing millions of data items, due to its image-based nature and ease of implementation on GPUs. Simulation results from computational fluid dynamics are used to evaluate the performance and usefulness of the proposed method. © 2011 The Author(s).
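As a rough illustration of how a 2x2 tensor can encode the distribution of line orientations at a pixel, the sketch below accumulates the outer products of unit direction vectors of the line segments crossing that pixel. This outer-product form is an assumption borrowed from the common structure-tensor construction; the abstract does not spell out the authors' exact definition, and the function names are hypothetical.

```python
import math


def orientation_tensor(directions):
    """Sum the outer products u u^T of the normalized directions (dx, dy).

    Returns the three independent entries (txx, txy, tyy) of the symmetric tensor.
    Opposite directions contribute identically, so the tensor encodes orientation,
    not direction.
    """
    txx = txy = tyy = 0.0
    for dx, dy in directions:
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            continue  # a degenerate segment carries no orientation
        ux, uy = dx / norm, dy / norm
        txx += ux * ux
        txy += ux * uy
        tyy += uy * uy
    return txx, txy, tyy


def dominant_orientation(txx, txy, tyy):
    """Return (angle, coherence): the angle of the principal eigenvector in radians
    and (l1 - l2) / (l1 + l2), which is 1 for parallel lines and 0 for isotropy."""
    trace = txx + tyy
    if trace == 0.0:
        return 0.0, 0.0
    angle = 0.5 * math.atan2(2.0 * txy, txx - tyy)
    coherence = math.hypot(txx - tyy, 2.0 * txy) / trace
    return angle, coherence


if __name__ == "__main__":
    # Three horizontal lines and one diagonal line through the same pixel.
    tensor = orientation_tensor([(1, 0), (2, 0), (-1, 0), (1, 1)])
    print(dominant_orientation(*tensor))
```

A per-pixel field of such tensors is the kind of input an anisotropic diffusion pass could use to smear a noise texture along the dominant orientation; that step is not shown here.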
Visual coherence for large-scale line-plot visualizations

KAUST Repository

Muigg, Philipp; Hadwiger, Markus; Doleisch, Helmut; Gröller, Eduard M.

2011-01-01

Displaying a large number of lines within a limited amount of screen space is a task that is common to many different classes of visualization techniques such as time-series visualizations, parallel coordinates, link-node diagrams, and phase-space diagrams. This paper addresses the challenging problems of cluttering and overdraw inherent to such visualizations. We generate a 2x2 tensor field during line rasterization that encodes the distribution of line orientations through each image pixel. Anisotropic diffusion of a noise texture is then used to generate a dense, coherent visualization of line orientation. In order to represent features of different scales, we employ a multi-resolution representation of the tensor field. The resulting technique can easily be applied to a wide variety of line-based visualizations. We demonstrate this for parallel coordinates, a time-series visualization, and a phase-space diagram. Furthermore, we demonstrate how to integrate a focus+context approach by incorporating a second tensor field. Our approach achieves interactive rendering performance for large data sets containing millions of data items, due to its image-based nature and ease of implementation on GPUs. Simulation results from computational fluid dynamics are used to evaluate the performance and usefulness of the proposed method. © 2011 The Author(s).
PetClaw: A scalable parallel nonlinear wave propagation solver for Python

KAUST Repository

Alghamdi, Amal; Ahmadia, Aron; Ketcheson, David I.; Knepley, Matthew; Mandli, Kyle; Dalcin, Lisandro

2011-01-01

We present PetClaw, a scalable distributed-memory solver for time-dependent nonlinear wave propagation. PetClaw unifies two well-known scientific computing packages, Clawpack and PETSc, using Python interfaces into both. We rely on Clawpack to provide the infrastructure and kernels for time-dependent nonlinear wave propagation. Similarly, we rely on PETSc to manage distributed data arrays and the communication between them. We describe both the implementation and performance of PetClaw as well as our challenges and accomplishments in scaling a Python-based code to tens of thousands of cores on the BlueGene/P architecture. The capabilities of PetClaw are demonstrated through application to a novel problem involving elastic waves in a heterogeneous medium. Very finely resolved simulations are used to demonstrate the suppression of shock formation in this system.
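The division of labour described above (local wave-propagation kernels plus a layer that manages distributed arrays and their communication) can be illustrated without either package. The following sketch does not use the PetClaw, Clawpack or PETSc APIs; it emulates the idea in plain Python with a first-order upwind kernel for linear advection and a hand-written ghost-cell exchange between subdomains. All names are hypothetical, and the "processes" are simply list slices updated in turn.

```python
def upwind_step(u, left_ghost, c):
    """One upwind step for u_t + a*u_x = 0 with a > 0 and c = a*dt/dx.

    `left_ghost` is the value of the cell immediately left of u[0].
    """
    upstream = [left_ghost] + u[:-1]
    return [ui - c * (ui - up) for ui, up in zip(u, upstream)]


def serial_run(u0, c, steps):
    """Reference solution on a single periodic array."""
    u = list(u0)
    for _ in range(steps):
        u = upwind_step(u, u[-1], c)  # periodic boundary
    return u


def decomposed_run(u0, c, steps, nparts):
    """Same computation on `nparts` subdomains with a ghost-cell exchange per step."""
    n = len(u0)
    bounds = [n * p // nparts for p in range(nparts + 1)]
    parts = [list(u0[bounds[p]:bounds[p + 1]]) for p in range(nparts)]
    for _ in range(steps):
        # "Communication": each subdomain receives the last cell of its left
        # neighbour (index -1 wraps around, giving the periodic boundary).
        ghosts = [parts[p - 1][-1] for p in range(nparts)]
        # "Computation": the kernel only ever sees local data plus its ghost cell.
        parts = [upwind_step(parts[p], ghosts[p], c) for p in range(nparts)]
    return [x for part in parts for x in part]


if __name__ == "__main__":
    u0 = [1.0 if 10 <= i < 20 else 0.0 for i in range(64)]
    print(serial_run(u0, 0.5, 40) == decomposed_run(u0, 0.5, 40, 4))
```

Because the kernel is unaware of the decomposition, the decomposed run reproduces the serial one; in a real distributed code the ghost exchange would be a message-passing operation handled by the array-management layer.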
NONLINEAR DYNAMO IN A ROTATING ELECTRICALLY CONDUCTING FLUID

Directory of Open Access Journals (Sweden)

M. I. Kopp

2017-05-01

We found a new large-scale instability that arises in a rotating conducting fluid with small-scale turbulence. The turbulence is generated by a small-scale external force at low Reynolds number. The theory is built by the method of multiscale asymptotic expansions. Nonlinear equations for the vortex and magnetic perturbations are obtained at third order in the small Reynolds number. It is shown that the combined effects of the Coriolis force and the small external force in a rotating conducting fluid make a large-scale instability possible. The large-scale increments (growth rates) of the instability correspond to the generation of both vortex and magnetic disturbances. This type of instability is classified as a hydrodynamic and MHD alpha-effect. We studied the stationary regimes of the nonlinear equations of the magneto-vortex dynamo. In the limit of a weakly conducting fluid, stationary solutions are found in the form of helical kinks. In the limit of a highly conducting fluid, stationary solutions are obtained in the form of nonlinear periodic waves and kinks.
Performance Analysis and Scaling Behavior of the Terrestrial Systems Modeling Platform TerrSysMP in Large-Scale Supercomputing Environments

Science.gov (United States)

Kollet, S. J.; Goergen, K.; Gasper, F.; Shresta, P.; Sulis, M.; Rihani, J.; Simmer, C.; Vereecken, H.

2013-12-01

In studies of the terrestrial hydrologic, energy and biogeochemical cycles, integrated multi-physics simulation platforms take a central role in characterizing non-linear interactions, variances and uncertainties of system states and fluxes in reciprocity with observations. Recently developed integrated simulation platforms attempt to honor the complexity of the terrestrial system across multiple time and space scales from the deeper subsurface including groundwater dynamics into the atmosphere. Technically, this requires the coupling of atmospheric, land surface, and subsurface-surface flow models in supercomputing environments, while ensuring a high degree of efficiency in the utilization of, e.g., standard Linux clusters and massively parallel resources. A systematic performance analysis including profiling and tracing in such an application is crucial in the understanding of the runtime behavior, to identify optimum model settings, and is an efficient way to distinguish potential parallel deficiencies. On sophisticated leadership-class supercomputers, such as the 28-rack 5.9 petaFLOP IBM Blue Gene/Q 'JUQUEEN' of the Jülich Supercomputing Centre (JSC), this is a challenging task, and all the more important when complex coupled component models are to be analysed. Here we present our experience from coupling, application tuning (e.g. 5-times speedup through compiler optimizations), parallel scaling and performance monitoring of the parallel Terrestrial Systems Modeling Platform TerrSysMP. The modeling platform consists of the weather prediction system COSMO of the German Weather Service; the Community Land Model, CLM of NCAR; and the variably saturated surface-subsurface flow code ParFlow. The model system relies on the Multiple Program Multiple Data (MPMD) execution model where the external Ocean-Atmosphere-Sea-Ice-Soil coupler (OASIS3) links the component models. TerrSysMP has been instrumented with the performance analysis tool Scalasca and analyzed
Linear and nonlinear excitations in two stacks of parallel arrays of long Josephson junctions

DEFF Research Database (Denmark)

Carapella, G.; Constabile, Giovanni; Latempa, R.

2000-01-01

We investigate a structure consisting of two parallel arrays of long Josephson junctions sharing a common electrode that allows inductive coupling between the arrays. A model for this structure is derived starting from the description of its continuous limit. The excitation of linear cavity modes...... known from continuous and discrete systems as well as the excitation of a new state exhibiting synchronization in two dimensions are inferred from the mathematical model of the system. The stable nonlinear solution of the coupled sine-Gordon equations describing the system is found to consist...
TOPOLOGY OF A LARGE-SCALE STRUCTURE AS A TEST OF MODIFIED GRAVITY

International Nuclear Information System (INIS)

Wang Xin; Chen Xuelei; Park, Changbom

2012-01-01

The genus of the isodensity contours is a robust measure of the topology of a large-scale structure, and it is relatively insensitive to nonlinear gravitational evolution, galaxy bias, and redshift-space distortion. We show that the growth of density fluctuations is scale dependent even in the linear regime in some modified gravity theories, which opens a new possibility of testing the theories observationally. We propose to use the genus of the isodensity contours, an intrinsic measure of the topology of the large-scale structure, as a statistic to be used in such tests. In Einstein's general theory of relativity, density fluctuations grow at the same rate on all scales in the linear regime, and the genus per comoving volume is almost conserved as structures grow homologously, so we expect that the genus-smoothing-scale relation is basically time independent. However, in some modified gravity models where structures grow with different rates on different scales, the genus-smoothing-scale relation should change over time. This can be used to test the gravity models with large-scale structure observations. We study the cases of the f(R) theory, DGP braneworld theory as well as the parameterized post-Friedmann models. We also forecast how the modified gravity models can be constrained with optical/IR or redshifted 21 cm radio surveys in the near future.
Traffic Flow Prediction Model for Large-Scale Road Network Based on Cloud Computing

Directory of Open Access Journals (Sweden)

Zhaosheng Yang

2014-01-01

To increase the efficiency and precision of large-scale road network traffic flow prediction, a genetic algorithm-support vector machine (GA-SVM) model based on cloud computing is proposed in this paper, which is based on the analysis of the characteristics and defects of the genetic algorithm and the support vector machine. In the cloud computing environment, SVM parameters are first optimized by the parallel genetic algorithm, and this optimized parallel SVM model is then used to predict traffic flow. On the basis of the traffic flow data of Haizhu District in Guangzhou City, the proposed model was verified and compared with the serial GA-SVM model and the parallel GA-SVM model based on MPI (message passing interface). The results demonstrate that the parallel GA-SVM model based on cloud computing has higher prediction accuracy, shorter running time, and higher speedup.
Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis

Energy Technology Data Exchange (ETDEWEB)

Hasenkamp, Daren; Sim, Alexander; Wehner, Michael; Wu, Kesheng

2010-09-30

Extensive computing power has been used to tackle issues such as climate change, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently only run on a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows the user to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of the cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we are able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup that is comparable to running the same analysis task using MPI. However, compared to MPI-based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring the user to rewrite their programs. The cloud-based approach is also more resilient to failure; as long as a single VM is running, it can make progress, whereas as soon as one MPI node fails the whole analysis job fails. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.
Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis

International Nuclear Information System (INIS)

Hasenkamp, Daren; Sim, Alexander; Wehner, Michael; Wu, Kesheng

2010-01-01

Extensive computing power has been used to tackle issues such as climate change, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently only run on a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows the user to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of the cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we are able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup that is comparable to running the same analysis task using MPI. However, compared to MPI-based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring the user to rewrite their programs. The cloud-based approach is also more resilient to failure; as long as a single VM is running, it can make progress, whereas as soon as one MPI node fails the whole analysis job fails. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.
Xyce parallel electronic simulator : users' guide. Version 5.1.

Energy Technology Data Exchange (ETDEWEB)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-11-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a
Xyce Parallel Electronic Simulator : users' guide, version 4.1.

Energy Technology Data Exchange (ETDEWEB)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-02-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a
Large-scale solar purchasing

International Nuclear Information System (INIS)

1999-01-01

The principal objective of the project was to participate in the definition of a new IEA task concerning solar procurement ("the Task") and to assess whether involvement in the task would be in the interest of the UK active solar heating industry. The project also aimed to assess the importance of large-scale solar purchasing to UK active solar heating market development and to evaluate the level of interest in large-scale solar purchasing amongst potential large-scale purchasers (in particular housing associations and housing developers). A further aim of the project was to consider means of stimulating large-scale active solar heating purchasing activity within the UK. (author)
Real-time nonlinear MPC and MHE for a large-scale mechatronic application

DEFF Research Database (Denmark)

Vukov, Milan; Gros, S.; Horn, G.

2015-01-01

Progress in optimization algorithms and in computational hardware has made it possible to deploy Nonlinear Model Predictive Control (NMPC) and Moving Horizon Estimation (MHE) in mechatronic applications. This paper aims to assess the computational performance of NMPC and MHE for rotational start-up ...

Parallel Monte Carlo reactor neutronics

International Nuclear Information System (INIS)

Blomquist, R.N.; Brown, F.B.

1994-01-01

The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and a workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved.
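One of the issues named above, reproducibility, can be illustrated with a toy model. The sketch below is not one of the two codes discussed in the record: it assumes a one-dimensional slab with made-up parameters, gives every particle history its own random stream derived from the history number, and tallies integer counts. Under those assumptions the result does not depend on how histories are distributed over workers; the workers here are emulated by a plain loop, and all names are hypothetical.

```python
import random

OUTCOMES = ("reflected", "transmitted", "absorbed")


def run_history(history_id, base_seed=2024, thickness=2.0, p_absorb=0.3):
    """Follow one particle through a 1D slab (lengths in mean free paths)."""
    # A private stream per history: the outcome depends only on history_id,
    # never on which worker runs it or in which order.
    rng = random.Random(f"{base_seed}:{history_id}")
    x, direction = 0.0, 1.0
    while True:
        x += direction * rng.expovariate(1.0)  # distance to the next collision
        if x < 0.0:
            return "reflected"
        if x > thickness:
            return "transmitted"
        if rng.random() < p_absorb:
            return "absorbed"
        direction = 1.0 if rng.random() < 0.5 else -1.0  # isotropic scatter in 1D


def tally(history_ids):
    """Integer tallies for a batch of histories."""
    counts = dict.fromkeys(OUTCOMES, 0)
    for history_id in history_ids:
        counts[run_history(history_id)] += 1
    return counts


def simulate(n_histories, n_workers):
    """Deal histories round-robin to emulated workers and merge their tallies."""
    total = dict.fromkeys(OUTCOMES, 0)
    for worker in range(n_workers):
        partial = tally(range(worker, n_histories, n_workers))
        for key in OUTCOMES:
            total[key] += partial[key]
    return total


if __name__ == "__main__":
    print(simulate(2000, 1) == simulate(2000, 7))
```

Round-robin dealing is only a static form of load balancing; with histories of very uneven length, a dynamic work queue would balance better while the per-history seeding keeps the tallies unchanged.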
Parameter and State Estimation of Large-Scale Complex Systems Using Python Tools

Directory of Open Access Journals (Sweden)

M. Anushka S. Perera

2015-07-01

This paper discusses topics related to automating parameter, disturbance and state estimation analysis of large-scale complex nonlinear dynamic systems using free programming tools. For large-scale complex systems, before implementing any state estimator, the system should be analyzed for structural observability, and the structural observability analysis can be automated using Modelica and Python. As a result of structural observability analysis, the system may be decomposed into subsystems, some of which may be observable (with respect to parameters, disturbances, and states) while others may not. The state estimation process is carried out for the observable subsystems, and the optimum number of additional measurements is prescribed for the unobservable subsystems to make them observable. In this paper, an industrial case study is considered: the copper production process at Glencore Nikkelverk, Kristiansand, Norway. The copper production process is a large-scale complex system. It is shown how to implement various state estimators, in Python, to estimate parameters and disturbances, in addition to states, based on available measurements.
Nonlinear Diamagnetic Stabilization of Double Tearing Modes in Cylindrical MHD Simulations

Science.gov (United States)

Abbott, Stephen; Germaschewski, Kai

2014-10-01

Double tearing modes (DTMs) may occur in reversed-shear tokamak configurations if two nearby rational surfaces couple and begin reconnecting. During the DTM's nonlinear evolution it can enter an "explosive" growth phase leading to complete reconnection, making it a possible driver for off-axis sawtooth crashes. Motivated by similarities between this behavior and that of the m = 1 kink-tearing mode in conventional tokamaks, we investigate diamagnetic drifts as a possible DTM stabilization mechanism. We extend our previous linear studies of an m = 2, n = 1 DTM in cylindrical geometry to the fully nonlinear regime using the MHD code MRC-3D. A pressure gradient similar to observed ITB profiles is used, together with Hall physics, to introduce ω* effects. We find the diamagnetic drifts can have a stabilizing effect on the nonlinear DTM through a combination of large-scale differential rotation and mechanisms local to the reconnection layer. MRC-3D is an extended MHD code based on the libMRC computational framework. It supports nonuniform grids in curvilinear coordinates with parallel implicit and explicit time integration.
Modeling and control of a large nuclear reactor. A three-time-scale approach

Energy Technology Data Exchange (ETDEWEB)

Shimjith, S.R. [Indian Institute of Technology Bombay, Mumbai (India); Bhabha Atomic Research Centre, Mumbai (India); Tiwari, A.P. [Bhabha Atomic Research Centre, Mumbai (India); Bandyopadhyay, B. [Indian Institute of Technology Bombay, Mumbai (India). IDP in Systems and Control Engineering

2013-07-01

Recent research on Modeling and Control of a Large Nuclear Reactor. Presents a three-time-scale approach. Written by leading experts in the field. Control analysis and design of large nuclear reactors requires a suitable mathematical model representing the steady state and dynamic behavior of the reactor with reasonable accuracy. This task is, however, quite challenging because of several complex dynamic phenomena existing in a reactor. Quite often, the models developed would be of prohibitively large order, non-linear and of complex structure not readily amenable for control studies. Moreover, the existence of simultaneously occurring dynamic variations at different speeds makes the mathematical model susceptible to numerical ill-conditioning, inhibiting direct application of standard control techniques. This monograph introduces a technique for mathematical modeling of large nuclear reactors in the framework of multi-point kinetics, to obtain a comparatively smaller order model in standard state space form thus overcoming these difficulties. It further brings in innovative methods for controller design for systems exhibiting multi-time-scale property, with emphasis on three-time-scale systems.
Massive parallel electromagnetic field simulation program JEMS-FDTD design and implementation on jasmin

International Nuclear Information System (INIS)

Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li

2010-01-01

A large-scale parallel electromagnetic field simulation program, JEMS-FDTD (J Electromagnetic Solver-Finite Difference Time Domain), is designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate the propagation, radiation, and coupling of electromagnetic fields by solving Maxwell's equations explicitly on a structured mesh with the FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large-scale parallel computation. A parallel performance test indicates that a high parallel efficiency is obtained. (authors)
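For readers unfamiliar with the explicit FDTD update on a structured mesh, a minimal one-dimensional sketch follows. It is unrelated to the JEMS-FDTD or JASMIN code bases: it assumes vacuum, normalized units in which the magnetic field is scaled by the wave impedance, a Gaussian soft source with arbitrarily chosen width, and perfectly conducting ends. The function name and all parameter values are hypothetical.

```python
import math


def fdtd_1d(n_cells=201, n_steps=150, courant=0.5, source_cell=100):
    """Leapfrog (Yee) update of Ez and Hy on a staggered 1D grid.

    `courant` is c*dt/dx; the scheme is stable for courant <= 1 in 1D.
    Returns the final Ez and Hy lists.
    """
    ez = [0.0] * n_cells        # Ez lives on integer grid points
    hy = [0.0] * (n_cells - 1)  # Hy lives halfway between them
    for step in range(n_steps):
        # Half step: update H from the spatial difference of E.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Full step: update interior E from the spatial difference of H.
        # ez[0] and ez[-1] are never updated, i.e. perfectly conducting walls.
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft source: add a Gaussian pulse in time at one cell.
        ez[source_cell] += math.exp(-((step - 30.0) / 10.0) ** 2)
    return ez, hy


if __name__ == "__main__":
    ez, _ = fdtd_1d()
    peak = max(range(len(ez)), key=lambda i: abs(ez[i]))
    print("largest |Ez| at cell", peak)
```

Each cell update touches only its nearest neighbours, which is why this kind of scheme parallelizes well by domain decomposition: a subdomain needs just one layer of neighbouring field values per step.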
A nonlinear structural subgrid-scale closure for compressible MHD. I. Derivation and energy dissipation properties

Energy Technology Data Exchange (ETDEWEB)

Vlaykov, Dimitar G., E-mail: Dimitar.Vlaykov@ds.mpg.de [Institut für Astrophysik, Universität Göttingen, Friedrich-Hund-Platz 1, D-37077 Göttingen (Germany); Max-Planck-Institut für Dynamik und Selbstorganisation, Am Faßberg 17, D-37077 Göttingen (Germany); Grete, Philipp [Institut für Astrophysik, Universität Göttingen, Friedrich-Hund-Platz 1, D-37077 Göttingen (Germany); Max-Planck-Institut für Sonnensystemforschung, Justus-von-Liebig-Weg 3, D-37077 Göttingen (Germany); Schmidt, Wolfram [Hamburger Sternwarte, Universität Hamburg, Gojenbergsweg 112, D-21029 Hamburg (Germany); Schleicher, Dominik R. G. [Departamento de Astronomía, Facultad Ciencias Físicas y Matemáticas, Universidad de Concepción, Av. Esteban Iturra s/n Barrio Universitario, Casilla 160-C (Chile)

2016-06-15

Compressible magnetohydrodynamic (MHD) turbulence is ubiquitous in astrophysical phenomena ranging from the intergalactic to the stellar scales. In studying them, numerical simulations are nearly inescapable, due to the large degree of nonlinearity involved. However, the dynamical ranges of these phenomena are much larger than what is computationally accessible. In large eddy simulations (LESs), the resulting limited resolution effects are addressed explicitly by introducing to the equations of motion additional terms associated with the unresolved, subgrid-scale dynamics. This renders the system unclosed. We derive a set of nonlinear structural closures for the ideal MHD LES equations with particular emphasis on the effects of compressibility. The closures are based on a gradient expansion of the finite-resolution operator [W. K. Yeo (CUP, 1993)] and require no assumptions about the nature of the flow or magnetic field. Thus, the scope of their applicability ranges from the sub- to the hyper-sonic and -Alfvénic regimes. The closures support spectral energy cascades both up and down-scale, as well as direct transfer between kinetic and magnetic resolved and unresolved energy budgets. They implicitly take into account the local geometry, and in particular, the anisotropy of the flow. Their properties are a priori validated in Paper II [P. Grete et al., Phys. Plasmas 23, 062317 (2016)] against alternative closures available in the literature with respect to a wide range of simulation data of homogeneous and isotropic turbulence.
DEMNUni: massive neutrinos and the bispectrum of large scale structures

Science.gov (United States)

Ruggeri, Rossana; Castorina, Emanuele; Carbone, Carmelita; Sefusatti, Emiliano

2018-03-01

The main effect of massive neutrinos on the large-scale structure consists in a few percent suppression of matter perturbations on all scales below their free-streaming scale. This effect is of particular importance as it allows one to constrain the value of the sum of neutrino masses from measurements of the galaxy power spectrum. In this work, we present the first measurements of the next higher-order correlation function, the bispectrum, from N-body simulations that include massive neutrinos as particles. This is the simplest statistic characterising the non-Gaussian properties of the matter and dark matter halo distributions. We investigate, in the first place, the suppression due to massive neutrinos on the matter bispectrum, comparing our measurements with the simplest perturbation theory predictions, finding the approximation of neutrinos contributing at quadratic order in perturbation theory to provide a good fit to the measurements in the simulations. On the other hand, as expected, a linear approximation for neutrino perturbations would lead to O(fν) errors on the total matter bispectrum at large scales. We then attempt an extension of previous results on the universality of linear halo bias in neutrino cosmologies to non-linear and non-local corrections, finding results consistent with the power spectrum analysis.
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

Science.gov (United States)

Bui, Trong T.

1999-01-01

A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
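The second-order MacCormack time marching mentioned in the abstract can be sketched on a much simpler problem. The example below is not the paper's compressible-flow solver: it assumes the scalar linear advection equation u_t + a u_x = 0 on a periodic grid, for which the predictor-corrector pair reduces to a few lines. The function name is hypothetical and `c` denotes the Courant number a*dt/dx.

```python
def maccormack_step(u, c):
    """One MacCormack step for u_t + a*u_x = 0 on a periodic grid, c = a*dt/dx."""
    n = len(u)
    # Predictor: forward difference in space.
    u_pred = [u[i] - c * (u[(i + 1) % n] - u[i]) for i in range(n)]
    # Corrector: backward difference on the predicted values, then average with
    # the old solution (index -1 wraps around, giving periodicity).
    return [
        0.5 * (u[i] + u_pred[i] - c * (u_pred[i] - u_pred[i - 1]))
        for i in range(n)
    ]


if __name__ == "__main__":
    u = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    # With c = 1 the scheme moves the profile exactly one cell per step.
    print(maccormack_step(u, 1.0))
    # With c = 0.5 the total is still conserved, although the profile develops
    # the oscillations typical of second-order schemes near discontinuities.
    v = u
    for _ in range(20):
        v = maccormack_step(v, 0.5)
    print(sum(v))
```

Alternating the one-sided differences between predictor and corrector is what gives second-order accuracy in both space and time; for nonlinear systems the same pattern is applied to the flux vectors rather than to `u` directly.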
The build up of the correlation between halo spin and the large-scale structure

Science.gov (United States)

Wang, Peng; Kang, Xi

2018-01-01

Both simulations and observations have confirmed that the spin of haloes/galaxies is correlated with the large-scale structure (LSS) with a mass dependence such that the spin of low-mass haloes/galaxies tends to be parallel with the LSS, while that of massive haloes/galaxies tends to be perpendicular to the LSS. It is still unclear how this mass dependence is built up over time. We use N-body simulations to trace the evolution of the halo spin-LSS correlation and find that at early times the spin of all halo progenitors is parallel with the LSS. As time goes on, mass collapse around massive haloes becomes more isotropic; in particular, the recent mass accretion along the slowest collapsing direction is significant and brings the halo spin to be perpendicular to the LSS. Adopting the fractional anisotropy (FA) parameter to describe the degree of anisotropy of the large-scale environment, we find that the spin-LSS correlation is a strong function of the environment such that a higher FA (more anisotropic environment) leads to an aligned signal, and a lower anisotropy leads to a misaligned signal. In general, our results show that the spin-LSS correlation is a combined consequence of mass flow and halo growth within the cosmic web. Our predicted environmental dependence between spin and large-scale structure can be further tested using galaxy surveys.
The TeraShake Computational Platform for Large-Scale Earthquake Simulations

Science.gov (United States)

Cui, Yifeng; Olsen, Kim; Chourasia, Amit; Moore, Reagan; Maechling, Philip; Jordan, Thomas

Geoscientific and computer science researchers with the Southern California Earthquake Center (SCEC) are conducting a large-scale, physics-based, computationally demanding earthquake system science research program with the goal of developing predictive models of earthquake processes. The computational demands of this program continue to increase rapidly as these researchers seek to perform physics-based numerical simulations of earthquake processes at larger scales. To meet the needs of this research program, a multiple-institution team coordinated by SCEC has integrated several scientific codes into a numerical modeling-based research tool we call the TeraShake computational platform (TSCP). A central component in the TSCP is a highly scalable earthquake wave propagation simulation program called the TeraShake anelastic wave propagation (TS-AWP) code. In this chapter, we describe how we extended an existing, stand-alone, well-validated, finite-difference, anelastic wave propagation modeling code into the highly scalable and widely used TS-AWP and then integrated this code into the TeraShake computational platform that provides end-to-end (initialization to analysis) research capabilities. We also describe the techniques used to enhance the TS-AWP parallel performance on TeraGrid supercomputers, as well as the TeraShake simulation phases including input preparation, run time, data archive management, and visualization. As a result of our efforts to improve its parallel efficiency, the TS-AWP has now shown highly efficient strong scaling on over 40K processors on IBM’s BlueGene/L Watson computer. In addition, the TSCP has developed into a computational system that is useful to many members of the SCEC community for performing large-scale earthquake simulations.
Nonlinear electrokinetics at large voltages

Energy Technology Data Exchange (ETDEWEB)

Bazant, Martin Z [Department of Chemical Engineering and Institute for Soldier Nanotechnologies, Massachusetts Institute of Technology, Cambridge, MA 02139 (United States); Sabri Kilic, Mustafa; Ajdari, Armand [Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139 (United States); Storey, Brian D [Franklin W Olin College of Engineering, Needham, MA 02492 (United States)], E-mail: bazant@mit.edu

2009-07-15

The classical theory of electrokinetic phenomena assumes a dilute solution of point-like ions in chemical equilibrium with a surface whose double-layer voltage is of order the thermal voltage, k_B T/e = 25 mV. In nonlinear 'induced-charge' electrokinetic phenomena, such as ac electro-osmosis, several volts ≈ 100 k_B T/e are applied to the double layer, and the theory breaks down and cannot explain many observed features. We argue that, under such a large voltage, counterions 'condense' near the surface, even for dilute bulk solutions. Based on simple models, we predict that the double-layer capacitance decreases and the electro-osmotic mobility saturates at large voltages, due to steric repulsion and increased viscosity of the condensed layer, respectively. The former suffices to explain observed high-frequency flow reversal in ac electro-osmosis; the latter leads to a salt concentration dependence of induced-charge flows comparable to experiments, although a complete theory is still lacking.
Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale

Energy Technology Data Exchange (ETDEWEB)

Daily, Jeffrey A. [Washington State Univ., Pullman, WA (United States)]

2015-05-01

The field of bioinformatics and computational biology is currently experiencing a data revolution. The exciting prospect of making fundamental biological discoveries is fueling the rapid development and deployment of numerous cost-effective, high-throughput next-generation sequencing technologies. The result is that the DNA and protein sequence repositories are being bombarded with new sequence information. Databases are continuing to report a Moore’s law-like growth trajectory in their database sizes, roughly doubling every 18 months. In what seems to be a paradigm-shift, individual projects are now capable of generating billions of raw sequence data that need to be analyzed in the presence of already annotated sequence information. While it is clear that data-driven methods, such as sequencing homology detection, are becoming the mainstay in the field of computational life sciences, the algorithmic advancements essential for implementing complex data analytics at scale have mostly lagged behind. Sequence homology detection is central to a number of bioinformatics applications including genome sequencing and protein family characterization. Given millions of sequences, the goal is to identify all pairs of sequences that are highly similar (or “homologous”) on the basis of alignment criteria. While there are optimal alignment algorithms to compute pairwise homology, their deployment for large-scale is currently not feasible; instead, heuristic methods are used at the expense of quality. In this dissertation, we present the design and evaluation of a parallel implementation for conducting optimal homology detection on distributed memory supercomputers. Our approach uses a combination of techniques from asynchronous load balancing (viz. work stealing, dynamic task counters), data replication, and exact-matching filters to achieve homology detection at scale. Results for a collection of 2.56M sequences show parallel efficiencies of ~75-100% on up to 8K cores
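As an illustration of what an optimal pairwise alignment algorithm computes, the sketch below scores the best local alignment of two sequences with the Smith-Waterman recurrence. The abstract does not name the algorithm or scoring scheme used in the dissertation, so the choice of Smith-Waterman, the linear gap penalty, and the `match`, `mismatch`, and `gap` values are all assumptions made for this example.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal local alignment score of a and b (linear gap cost)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A cell never drops below 0, which lets an alignment restart anywhere.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

Each pair costs time proportional to the product of the two sequence lengths, and the number of pairs grows quadratically with the number of sequences, which is why exhaustive optimal alignment at this scale depends on filtering and load balancing of the kind the abstract describes.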
Parallel Computation of RCS of Electrically Large Platform with Coatings Modeled with NURBS Surfaces

Directory of Open Access Journals (Sweden)

Ying Yan

2012-01-01

The significance of Radar Cross Section (RCS) in military applications makes its prediction an important problem. This paper uses large-scale parallel Physical Optics (PO) to realize the fast computation of the RCS of electrically large targets, which are modeled by Non-Uniform Rational B-Spline (NURBS) surfaces and coated with dielectric materials. Some numerical examples are presented to validate this paper’s method. In addition, 1024 CPUs at the Shanghai Supercomputer Center (SSC) are used to simulate a model with a maximum electrical size of 1966.7 λ for the first time in China. The results show that this paper’s method can greatly speed up the calculation and is capable of solving the real-life problem of RCS prediction.
Conference on High Performance Software for Nonlinear Optimization

CERN Document Server

Murli, Almerico; Pardalos, Panos; Toraldo, Gerardo

1998-01-01

This book contains a selection of papers presented at the conference on High Performance Software for Nonlinear Optimization (HPSNO97) which was held in Ischia, Italy, in June 1997. The rapid progress of computer technologies, including new parallel architectures, has stimulated a large amount of research devoted to building software environments and defining algorithms able to fully exploit this new computational power. In some sense, numerical analysis has to conform itself to the new tools. The impact of parallel computing in nonlinear optimization, which had a slow start at the beginning, seems now to increase at a fast rate, and it is reasonable to expect an even greater acceleration in the future. As with the first HPSNO conference, the goal of the HPSNO97 conference was to supply a broad overview of the more recent developments and trends in nonlinear optimization, emphasizing the algorithmic and high performance software aspects. Bringing together new computational methodologies with theoretical...
Nonlinear wave mechanics from classical dynamics and scale covariance

International Nuclear Information System (INIS)

Hammad, F.

2007-01-01

Nonlinear Schroedinger equations proposed by Kostin and by Doebner and Goldin are rederived from Nottale's prescription for obtaining quantum mechanics from classical mechanics in nondifferentiable spaces; i.e., from hydrodynamical concepts and scale covariance. Some soliton and plane wave solutions are discussed
GPU-based acceleration of computations in nonlinear finite element deformation analysis.

Science.gov (United States)

Mafi, Ramin; Sirouspour, Shahin

2014-03-01

The physics of deformation for biological soft tissue is best described by nonlinear continuum mechanics-based models, which can then be discretized by the FEM for a numerical solution. However, the computational complexity of such models has limited their use in applications requiring real-time or fast response. In this work, we propose a graphics processing unit-based implementation of the FEM using implicit time integration for dynamic nonlinear deformation analysis. This is the most general formulation of the deformation analysis. It is valid for large deformations and strains and can account for material nonlinearities. The data-parallel nature and the intense arithmetic computations of nonlinear FEM equations make it particularly suitable for implementation on a parallel computing platform such as a graphics processing unit. In this work, we present and compare two different designs based on the matrix-free and conventional preconditioned conjugate gradients algorithms for solving the FEM equations arising in deformation analysis. The speedup achieved with the proposed parallel implementations of the algorithms will be instrumental in the development of advanced surgical simulators and medical image registration methods involving soft-tissue deformation. Copyright © 2013 John Wiley & Sons, Ltd.
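The matrix-free variant of preconditioned conjugate gradients mentioned above can be sketched on a CPU in a few lines. This is a serial illustration and not the authors' GPU design; it assumes a Jacobi (diagonal) preconditioner and uses a 1D stiffness stencil as a stand-in operator, and the names `pcg` and `apply_stiffness_1d` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def apply_stiffness_1d(v: Vector) -> Vector:
    """Matrix-free action of the tridiagonal [-1, 2, -1] operator on v."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def pcg(apply_A: Callable[[Vector], Vector], b: Vector, diag: Vector,
        tol: float = 1e-10, max_iter: int = 200) -> Vector:
    """Jacobi-preconditioned CG; A is only ever accessed through apply_A."""
    x = [0.0] * len(b)
    r = list(b)                                  # residual for the zero initial guess
    z = [ri / di for ri, di in zip(r, diag)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


if __name__ == "__main__":
    rhs = apply_stiffness_1d([1.0, -2.0, 3.0, 0.5, 4.0, -1.0])
    print(pcg(apply_stiffness_1d, rhs, [2.0] * 6))
```

Because the solver only calls `apply_A`, the global matrix never has to be assembled or stored; in a finite element setting that callback would typically accumulate element-level contributions instead.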
Performance of Air Pollution Models on Massively Parallel Computers

DEFF Research Database (Denmark)

Brown, John; Hansen, Per Christian; Wasniewski, Jerzy

1996-01-01

To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight about the performance of the three computers when used to solve large-scale scientific problems...
Nonlinear generation of kinetic-scale waves by magnetohydrodynamic Alfvén waves and nonlocal spectral transport in the solar wind

Energy Technology Data Exchange (ETDEWEB)

Zhao, J. S.; Wu, D. J. [Purple Mountain Observatory, Chinese Academy of Sciences, Nanjing (China); Voitenko, Y.; De Keyser, J., E-mail: js_zhao@pmo.ac.cn [Solar-Terrestrial Centre of Excellence, Space Physics Division, Belgian Institute for Space Aeronomy, Ringlaan-3-Avenue Circulaire, B-1180 Brussels (Belgium)

2014-04-20

We study the nonlocal nonlinear coupling and generation of kinetic Alfvén waves (KAWs) and kinetic slow waves (KSWs) by magnetohydrodynamic Alfvén waves (MHD AWs) in conditions typical for the solar wind in the inner heliosphere. This cross-scale process provides an alternative to the turbulent energy cascade passing through many intermediate scales. The nonlinearities we study are proportional to the scalar products of wave vectors and hence are called 'scalar' ones. Despite the strong Landau damping of kinetic waves, we found fast growing KAWs and KSWs at perpendicular wavelengths close to the ion gyroradius. Using the parametric decay formalism, we investigate two independent decay channels for the pump AW: forward decay (involving co-propagating product waves) and backward decay (involving counter-propagating product waves). The growth rate of the forward decay is typically 0.05 but can exceed 0.1 of the pump wave frequency. The resulting spectral transport is nonlocal and anisotropic, sharply increasing perpendicular wavenumbers but not parallel ones. AWs and KAWs propagating against the pump AW grow with about the same rate and contribute to the sunward wave flux in the solar wind. Our results suggest that the nonlocal decay of MHD AWs into KAWs and KSWs is a robust mechanism for the cross-scale spectral transport of the wave energy from MHD to dissipative kinetic scales in the solar wind and similar media.
The parallel volume at large distances

DEFF Research Database (Denmark)

Kampf, Jürgen

In this paper we examine the asymptotic behavior of the parallel volume of planar non-convex bodies as the distance tends to infinity. We show that the difference between the parallel volume of the convex hull of a body and the parallel volume of the body itself tends to 0. This yields a new proof for the fact that a planar body can only have polynomial parallel volume, if it is convex. Extensions to Minkowski spaces and random sets are also discussed.
Energy transfers in large-scale and small-scale dynamos

Science.gov (United States)

Samtaney, Ravi; Kumar, Rohit; Verma, Mahendra

2015-11-01

We present the energy transfers, mainly energy fluxes and shell-to-shell energy transfers, in small-scale dynamo (SSD) and large-scale dynamo (LSD) using numerical simulations of MHD turbulence for Pm = 20 (SSD) and for Pm = 0.2 (LSD) on a 1024^3 grid. For SSD, we demonstrate that the magnetic energy growth is caused by nonlocal energy transfers from the large-scale or forcing-scale velocity field to the small-scale magnetic field. The peak of these energy transfers moves towards lower wavenumbers as the dynamo evolves, which is the reason for the growth of the magnetic fields at the large scales. The energy transfers U2U (velocity to velocity) and B2B (magnetic to magnetic) are forward and local. For LSD, we show that the magnetic energy growth takes place via energy transfers from the large-scale velocity field to the large-scale magnetic field. We observe forward U2U and B2B energy flux, similar to SSD.
Nonlinear model of short-scale electrodynamics in the auroral ionosphere

Directory of Open Access Journals (Sweden)

J.-M. A. Noël

The optical detection of auroral subarcs a few tens of m wide as well as the direct observation of shears several m/s per m over km to sub km scales by rocket instrumentation both indicate that violent and highly localized electrodynamics can occur at times in the auroral ionosphere over scales 100 m or less in width. These observations as well as the detection of unstable ion-acoustic waves observed by incoherent radars along the geomagnetic field lines have motivated us to develop a detailed time-dependent two-dimensional model of short-scale auroral electrodynamics that uses current continuity, Ohm's law, and 8-moment transport equations for the ions and electrons in the presence of large ambient electric fields to describe wide auroral arcs with sharp edges in response to sharp cut-offs in precipitation (even though it may be possible to describe thin arcs and ultra-thin arcs with our model, we have left such a study for future work). We present the essential elements of this new model and illustrate the model's usefulness with a sample run for which the ambient electric field is 100 mV/m away from the arc and for which electron precipitation cuts off over a region 100 m wide. The sample run demonstrates that parallel current densities of the order of several hundred µA m^-2 can be triggered in these circumstances, together with shears several m/s per m in magnitude and parallel electric fields of the order of 0.1 mV/m around 130 km altitude. It also illustrates that the local ionospheric properties like densities, temperature and composition can be strongly affected by the violent localized electrodynamics and vice versa.

Key words: Ionosphere (auroral ionosphere, electric fields and currents, ionosphere-magnetosphere interactions)
Large-scale Cosmic-Ray Anisotropy as a Probe of Interstellar Turbulence

Energy Technology Data Exchange (ETDEWEB)

Giacinti, Gwenael; Kirk, John G. [Max-Planck-Institut für Kernphysik, Postfach 103980, D-69029 Heidelberg (Germany)

2017-02-01

We calculate the large-scale cosmic-ray (CR) anisotropies predicted for a range of Goldreich–Sridhar (GS) and isotropic models of interstellar turbulence, and compare them with IceTop data. In general, the predicted CR anisotropy is not a pure dipole; the cold spots reported at 400 TeV and 2 PeV are consistent with a GS model that contains a smooth deficit of parallel-propagating waves and a broad resonance function, though some other possibilities cannot, as yet, be ruled out. In particular, isotropic fast magnetosonic wave turbulence can match the observations at high energy, but cannot accommodate an energy dependence in the shape of the CR anisotropy. Our findings suggest that improved data on the large-scale CR anisotropy could provide a valuable probe of the properties—notably the power-spectrum—of the interstellar turbulence within a few tens of parsecs from Earth.
On unravelling mechanism of interplay between cloud and large scale circulation: a grey area in climate science

Science.gov (United States)

De, S.; Agarwal, N. K.; Hazra, Anupam; Chaudhari, Hemantkumar S.; Sahai, A. K.

2018-04-01

The interaction between cloud and large-scale circulation is a much less explored area in climate science. Unfolding the mechanism of coupling between these two parameters is imperative for improved simulation of the Indian summer monsoon (ISM) and for reducing imprecision in the climate sensitivity of global climate models. This work explores this mechanism with CFSv2 climate model experiments whose cloud has been modified by changing the critical relative humidity (CRH) profile of the model during ISM. The study reveals that the variable CRH in CFSv2 improves the nonlinear interactions between high and low frequency oscillations in the wind field (revealed as internal dynamics of the monsoon) and realistically modulates the spatial distribution of interactions over the Indian landmass during contrasting monsoon seasons, compared to the existing CRH profile of CFSv2. The lower tropospheric wind error energy in the variable CRH simulation of CFSv2 appears to be at a minimum due to the reduced nonlinear convergence of error to the planetary scale range from long and synoptic scales (another facet of internal dynamics), compared to that observed in other CRH experiments in normal and deficient monsoons. Hence, the interplay between cloud and large-scale circulation through CRH may be manifested as a change in the internal dynamics of ISM, revealed from scale-interactive quasi-linear and nonlinear kinetic energy exchanges in the frequency as well as the wavenumber domain during the monsoon period, that eventually modifies the internal variance of the CFSv2 model. Conversely, the reduced wind bias and proper modulation of the spatial distribution of scale interaction between the synoptic and low frequency oscillations improve the eastward and northward extent of water vapour flux over the Indian landmass, which in turn feeds back to the realistic simulation of cloud condensates, contributing to improved ISM rainfall in CFSv2.
A comparison of parallel dust and fibre measurements of airborne chrysotile asbestos in a large mine and processing factories in the Russian Federation

NARCIS (Netherlands)

Feletto, Eleonora; Schonfeld, Sara J; Kovalevskiy, Evgeny V; Bukhtiyarov, Igor V; Kashanskiy, Sergey V; Moissonnier, Monika; Straif, Kurt; Kromhout, Hans

2017-01-01

INTRODUCTION: Historic dust concentrations are available in a large-scale cohort study of workers in a chrysotile mine and processing factories in Asbest, Russian Federation. Parallel dust (gravimetric) and fibre (phase-contrast optical microscopy) concentrations collected in 1995, 2007 and 2013/14
Solving Large Quadratic Assignment Problems in Parallel

DEFF Research Database (Denmark)

Clausen, Jens; Perregaard, Michael

1997-01-01

and recalculation of bounds between branchings when used in a parallel Branch-and-Bound algorithm. The algorithm has been implemented on a 16-processor MEIKO Computing Surface with Intel i860 processors. Computational results from the solution of a number of large QAPs, including the classical Nugent 20... processors, and have hence not been ideally suited for computations essentially involving non-vectorizable computations on integers. In this paper we investigate the combination of one of the best bound functions for a Branch-and-Bound algorithm (the Gilmore-Lawler bound) and various testing, variable binding...
Tensor-GMRES method for large sparse systems of nonlinear equations

Science.gov (United States)

Feng, Dan; Pulliam, Thomas H.

1994-01-01

This paper introduces a tensor-Krylov method, the tensor-GMRES method, for large sparse systems of nonlinear equations. This method is a coupling of tensor model formation and solution techniques for nonlinear equations with Krylov subspace projection techniques for unsymmetric systems of linear equations. Traditional tensor methods for nonlinear equations are based on a quadratic model of the nonlinear function, a standard linear model augmented by a simple second order term. These methods are shown to be significantly more efficient than standard methods both on nonsingular problems and on problems where the Jacobian matrix at the solution is singular. A major disadvantage of the traditional tensor methods is that the solution of the tensor model requires the factorization of the Jacobian matrix, which may not be suitable for problems where the Jacobian matrix is large and has a 'bad' sparsity structure for an efficient factorization. We overcome this difficulty by forming and solving the tensor model using an extension of a Newton-GMRES scheme. Like traditional tensor methods, we show that the new tensor method has significant computational advantages over the analogous Newton counterpart. Consistent with Krylov subspace based methods, the new tensor method does not depend on the factorization of the Jacobian matrix. As a matter of fact, the Jacobian matrix is never needed explicitly.
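The closing remark, that the Jacobian matrix is never needed explicitly, rests on the fact that Krylov methods such as GMRES only require Jacobian-vector products. The sketch below shows the standard forward-difference approximation J(x)v ≈ (F(x + εv) − F(x))/ε used in Newton-Krylov schemes generally; it is not the tensor-GMRES algorithm itself, and the function name, the step size `eps`, and the two-equation test system are assumptions made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def jacobian_vector_product(F: Callable[[Vector], Vector], x: Vector,
                            v: Vector, eps: float = 1e-7) -> Vector:
    """Approximate J(x) @ v by a forward difference, without forming J."""
    Fx = F(x)
    x_pert = [xi + eps * vi for xi, vi in zip(x, v)]
    return [(fp - f) / eps for fp, f in zip(F(x_pert), Fx)]


def example_system(x: Vector) -> Vector:
    """Hypothetical nonlinear system with Jacobian [[2*x0, 1], [1, 3*x1**2]]."""
    return [x[0] ** 2 + x[1] - 3.0, x[0] + x[1] ** 3 - 5.0]


if __name__ == "__main__":
    # The exact product at x = (1, 2) with v = (1, -1) is (1, -11).
    print(jacobian_vector_product(example_system, [1.0, 2.0], [1.0, -1.0]))
```

Each product costs one extra evaluation of F, so a Krylov iteration can proceed on problems where storing or factorizing the Jacobian would be impractical.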
Time-Sliced Perturbation Theory for Large Scale Structure I: General Formalism

CERN Document Server

Blas, Diego; Ivanov, Mikhail M.; Sibiryakov, Sergey

2016-01-01

We present a new analytic approach to describe large scale structure formation in the mildly non-linear regime. The central object of the method is the time-dependent probability distribution function generating correlators of the cosmological observables at a given moment of time. Expanding the distribution function around the Gaussian weight we formulate a perturbative technique to calculate non-linear corrections to cosmological correlators, similar to the diagrammatic expansion in a three-dimensional Euclidean quantum field theory, with time playing the role of an external parameter. For the physically relevant case of cold dark matter in an Einstein--de Sitter universe, the time evolution of the distribution function can be found exactly and is encapsulated by a time-dependent coupling constant controlling the perturbative expansion. We show that all building blocks of the expansion are free from spurious infrared enhanced contributions that plague the standard cosmological perturbation theory. This pave...
Massively Parallel Finite Element Programming

KAUST Repository

Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

2010-01-01

Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

Science.gov (United States)

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme growth in next-generation sequencing has resulted in a shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets, with files larger than 1 GB, showed that HAlign-II saves time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Nonlinear waves in solar plasmas - a review

International Nuclear Information System (INIS)

Ballai, I

2006-01-01

Nonlinearity is a direct consequence of large scale dynamics in the solar plasmas. When nonlinear steepening of waves is balanced by dispersion, solitary waves are generated. In the vicinity of resonances, waves can steepen into nonlinear waves influencing the efficiency of energy deposition. Here we review recent theoretical breakthroughs that have led to a greater understanding of many aspects of nonlinear waves arising in homogeneous and inhomogeneous solar plasmas.
Jump phenomena. [large amplitude responses of nonlinear systems

Science.gov (United States)

Reiss, E. L.

1980-01-01

The paper considers jump phenomena composed of large amplitude responses of nonlinear systems caused by small amplitude disturbances. Physical problems where large jumps in the solution amplitude are important features of the response are described, including snap buckling of elastic shells, chemical reactions leading to combustion and explosion, and long-term climatic changes of the earth's atmosphere. A new method of rational functions was then developed which consists of representing the solutions of the jump problems as rational functions of the small disturbance parameter; this method can solve jump problems explicitly.
On the Soft Limit of the Large Scale Structure Power Spectrum: UV Dependence

CERN Document Server

Garny, Mathias; Porto, Rafael A; Sagunski, Laura

2015-01-01

We derive a non-perturbative equation for the large scale structure power spectrum of long-wavelength modes. Thereby, we use an operator product expansion together with relations between the three-point function and power spectrum in the soft limit. The resulting equation encodes the coupling to ultraviolet (UV) modes in two time-dependent coefficients, which may be obtained from response functions to (anisotropic) parameters, such as spatial curvature, in a modified cosmology. We argue that both depend weakly on fluctuations deep in the UV. As a byproduct, this implies that the renormalized leading order coefficient(s) in the effective field theory (EFT) of large scale structures receive most of their contribution from modes close to the non-linear scale. Consequently, the UV dependence found in explicit computations within standard perturbation theory stems mostly from counter-term(s). We confront a simplified version of our non-perturbative equation against existent numerical simulations, and find good agr...
Large scale computing in theoretical physics: Example QCD

International Nuclear Information System (INIS)

Schilling, K.

1986-01-01

The limitations of the classical mathematical analysis of Newton and Leibniz appear to be more and more overcome by the power of modern computers. Large scale computing techniques, which closely resemble the methods used in simulations within statistical mechanics, make it possible to treat nonlinear systems with many degrees of freedom, such as field theories in nonperturbative situations where analytical methods fail. The computation of the hadron spectrum within the framework of lattice QCD sets a demanding goal for the application of supercomputers in basic science. It requires both big computer capacities and clever algorithms to fight all the numerical evils that one encounters in the Euclidean world. The talk will attempt to describe both the computer aspects and the present state of the art of spectrum calculations within lattice QCD. (orig.)
Solving very large scattering problems using a parallel PWTD-enhanced surface integral equation solver

KAUST Repository

Liu, Yang

2013-07-01

The computational complexity and memory requirements of multilevel plane wave time domain (PWTD)-accelerated marching-on-in-time (MOT)-based surface integral equation (SIE) solvers scale as $O(N_t N_s \log^2 N_s)$ and $O(N_s^{1.5})$; here $N_t$ and $N_s$ denote numbers of temporal and spatial basis functions discretizing the current [Shanker et al., IEEE Trans. Antennas Propag., 51, 628-641, 2003]. In the past, serial versions of these solvers have been successfully applied to the analysis of scattering from perfect electrically conducting as well as homogeneous penetrable targets involving up to $N_s \approx 0.5 \times 10^6$ and $N_t \approx 10^3$. To solve larger problems, parallel PWTD-enhanced MOT solvers are called for. Even though a simple parallelization strategy was demonstrated in the context of electromagnetic compatibility analysis [M. Lu et al., in Proc. IEEE Int. Symp. AP-S, 4, 4212-4215, 2004], by and large, progress in this area has been slow. The lack of progress can be attributed wholesale to difficulties associated with the construction of a scalable PWTD kernel. © 2013 IEEE.
Large-scale data analytics

CERN Document Server

Gkoulalas-Divanis, Aris

2014-01-01

Provides cutting-edge research in large-scale data analytics from diverse scientific areas. Surveys varied subject areas and reports on individual results of research in the field. Shares many tips and insights into large-scale data analytics from authors and editors with long-term experience and specialization in the field.
Isotropic damage model and serial/parallel mix theory applied to nonlinear analysis of ferrocement thin walls. Experimental and numerical analysis

Directory of Open Access Journals (Sweden)

Jairo A. Paredes

2016-01-01

Ferrocement thin walls are the structural elements that comprise the earthquake resistant system of dwellings built with this material. This article presents the results drawn from an experimental campaign carried out over full-scale precast ferrocement thin walls that were assessed under lateral static loading conditions. The tests allowed the identification of structural parameters and the evaluation of the performance of the walls under static loading conditions. Additionally, an isotropic damage model for modelling the mortar was applied, as well as the classic elasto-plastic theory for modelling the meshes and reinforcing bars. The ferrocement is considered as a composite material, thus the serial/parallel mix theory is used for modelling its mechanical behavior. In this work a methodology for the numerical analysis that allows modeling the nonlinear behavior exhibited by ferrocement walls under static loading conditions, as well as their potential use in earthquake resistant design, is proposed.
Large-scale self-assembled zirconium phosphate smectic layers via a simple spray-coating process

Science.gov (United States)

Wong, Minhao; Ishige, Ryohei; White, Kevin L.; Li, Peng; Kim, Daehak; Krishnamoorti, Ramanan; Gunther, Robert; Higuchi, Takeshi; Jinnai, Hiroshi; Takahara, Atsushi; Nishimura, Riichi; Sue, Hung-Jue

2014-04-01

The large-scale assembly of asymmetric colloidal particles is used in creating high-performance fibres. A similar concept is extended to the manufacturing of thin films of self-assembled two-dimensional crystal-type materials with enhanced and tunable properties. Here we present a spray-coating method to manufacture thin, flexible and transparent epoxy films containing zirconium phosphate nanoplatelets self-assembled into a lamellar arrangement aligned parallel to the substrate. The self-assembled mesophase of zirconium phosphate nanoplatelets is stabilized by epoxy pre-polymer and exhibits rheology favourable towards large-scale manufacturing. The thermally cured film forms a mechanically robust coating and shows excellent gas barrier properties at both low- and high humidity levels as a result of the highly aligned and overlapping arrangement of nanoplatelets. This work shows that the large-scale ordering of high aspect ratio nanoplatelets is easier to achieve than previously thought and may have implications in the technological applications for similar materials.

Synchronization Techniques in Parallel Discrete Event Simulation

OpenAIRE

Lindén, Jonatan

2018-01-01

Discrete event simulation is an important tool for evaluating system models in many fields of science and engineering. To improve the performance of large-scale discrete event simulations, several techniques to parallelize discrete event simulation have been developed. In parallel discrete event simulation, the work of a single discrete event simulation is distributed over multiple processing elements. A key challenge in parallel discrete event simulation is to ensure that causally dependent ...
The multilevel fast multipole algorithm (MLFMA) for solving large-scale computational electromagnetics problems

CERN Document Server

Ergul, Ozgur

2014-01-01

The Multilevel Fast Multipole Algorithm (MLFMA) for Solving Large-Scale Computational Electromagnetic Problems provides a detailed and instructional overview of implementing MLFMA. The book: Presents a comprehensive treatment of the MLFMA algorithm, including basic linear algebra concepts, recent developments on the parallel computation, and a number of application examples. Covers solutions of electromagnetic problems involving dielectric objects and perfectly-conducting objects. Discusses applications including scattering from airborne targets, scattering from red
Dynamics of large-scale brain activity in normal arousal states and epileptic seizures

Science.gov (United States)

Robinson, P. A.; Rennie, C. J.; Rowe, D. L.

2002-04-01

Links between electroencephalograms (EEGs) and underlying aspects of neurophysiology and anatomy are poorly understood. Here a nonlinear continuum model of large-scale brain electrical activity is used to analyze arousal states and their stability and nonlinear dynamics for physiologically realistic parameters. A simple ordered arousal sequence in a reduced parameter space is inferred and found to be consistent with experimentally determined parameters of waking states. Instabilities arise at spectral peaks of the major clinically observed EEG rhythms (mainly slow wave, delta, theta, alpha, and sleep spindle), with each instability zone lying near its most common experimental precursor arousal states in the reduced space. Theta, alpha, and spindle instabilities evolve toward low-dimensional nonlinear limit cycles that correspond closely to EEGs of petit mal seizures for theta instability, and grand mal seizures for the other types. Nonlinear stimulus-induced entrainment and seizures are also seen, EEG spectra and potentials evoked by stimuli are reproduced, and numerous other points of experimental agreement are found. Inverse modeling enables physiological parameters underlying observed EEGs to be determined by a new, noninvasive route. This model thus provides a single, powerful framework for quantitative understanding of a wide variety of brain phenomena.
Nonlinearity in structural and electronic materials

International Nuclear Information System (INIS)

Bishop, A.R.; Beardmore, K.M.; Ben-Naim, E.

1997-01-01

This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The project strengthens a nonlinear technology base relevant to a variety of problems arising in condensed matter and materials science, and applies this technology to those problems. In this way the controlled synthesis of, and experiments on, novel electronic and structural materials provide an important focus for nonlinear science, while nonlinear techniques help advance the understanding of the scientific principles underlying the control of microstructure and dynamics in complex materials. This research is primarily focused on four topics: (1) materials microstructure: growth and evolution, and porous media; (2) textures in elastic/martensitic materials; (3) electro- and photo-active polymers; and (4) ultrafast photophysics in complex electronic materials. Accomplishments included the following: organization of a "Nonlinear Materials" seminar series and international conferences including "Fracture, Friction and Deformation," "Nonequilibrium Phase Transitions," and "Landscape Paradigms in Physics and Biology"; invited talks at international conference on "Synthetic Metals," "Quantum Phase Transitions," "1996 CECAM Euroconference," and the 1995 Fall Meeting of the Materials Research Society; large-scale simulations and microscopic modeling of nonlinear coherent energy storage at crack tips and sliding interfaces; large-scale simulation and microscopic elasticity theory for precursor microstructure and dynamics at solid-solid diffusionless phase transformations; large-scale simulation of self-assembling organic thin films on inorganic substrates; analysis and simulation of smoothing of rough atomic surfaces; and modeling and analysis of flux pattern formation in equilibrium and nonequilibrium Josephson junction arrays and layered superconductors.
A Model of Parallel Kinematics for Machine Calibration

DEFF Research Database (Denmark)

Pedersen, David Bue; Bæk Nielsen, Morten; Kløve Christensen, Simon

2016-01-01

Parallel kinematics have been adopted by more than 25 manufacturers of high-end desktop 3D printers [Wohlers Report (2015), p.118] as well as by research projects such as the WASP project [WASP (2015)], a 12 meter tall linear delta robot for Additive Manufacture of large-scale components for cons...
Large Scale Simulations of the Euler Equations on GPU Clusters

KAUST Repository

Liebmann, Manfred

2010-08-01

The paper investigates the scalability of a parallel Euler solver, using the Vijayasundaram method, on a GPU cluster with 32 Nvidia Geforce GTX 295 boards. The aim of this research is to enable large scale fluid dynamics simulations with up to one billion elements. We investigate communication protocols for the GPU cluster to compensate for the slow Gigabit Ethernet network between the GPU compute nodes and to maintain overall efficiency. A diesel engine intake-port and a nozzle, meshed in different resolutions, give good real world examples for the scalability tests on the GPU cluster. © 2010 IEEE.
Investigation of the large scale regional hydrogeological situation at Ceberg

International Nuclear Information System (INIS)

Boghammar, A.; Grundfelt, B.; Hartley, L.

1997-11-01

The present study forms part of the large-scale groundwater flow studies within the SR 97 project. The site of interest is Ceberg. Within the present study two different regional scale groundwater models have been constructed, one large regional model with an areal extent of about 300 km² and one semi-regional model with an areal extent of about 50 km². Different types of boundary conditions have been applied to the models: topography driven pressures, constant infiltration rates, non-linear infiltration combined with specified pressure boundary conditions, and transfer of groundwater pressures from the larger model to the semi-regional model. The present model has shown that:
- Groundwater flow paths are mainly local. Large-scale groundwater flow paths are only seen below the depth of the hypothetical repository (below 500 meters) and are very slow.
- Locations of recharge and discharge, to and from the site area, are in the close vicinity of the site.
- The low contrast between major structures and the rock mass means that the factor having the major effect on the flowpaths is the topography.
- A sufficiently large model, to incorporate the recharge and discharge areas to the local site, is in the order of kilometres.
- A uniform infiltration rate boundary condition does not give a good representation of the groundwater movements in the model.
- A local site model may be located to cover the site area and a few kilometers of the surrounding region. In order to incorporate all recharge and discharge areas within the site model, the model will be somewhat larger than site scale models at other sites. This is caused by the fact that the discharge areas are divided into three distinct areas to the east, south and west of the site.
- Boundary conditions may be supplied to the site model by means of transferring groundwater pressures obtained with the semi-regional model.
[Parallel virtual reality visualization of extreme large medical datasets].

Science.gov (United States)

Tang, Min

2010-04-01

On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in connection with the intranets and common-configuration computers of hospitals. Several core techniques are introduced, including the hardware structure, software framework, load balancing and virtual reality visualization. The Maximum Intensity Projection algorithm is implemented in parallel on a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through a control panel built on the Virtual Reality Modeling Language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can serve as a good assistant in making clinical diagnoses.
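The parallel Maximum Intensity Projection mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the volume is a list of equally sized 2D slices, uses a thread pool as a stand-in for the PC cluster, and the names `mip_slab` and `parallel_mip` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def mip_slab(slab):
    """Project a slab (a list of 2D slices) by taking the per-pixel maximum."""
    rows, cols = len(slab[0]), len(slab[0][0])
    return [[max(s[r][c] for s in slab) for c in range(cols)]
            for r in range(rows)]


def parallel_mip(volume, workers=2):
    """Split the volume into contiguous slabs, project each, then merge."""
    n = len(volume)
    step = max(1, -(-n // workers))  # ceiling division: slices per worker
    slabs = [volume[i:i + step] for i in range(0, n, step)]
    # Threads stand in for cluster nodes here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(mip_slab, slabs))
    # max is associative, so the partial projections merge with another MIP.
    return mip_slab(partial)
```

Because the maximum is associative and commutative, the result does not depend on how the slices are partitioned, which is what makes the algorithm straightforward to distribute.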
A cloud-based framework for large-scale traditional Chinese medical record retrieval.

Science.gov (United States)

Liu, Lijun; Liu, Li; Fu, Xiaodong; Huang, Qingsong; Zhang, Xianwen; Zhang, Yin

2018-01-01

Electronic medical records are increasingly common in medical practice, and their secondary use has become increasingly important. It relies on the ability to retrieve complete information about desired patient populations. Effectively and accurately retrieving relevant medical records from large-scale medical big data is becoming a major challenge. Therefore, we propose an efficient and robust cloud-based framework for large-scale Traditional Chinese Medical Records (TCMRs) retrieval. First, we propose a parallel index building method and build a distributed search cluster; the former is used to improve the performance of index building, and the latter is used to provide highly concurrent online TCMRs retrieval. Second, a real-time multi-indexing model is proposed to ensure the latest relevant TCMRs are indexed and retrieved in real time, and a semantics-based query expansion method and a multi-factor ranking model are proposed to improve retrieval quality. Third, we implement a template-based visualization method for displaying medical reports. The proposed parallel indexing method and distributed search cluster can improve the performance of index building and provide highly concurrent online TCMRs retrieval. The multi-indexing model can ensure the latest relevant TCMRs are indexed and retrieved in real time. The semantics expansion method and the multi-factor ranking model can enhance retrieval quality. The template-based visualization method can enhance availability and universality, with the medical reports displayed via a friendly web interface. In conclusion, compared with current medical record retrieval systems, our system provides some advantages that are useful in improving the secondary use of large-scale traditional Chinese medical records in a cloud environment. The proposed system is more easily integrated with existing clinical systems and can be used in various scenarios. Copyright © 2017. Published by Elsevier Inc.
Large-scale grid management

International Nuclear Information System (INIS)

Langdal, Bjoern Inge; Eggen, Arnt Ove

2003-01-01

The network companies in the Norwegian electricity industry now have to establish a large-scale network management, a concept essentially characterized by (1) broader focus (Broad Band, Multi Utility,...) and (2) bigger units with large networks and more customers. Research done by SINTEF Energy Research shows so far that the approaches within large-scale network management may be structured according to three main challenges: centralization, decentralization and outsourcing. The article is part of a planned series.
Parallel computing works

Energy Technology Data Exchange (ETDEWEB)

1991-10-23

An account of the Caltech Concurrent Computation Program (C³P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
Scaling laws for soliton pulse compression by cascaded quadratic nonlinearities (vol 24, pg 2752, 2007)

DEFF Research Database (Denmark)

Bache, Morten; Moses, J.; Wise, F.W.

2010-01-01

Erratum for [M. Bache, J. Moses, and F. W. Wise, "Scaling laws for soliton pulse compression by cascaded quadratic nonlinearities," J. Opt. Soc. Am. B 24, 2752-2762 (2007)]....
Research of the effectiveness of parallel multithreaded realizations of interpolation methods for scaling raster images

Science.gov (United States)

Vnukov, A. A.; Shershnev, M. B.

2018-01-01

The aim of this work is the software implementation of three image scaling algorithms using parallel computations, as well as the development of an application with a graphical user interface for the Windows operating system to demonstrate the operation of algorithms and to study the relationship between system performance, algorithm execution time and the degree of parallelization of computations. Three methods of interpolation were studied, formalized and adapted to scale images. The result of the work is a program for scaling images by different methods. Comparison of the quality of scaling by different methods is given.
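A multithreaded interpolation-based scaler of the kind described above might look like the sketch below. The abstract does not name its three interpolation methods, so bilinear interpolation is an assumption here, as are the function names `bilinear_row` and `scale_bilinear` and the choice to parallelize over output rows. Images are plain lists of rows of grey values.

```python
from concurrent.futures import ThreadPoolExecutor


def bilinear_row(img, y_out, new_h, new_w):
    """Compute one output row by bilinear interpolation (corners aligned)."""
    h, w = len(img), len(img[0])
    fy = y_out * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
    y0 = int(fy)
    y1 = min(y0 + 1, h - 1)
    ty = fy - y0
    row = []
    for x_out in range(new_w):
        fx = x_out * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
        x0 = int(fx)
        x1 = min(x0 + 1, w - 1)
        tx = fx - x0
        # Blend horizontally on the two source rows, then vertically.
        top = img[y0][x0] * (1 - tx) + img[y0][x1] * tx
        bottom = img[y1][x0] * (1 - tx) + img[y1][x1] * tx
        row.append(top * (1 - ty) + bottom * ty)
    return row


def scale_bilinear(img, new_h, new_w, workers=4):
    """Scale an image; each output row is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: bilinear_row(img, y, new_h, new_w),
                             range(new_h)))
```

Output rows share no state, so the degree of parallelization is simply the `workers` argument. In CPython the global interpreter lock limits the speed-up of pure-Python threads, so this sketch shows the decomposition rather than the timings such a study would measure.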
Highly parallel machines and future of scientific computing

International Nuclear Information System (INIS)

Singh, G.S.

1992-01-01

The computing requirements of large-scale scientific computing have always been ahead of what state-of-the-art hardware could supply in the form of the supercomputers of the day. For any single-processor system, the limit to increases in computing power was realized a few years ago. Now, with the advent of parallel computing systems, the availability of machines with the required computing power seems a reality. In this paper the author tries to visualize the future of large-scale scientific computing in the penultimate decade of the present century. The author summarizes trends in parallel computers and emphasizes the need for a better programming environment and software tools for optimal performance. The author concludes this paper with a critique of parallel architectures, software tools and algorithms. (author). 10 refs., 2 tabs
Modified multiple time scale method for solving strongly nonlinear damped forced vibration systems

Science.gov (United States)

Razzak, M. A.; Alam, M. Z.; Sharif, M. N.

2018-03-01

In this paper, a modified multiple time scale (MTS) method is employed to solve strongly nonlinear forced vibration systems. Only the first-order approximation is considered in order to avoid complexity. The formulation and the determination of the solution procedure are easy and straightforward. The classical multiple time scale (MS) method and the multiple scales Lindstedt-Poincare (MSLP) method do not give the desired results for forced vibration systems with strong damping effects. The main aim of this paper is to remove these limitations. Two examples are considered to illustrate the effectiveness and convenience of the present procedure. The approximate external frequencies and the corresponding approximate solutions are determined by the present method. The results agree well with the corresponding numerical solutions (considered to be exact) and are also better than other existing results. For weak nonlinearities with a weak damping effect, the absolute relative error (in the first-order approximate external frequency) in this paper is only 0.07% when the amplitude A = 1.5, while the relative error given by the MSLP method is 28.81%. Furthermore, for strong nonlinearities with a strong damping effect, the absolute relative error found in this article is only 0.02%, whereas the relative error obtained by the MSLP method is 24.18%. Therefore, the present method is not only valid for weakly nonlinear damped forced systems, but also gives better results for strongly nonlinear systems with both small and strong damping effects.
Linear and Nonlinear Optical Properties of Micrometer-Scale Gold Nanoplates

International Nuclear Information System (INIS)

Liu Xiao-Lan; Peng Xiao-Niu; Yang Zhong-Jian; Li Min; Zhou Li

2011-01-01

Micrometer-scale gold nanoplates have been synthesized in high yield through a polyol process. The morphology, crystal structure and linear optical extinction of the gold nanoplates have been characterized. These gold nanoplates are single-crystalline with triangular, truncated triangular and hexagonal shapes, exhibiting strong surface plasmon resonance (SPR) extinction in the visible and near-infrared (NIR) region. The linear optical properties of gold nanoplates are also investigated by theoretical calculations. We further investigate the nonlinear optical properties of the gold nanoplates in solution by Z-scan technique. The nonlinear absorption (NLA) coefficient and nonlinear refraction (NLR) index are measured to be $1.18 \times 10^{2}$ cm/GW and $-1.04 \times 10^{-3}$ cm²/GW, respectively. (condensed matter: electronic structure, electrical, magnetic, and optical properties)
Large Scale Community Detection Using a Small World Model

Directory of Open Access Journals (Sweden)

Ranjan Kumar Behera

2017-11-01

In a social network, small or large communities within the network play a major role in deciding the functionalities of the network. Despite diverse definitions, communities in a network may be defined as groups of nodes that are more densely connected to each other than to nodes outside the group. Revealing such hidden communities is a challenging research problem. A real-world social network follows the small-world phenomenon, which indicates that any two social entities can be reached from each other in a small number of steps. In this paper, nodes are mapped into communities based on random walks in the network. However, uncovering communities in large-scale networks is a challenging task due to the unprecedented growth in the size of social networks. A good number of community detection algorithms based on random walks exist in the literature, but when large-scale social networks are considered, these algorithms are observed to take considerably longer. In this work, with the objective of improving the efficiency of the algorithms, a parallel programming framework, Map-Reduce, is used for uncovering the hidden communities in social networks. The proposed approach has been compared with some standard existing community detection algorithms on both synthetic and real-world datasets in order to examine its performance, and it is observed that the proposed algorithm is more efficient than the existing ones.
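To make the Map-Reduce formulation concrete, one step of a random walk can be written as a map phase that spreads each node's probability mass over its neighbours and a reduce phase that sums the contributions per node. This is only a sketch of that building block under assumed names (`map_phase`, `reduce_phase`, `walk_step`) and an assumed adjacency-list input; it is not the paper's community detection algorithm, and it runs in a single process rather than on a Map-Reduce cluster.

```python
from collections import defaultdict


def map_phase(graph, mass):
    """Emit (neighbour, share) pairs: each node splits its mass evenly."""
    for node, p in mass.items():
        neighbours = graph[node]
        for nb in neighbours:
            yield nb, p / len(neighbours)


def reduce_phase(pairs):
    """Sum all shares that arrive at the same node."""
    totals = defaultdict(float)
    for node, share in pairs:
        totals[node] += share
    return dict(totals)


def walk_step(graph, mass):
    """One random-walk step expressed as map followed by reduce."""
    return reduce_phase(map_phase(graph, mass))
```

In a real deployment the emitted pairs would be shuffled by key between machines; nodes whose mass stays concentrated within a group after a few steps are natural candidates for the same community.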
Topological equivalence of nonlinear autonomous dynamical systems

International Nuclear Information System (INIS)

Nguyen Huynh Phan; Tran Van Nhung

1995-12-01

We show in this paper that the autonomous nonlinear dynamical system Σ(A,B,F): x' = Ax+Bu+F(x) is topologically equivalent to the linear dynamical system Σ(A,B,O): x' = Ax+Bu if the projection of A on the complement in $\mathbb{R}^n$ of the controllable vectorial subspace is hyperbolic, if the Lipschitz constant of F is sufficiently small (*), and if F(x) = 0 when $\|x\|$ is sufficiently large (**). In particular, if Σ(A,B,O) is controllable, it is topologically equivalent to Σ(A,B,F) provided only that F satisfies (**). (author). 18 refs
Planning under uncertainty solving large-scale stochastic linear programs

Energy Technology Data Exchange (ETDEWEB)

Infanger, G. [Stanford Univ., CA (United States). Dept. of Operations Research]|[Technische Univ., Vienna (Austria). Inst. fuer Energiewirtschaft

1992-12-01

For many practical problems, solutions obtained from deterministic models are unsatisfactory because they fail to hedge against certain contingencies that may occur in the future. Stochastic models address this shortcoming, but until recently seemed to be intractable due to their size. Recent advances both in solution algorithms and in computer technology now allow us to solve important and general classes of practical stochastic problems. We show how large-scale stochastic linear programs can be efficiently solved by combining classical decomposition and Monte Carlo (importance) sampling techniques. We discuss the methodology for solving two-stage stochastic linear programs with recourse, present numerical results of large problems with numerous stochastic parameters, show how to efficiently implement the methodology on a parallel multi-computer and derive the theory for solving a general class of multi-stage problems with dependency of the stochastic parameters within a stage and between different stages.
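The role of importance sampling in estimating an expected second-stage cost can be illustrated with a small discrete example. This is a generic sketch, not the scheme derived in the report: the scenario costs, the probabilities and the function name `importance_sample_mean` are all hypothetical, and the recourse cost of each scenario is taken as given rather than obtained by solving a linear program.

```python
import random


def importance_sample_mean(costs, probs, proposal, n, seed=0):
    """Estimate sum(probs[i] * costs[i]) by sampling scenarios from `proposal`.

    `proposal` must be a normalized distribution over the same scenarios.
    Each draw is reweighted by probs[i] / proposal[i] to stay unbiased.
    """
    rng = random.Random(seed)
    draws = rng.choices(range(len(costs)), weights=proposal, k=n)
    return sum(costs[i] * probs[i] / proposal[i] for i in draws) / n


# Hypothetical scenarios: a rare but expensive contingency dominates the mean.
costs = [1.0, 2.0, 100.0]
probs = [0.6, 0.39, 0.01]
exact = sum(p * c for p, c in zip(probs, costs))
# Idealized proposal proportional to probs[i] * costs[i] (zero variance).
proposal = [p * c / exact for p, c in zip(probs, costs)]
estimate = importance_sample_mean(costs, probs, proposal, n=50)
```

With the idealized proposal every reweighted draw equals the exact mean, so even 50 samples recover it. In practice the ideal proposal is unknown and has to be approximated, and the estimate then has nonzero but reduced variance compared with sampling directly from `probs`.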
Ethics of large-scale change

OpenAIRE

Arler, Finn

2006-01-01

The subject of this paper is long-term large-scale changes in human society. Some very significant examples of large-scale change are presented: human population growth, human appropriation of land and primary production, the human use of fossil fuels, and climate change. The question is posed, which kind of attitude is appropriate when dealing with large-scale changes like these from an ethical point of view. Three kinds of approaches are discussed: Aldo Leopold's mountain thinking, th...

Large-Scale Multiantenna Multisine Wireless Power Transfer

Science.gov (United States)

Huang, Yang; Clerckx, Bruno

2017-11-01

Wireless Power Transfer (WPT) is expected to be a technology reshaping the landscape of low-power applications such as the Internet of Things, Radio Frequency identification (RFID) networks, etc. Although there has been some progress towards multi-antenna multi-sine WPT design, the large-scale design of WPT, reminiscent of massive MIMO in communications, remains an open challenge. In this paper, we derive efficient multiuser algorithms based on a generalizable optimization framework, in order to design transmit sinewaves that maximize the weighted-sum/minimum rectenna output DC voltage. The study highlights the significant effect of the nonlinearity introduced by the rectification process on the design of waveforms in multiuser systems. Interestingly, in the single-user case, the optimal spatial domain beamforming, obtained prior to the frequency domain power allocation optimization, turns out to be Maximum Ratio Transmission (MRT). In contrast, in the general weighted sum criterion maximization problem, the spatial domain beamforming optimization and the frequency domain power allocation optimization are coupled. Assuming channel hardening, low-complexity algorithms are proposed based on asymptotic analysis, to maximize the two criteria. The structure of the asymptotically optimal spatial domain precoder can be found prior to the optimization. The performance of the proposed algorithms is evaluated. Numerical results confirm the inefficiency of the linear model-based design for the single and multi-user scenarios. It is also shown that as nonlinear model-based designs, the proposed algorithms can benefit from an increasing number of sinewaves.
Massively parallel Monte Carlo. Experiences running nuclear simulations on a large condor cluster

International Nuclear Information System (INIS)

Tickner, James; O'Dwyer, Joel; Roach, Greg; Uher, Josef; Hitchen, Greg

2010-01-01

The trivially-parallel nature of Monte Carlo (MC) simulations makes them ideally suited for running on a distributed, heterogeneous computing environment. We report on the setup and operation of a large, cycle-harvesting Condor computer cluster, used to run MC simulations of nuclear instruments ('jobs') on approximately 4,500 desktop PCs. Successful operation must balance the competing goals of maximizing the availability of machines for running jobs whilst minimizing the impact on users' PC performance. This requires classification of jobs according to anticipated run-time and priority and careful optimization of the parameters used to control job allocation to host machines. To maximize use of a large Condor cluster, we have created a powerful suite of tools to handle job submission and analysis, as the manual creation, submission and evaluation of large numbers (hundreds to thousands) of jobs would be too arduous. We describe some of the key aspects of this suite, which has been interfaced to the well-known MCNP and EGSnrc nuclear codes and our in-house PHOTON optical MC code. We report on our practical experiences of operating our Condor cluster and present examples of several large-scale instrument design problems that have been solved using this tool. (author)
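Why MC simulations split so easily into independent jobs can be seen in a toy example: each job runs its own histories from its own random seed, and the tallies are simply added afterwards. The sketch below assumes a stand-in tally (the fraction of random points falling inside a quarter circle) instead of a nuclear transport code, and the names `run_job` and `merge` are hypothetical; it does not use Condor, MCNP or EGSnrc.

```python
import random


def run_job(seed, histories):
    """Run one independent job and return (hits, histories) as its tally."""
    rng = random.Random(seed)  # a distinct seed per job keeps streams separate
    hits = 0
    for _ in range(histories):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, histories


def merge(results):
    """Combine tallies from jobs that may have finished in any order."""
    total_hits = sum(hits for hits, _ in results)
    total_histories = sum(n for _, n in results)
    return total_hits / total_histories


# Five jobs of 2000 histories each; on a cluster these would run on
# different machines and return their tallies whenever they finish.
results = [run_job(seed, 2000) for seed in range(5)]
pi_estimate = 4.0 * merge(results)
```

Because a job depends only on its seed and history count, a job evicted from a desktop PC can be resubmitted elsewhere and will reproduce the same tally.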
Towards a Database System for Large-scale Analytics on Strings

KAUST Repository

Sahli, Majed A.

2015-07-23

Recent technological advances are causing an explosion in the production of sequential data. Biological sequences, web logs and time series are represented as strings. Currently, strings are stored, managed and queried in an ad-hoc fashion because they lack a standardized data model and query language. String queries are computationally demanding, especially when strings are long and numerous. Existing approaches cannot handle the growing number of strings produced by environmental, healthcare, bioinformatic, and space applications. There is a trade-off between performing analytics efficiently and scaling to thousands of cores to finish in reasonable times. In this thesis, we introduce a data model that unifies the input and output representations of core string operations. We define a declarative query language for strings where operators can be pipelined to form complex queries. A rich set of core string operators is described to support string analytics. We then demonstrate a database system for string analytics based on our model and query language. In particular, we propose the use of a novel data structure augmented by efficient parallel computation to strike a balance between preprocessing overheads and query execution times. Next, we delve into repeated motifs extraction as a core string operation for large-scale string analytics. Motifs are frequent patterns used, for example, to identify biological functionality, periodic trends, or malicious activities. Statistical approaches are fast but inexact while combinatorial methods are sound but slow. We introduce ACME, a combinatorial repeated motifs extractor. We study the spatial and temporal locality of motif extraction and devise a cache-aware search space traversal technique. ACME is the only method that scales to gigabyte-long strings, handles large alphabets, and supports interesting motif types with minimal overhead. While ACME is cache-efficient, it is limited by being serial. We devise a lightweight
Nonlinear behavior of stimulated scatter in large underdense plasmas

International Nuclear Information System (INIS)

Kruer, W.L.; Estabrook, K.G.

1979-01-01

Several nonlinear effects which limit Brillouin and Raman scatter of intense light in large underdense plasmas are examined. After briefly considering ion trapping and harmonic generation, we focus on the self-consistent ion heating which occurs as an integral part of the Brillouin scattering process. In the long-term nonlinear state, the ion wave amplitude is determined by damping on the heated ion tail which self-consistently forms. A simple model of the scatter is presented and compared with particle simulations. A similar model is also applied to Raman scatter and compared with simulations. Our calculations emphasize that modest tails on the electron distribution function can significantly limit instabilities involving electron plasma waves
The linearly scaling 3D fragment method for large scale electronic structure calculations

Energy Technology Data Exchange (ETDEWEB)

Zhao Zhengji [National Energy Research Scientific Computing Center (NERSC) (United States); Meza, Juan; Shan Hongzhang; Strohmaier, Erich; Bailey, David; Wang Linwang [Computational Research Division, Lawrence Berkeley National Laboratory (United States); Lee, Byounghak, E-mail: ZZhao@lbl.go [Physics Department, Texas State University (United States)

2009-07-01

The linearly scaling three-dimensional fragment (LS3DF) method is an O(N) ab initio electronic structure method for large-scale nano material simulations. It is a divide-and-conquer approach with a novel patching scheme that effectively cancels out the artificial boundary effects, which exist in all divide-and-conquer schemes. This method has made ab initio simulations of thousand-atom nanosystems feasible in a couple of hours, while retaining essentially the same accuracy as the direct calculation methods. The LS3DF method won the 2008 ACM Gordon Bell Prize for algorithm innovation. Our code has reached 442 Tflop/s running on 147,456 processors on the Cray XT5 (Jaguar) at OLCF, and has been run on 163,840 processors on the Blue Gene/P (Intrepid) at ALCF, and has been applied to a system containing 36,000 atoms. In this paper, we will present the recent parallel performance results of this code, and will apply the method to asymmetric CdSe/CdS core/shell nanorods, which have potential applications in electronic devices and solar cells.
An Efficient Parallel Multi-Scale Segmentation Method for Remote Sensing Imagery

Directory of Open Access Journals (Sweden)

Haiyan Gu

2018-04-01

Remote sensing (RS) image segmentation is an essential step in geographic object-based image analysis (GEOBIA) to ultimately derive “meaningful objects”. While many segmentation methods exist, most of them are not efficient for large data sets. Thus, the goal of this research is to develop an efficient parallel multi-scale segmentation method for RS imagery by combining graph theory and the fractal net evolution approach (FNEA). Specifically, a minimum spanning tree (MST) algorithm in graph theory is proposed to be combined with a minimum heterogeneity rule (MHR) algorithm that is used in FNEA. The MST algorithm is used for the initial segmentation while the MHR algorithm is used for object merging. An efficient implementation of the segmentation strategy is presented using data partition and the “reverse searching-forward processing” chain based on message passing interface (MPI) parallel technology. Segmentation results of the proposed method using images from multiple sensors (airborne, SPECIM AISA EAGLE II, WorldView-2, RADARSAT-2) and different selected landscapes (residential/industrial, residential/agriculture) covering four test sites indicated its efficiency in accuracy and speed. We conclude that the proposed method is applicable and efficient for the segmentation of a variety of RS imagery (airborne optical, satellite optical, SAR, high-spectral), while the accuracy is comparable with that of the FNEA method.
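The MST-based initial segmentation described in this abstract can be illustrated with a minimal serial sketch. It is not the authors' implementation: it assumes a 4-connected pixel grid, edge weights equal to absolute intensity differences, and a hypothetical fixed cutoff `threshold`, and it omits both the MHR merging step and the MPI data partitioning.

```python
def mst_segment(image, threshold):
    """Label pixels by growing a minimum spanning forest with Kruskal's
    algorithm, refusing any edge heavier than `threshold` (assumed cutoff)."""
    rows, cols = len(image), len(image[0])
    parent = list(range(rows * cols))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Build 4-connected grid edges weighted by absolute intensity difference.
    edges = []
    for r in range(rows):
        for c in range(cols):
            here = r * cols + c
            if c + 1 < cols:
                edges.append((abs(image[r][c] - image[r][c + 1]), here, here + 1))
            if r + 1 < rows:
                edges.append((abs(image[r][c] - image[r + 1][c]), here, here + cols))

    # Kruskal: visit edges from lightest to heaviest and join distinct trees.
    for weight, a, b in sorted(edges):
        if weight > threshold:
            break  # every remaining edge is heavier than the cutoff
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each pixel's label is the root of the tree it belongs to.
    return [[find(r * cols + c) for c in range(cols)] for r in range(rows)]


labels = mst_segment([[1, 1, 9], [1, 2, 9], [8, 9, 9]], threshold=1)
```

On this toy image the dark upper-left block and the bright remainder end up in two separate segments; a real pipeline would follow this with a merging rule such as MHR.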
Large-scale visualization system for grid environment

International Nuclear Information System (INIS)

Suzuki, Yoshio

2007-01-01

The Center for Computational Science and E-systems of the Japan Atomic Energy Agency (CCSE/JAEA) has been conducting R&D on distributed computing (grid computing) environments: Seamless Thinking Aid (STA), Information Technology Based Laboratory (ITBL) and Atomic Energy Grid InfraStructure (AEGIS). In this R&D, we have developed visualization technology suitable for the distributed computing environment. As one of the visualization tools, we have developed the Parallel Support Toolkit (PST), which can execute the visualization process in parallel on a computer. We are now improving PST so that it can run simultaneously on multiple heterogeneous computers using the Seamless Thinking Aid Message Passing Interface (STAMPI). STAMPI, which we developed in this R&D, is an MPI library that runs in a heterogeneous computing environment. The improvement enables the visualization of extremely large-scale data and more efficient visualization processes in a distributed computing environment. (author)
A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images.

Science.gov (United States)

Afshar, Yaser; Sbalzarini, Ivo F

2016-01-01

Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rates. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers collectively solve the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.
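The decomposition of an image into sub-images can be sketched along one axis as follows. This is only an illustration under stated assumptions: the function name `tile_bounds` and the overlap parameter `halo` are hypothetical, and the abstract does not say how the authors size or overlap their sub-images.

```python
def tile_bounds(length, n_tiles, halo=0):
    """Split the index range [0, length) into n_tiles nearly equal blocks.

    Each block is widened by `halo` indices on both sides (clipped to the
    image) so that neighbouring workers share a boundary strip.
    Returns a list of (start, stop) pairs, stop exclusive.
    """
    base, extra = divmod(length, n_tiles)
    bounds = []
    start = 0
    for i in range(n_tiles):
        # The first `extra` tiles take one additional index each.
        size = base + (1 if i < extra else 0)
        stop = start + size
        bounds.append((max(0, start - halo), min(length, stop + halo)))
        start = stop
    return bounds


print(tile_bounds(10, 3))          # blocks without overlap
print(tile_bounds(10, 3, halo=1))  # blocks with a one-index overlap
```

Applying the same function to each image axis gives rectangular sub-images; the overlap strips are where a distributed method would exchange boundary information over the network.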
Nonlinear wave forces on large ocean structures

Science.gov (United States)

Huang, Erick T.

1993-04-01

This study explores the significance of second-order wave excitations on a large pontoon and tests the feasibility of reducing a nonlinear free surface problem by perturbation expansions. A simulation model has been developed based on the perturbation expansion technique to estimate the wave forces. The model uses a versatile finite element procedure for the solution of the reduced linear boundary value problems. This procedure achieves a fair compromise between computation costs and physical details by using a combination of 2D and 3D elements. A simple hydraulic model test was conducted to observe the wave forces imposed on a rectangular box by cnoidal waves in shallow water. The test measurements are consistent with the numerical predictions of the simulation model. This result lends support to the perturbation approach for estimating the nonlinear wave forces on shallow draft vessels. However, more sophisticated model tests are required for a full justification. Both theoretical and experimental results show profound second-order forces that could substantially impact the design of ocean facilities.
A parallel nearly implicit time-stepping scheme

OpenAIRE

Botchev, Mike A.; van der Vorst, Henk A.

2001-01-01

Across-the-space parallelism still remains the most mature, convenient and natural way to parallelize large scale problems. One of the major problems here is that implicit time stepping is often difficult to parallelize due to the structure of the system. Approximate implicit schemes have been suggested to circumvent the problem. These schemes have attractive stability properties and they are also very well parallelizable. The purpose of this article is to give an overall assessment of the pa...
Overview of the Force Scientific Parallel Language

Directory of Open Access Journals (Sweden)

Gita Alaghband

1994-01-01

The Force parallel programming language, designed for large-scale shared-memory multiprocessors, is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that are ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.
Large-Scale Cubic-Scaling Random Phase Approximation Correlation Energy Calculations Using a Gaussian Basis.

Science.gov (United States)

Wilhelm, Jan; Seewald, Patrick; Del Ben, Mauro; Hutter, Jürg

2016-12-13

We present an algorithm for computing the correlation energy in the random phase approximation (RPA) in a Gaussian basis requiring [Formula: see text] operations and [Formula: see text] memory. The method is based on the resolution of the identity (RI) with the overlap metric, a reformulation of RI-RPA in the Gaussian basis, imaginary time, and imaginary frequency integration techniques, and the use of sparse linear algebra. Additional memory reduction without extra computations can be achieved by an iterative scheme that overcomes the memory bottleneck of canonical RPA implementations. We report a massively parallel implementation that is the key for the application to large systems. Finally, cubic-scaling RPA is applied to a thousand water molecules using a correlation-consistent triple-ζ quality basis.
3D large-scale calculations using the method of characteristics

International Nuclear Information System (INIS)

Dahmani, M.; Roy, R.; Koclas, J.

2004-01-01

An overview of the computational requirements and the numerical developments made in order to be able to solve 3D large-scale problems using the characteristics method will be presented. To accelerate the MCI solver, efficient acceleration techniques were implemented and parallelization was performed. However, for the very large problems, the size of the tracking file used to store the tracks can still become prohibitive and exceed the capacity of the machine. The new 3D characteristics solver MCG will now be introduced. This methodology is dedicated to solve very large 3D problems (a part or a whole core) without spatial homogenization. In order to eliminate the input/output problems occurring when solving these large problems, we define a new computing scheme that requires more CPU resources than the usual one, based on sweeps over large tracking files. The huge capacity of storage needed in some problems and the related I/O queries needed by the characteristics solver are replaced by on-the-fly recalculation of tracks at each iteration step. Using this technique, large 3D problems are no longer I/O-bound, and distributed CPU resources can be efficiently used. (author)
A Proactive Complex Event Processing Method for Large-Scale Transportation Internet of Things

OpenAIRE

Wang, Yongheng; Cao, Kening

2014-01-01

The Internet of Things (IoT) provides a new way to improve the transportation system. The key issue is how to process the numerous events generated by IoT. In this paper, a proactive complex event processing method is proposed for large-scale transportation IoT. Based on a multilayered adaptive dynamic Bayesian model, a Bayesian network structure learning algorithm using search-and-score is proposed to support accurate predictive analytics. A parallel Markov decision processes model is design...
State-of-the-Art in GPU-Based Large-Scale Volume Visualization

KAUST Repository

Beyer, Johanna

2015-05-01

This survey gives an overview of the current state of the art in GPU techniques for interactive large-scale volume visualization. Modern techniques in this field have brought about a sea change in how interactive visualization and analysis of giga-, tera- and petabytes of volume data can be enabled on GPUs. In addition to combining the parallel processing power of GPUs with out-of-core methods and data streaming, a major enabler for interactivity is making both the computational and the visualization effort proportional to the amount and resolution of data that is actually visible on screen, i.e. 'output-sensitive' algorithms and system designs. This leads to recent output-sensitive approaches that are 'ray-guided', 'visualization-driven' or 'display-aware'. In this survey, we focus on these characteristics and propose a new categorization of GPU-based large-scale volume visualization techniques based on the notions of actual output-resolution visibility and the current working set of volume bricks, i.e. the current subset of data that is minimally required to produce an output image of the desired display resolution. Furthermore, we discuss the differences and similarities of different rendering and data traversal strategies in volume rendering by putting them into a common context: the notion of address translation. For our purposes here, we view parallel (distributed) visualization using clusters as an orthogonal set of techniques that we do not discuss in detail but that can be used in conjunction with what we present in this survey. © 2015 The Eurographics Association and John Wiley & Sons Ltd.
State-of-the-Art in GPU-Based Large-Scale Volume Visualization

KAUST Repository

Beyer, Johanna; Hadwiger, Markus; Pfister, Hanspeter

2015-01-01

This survey gives an overview of the current state of the art in GPU techniques for interactive large-scale volume visualization. Modern techniques in this field have brought about a sea change in how interactive visualization and analysis of giga-, tera- and petabytes of volume data can be enabled on GPUs. In addition to combining the parallel processing power of GPUs with out-of-core methods and data streaming, a major enabler for interactivity is making both the computational and the visualization effort proportional to the amount and resolution of data that is actually visible on screen, i.e. 'output-sensitive' algorithms and system designs. This leads to recent output-sensitive approaches that are 'ray-guided', 'visualization-driven' or 'display-aware'. In this survey, we focus on these characteristics and propose a new categorization of GPU-based large-scale volume visualization techniques based on the notions of actual output-resolution visibility and the current working set of volume bricks, i.e. the current subset of data that is minimally required to produce an output image of the desired display resolution. Furthermore, we discuss the differences and similarities of different rendering and data traversal strategies in volume rendering by putting them into a common context: the notion of address translation. For our purposes here, we view parallel (distributed) visualization using clusters as an orthogonal set of techniques that we do not discuss in detail but that can be used in conjunction with what we present in this survey. © 2015 The Eurographics Association and John Wiley & Sons Ltd.
EFT of large scale structures in redshift space

Science.gov (United States)

Lewandowski, Matthew; Senatore, Leonardo; Prada, Francisco; Zhao, Cheng; Chuang, Chia-Hsun

2018-03-01

We further develop the description of redshift-space distortions within the effective field theory of large scale structures. First, we generalize the counterterms to include the effect of baryonic physics and primordial non-Gaussianity. Second, we evaluate the IR resummation of the dark matter power spectrum in redshift space. This requires us to identify a controlled approximation that makes the numerical evaluation straightforward and efficient. Third, we compare the predictions of the theory at one loop with the power spectrum from numerical simulations up to ℓ=6. We find that the IR resummation allows us to correctly reproduce the baryon acoustic oscillation peak. The k reach (or, equivalently, the precision for a given k) depends on additional counterterms that need to be matched to simulations. Since the nonlinear scale for the velocity is expected to be longer than the one for the overdensity, we consider a minimal and a nonminimal set of counterterms. The quality of our numerical data makes it hard to firmly establish the performance of the theory at high wave numbers. Within this limitation, we find that the theory at redshift z=0.56 and up to ℓ=2 matches the data at the percent level approximately up to k ∼ 0.13 h Mpc^-1 or k ∼ 0.18 h Mpc^-1, depending on the number of counterterms used, with a potentially large improvement over former analytical techniques.
Stability of Large Parallel Tunnels Excavated in Weak Rocks: A Case Study

Science.gov (United States)

Ding, Xiuli; Weng, Yonghong; Zhang, Yuting; Xu, Tangjin; Wang, Tuanle; Rao, Zhiwen; Qi, Zufang

2017-09-01

Diversion tunnels are important structures for hydropower projects but are always placed in locations with less favorable geological conditions than those in which other structures are placed. Because diversion tunnels are usually large and closely spaced, the rock pillar between adjacent tunnels in weak rocks is affected on both sides, and conventional support measures may not be adequate to achieve the required stability. Thus, appropriate reinforcement support measures are needed, and the design philosophy regarding large parallel tunnels in weak rocks should be updated. This paper reports a recent case in which two large parallel diversion tunnels are excavated. The rock masses are thin- to ultra-thin-layered strata coated with phyllitic films, which significantly decrease the soundness and strength of the strata and weaken the rocks. The behaviors of the surrounding rock masses under original (and conventional) support measures are detailed in terms of rock mass deformation, anchor bolt stress, and the extent of the excavation disturbed zone (EDZ), as obtained from safety monitoring and field testing. In situ observed phenomena and their interpretation are also included. The sidewall deformations exhibit significant time-dependent characteristics, and large magnitudes are recorded. The stresses in the anchor bolts are small, but the extents of the EDZs are large. The stability condition under the original support measures is evaluated as poor. To enhance rock mass stability, attempts are made to reinforce support design and improve safety monitoring programs. The main feature of these attempts is the use of prestressed cables that run through the rock pillar between the parallel tunnels. The efficacy of reinforcement support measures is verified by further safety monitoring data and field test results. Numerical analysis is constantly performed during the construction process to provide a useful reference for decision making. The calculated deformations are in
The Parallel System for Integrating Impact Models and Sectors (pSIMS)

Science.gov (United States)

Elliott, Joshua; Kelly, David; Chryssanthacopoulos, James; Glotter, Michael; Jhunjhnuwala, Kanika; Best, Neil; Wilde, Michael; Foster, Ian

2014-01-01

We present a framework for massively parallel climate impact simulations: the parallel System for Integrating Impact Models and Sectors (pSIMS). This framework comprises a) tools for ingesting and converting large amounts of data to a versatile datatype based on a common geospatial grid; b) tools for translating this datatype into custom formats for site-based models; c) a scalable parallel framework for performing large ensemble simulations, using any one of a number of different impacts models, on clusters, supercomputers, distributed grids, or clouds; d) tools and data standards for reformatting outputs to common datatypes for analysis and visualization; and e) methodologies for aggregating these datatypes to arbitrary spatial scales such as administrative and environmental demarcations. By automating many time-consuming and error-prone aspects of large-scale climate impacts studies, pSIMS accelerates computational research, encourages model intercomparison, and enhances reproducibility of simulation results. We present the pSIMS design and use example assessments to demonstrate its multi-model, multi-scale, and multi-sector versatility.
Political consultation and large-scale research

International Nuclear Information System (INIS)

Bechmann, G.; Folkers, H.

1977-01-01

Large-scale research and policy consulting occupy an intermediary position between sociological sub-systems. While large-scale research coordinates science, policy, and production, policy consulting coordinates science, policy, and the political sphere. In this position, large-scale research and policy consulting lack the institutional guarantees and the rational background guarantees that are characteristic of their sociological environment. Large-scale research can neither deal with the production of innovative goods with regard to profitability, nor can it hope for full recognition by the basic-research-oriented scientific community. Policy consulting neither has the decision-making competence assigned to the political system, nor can it be judged successfully by the critical standards of established social science, at least as far as the present situation is concerned. This intermediary position of large-scale research and policy consulting has consequences in three respects that support the thesis that this is a new form of institutionalization of science: 1) external control, 2) the form of organization, and 3) the theoretical conception of large-scale research and policy consulting. (orig.) [de

A Dynamic Optimization Strategy for the Operation of Large Scale Seawater Reverses Osmosis System

Directory of Open Access Journals (Sweden)

Aipeng Jiang

2014-01-01

In this work, a strategy was proposed for the efficient solution of the dynamic model of an SWRO system. Since the dynamic model is formulated as a set of differential-algebraic equations, simultaneous strategies based on collocation on finite elements were used to transform the DAOP into a large-scale nonlinear programming problem named Opt2. Then, the RO process and storage tanks were simulated element by element and step by step with fixed control variables, and the resulting variable values were used as the initial values for the optimal solution of the SWRO system. Finally, to improve computational efficiency while keeping sufficient accuracy in the solution of Opt2, a simple but efficient finite element refinement rule was used to reduce the scale of Opt2. The proposed strategy was applied to a large-scale SWRO system with 8 RO plants and 4 storage tanks as a case study. The computed results show that the proposed strategy is quite effective for optimal operation of the large-scale SWRO system; the optimization problem can be solved successfully within tens of iterations and several minutes when load and other operating parameters fluctuate.
NonLinear Parallel OPtimization Tool, Phase I

Data.gov (United States)

National Aeronautics and Space Administration: CU Aerospace, in partnership with the University of Illinois, proposes the further development of a new sparse nonlinear programming architecture that exploits...
Large-scale multimedia modeling applications

International Nuclear Information System (INIS)

Droppo, J.G. Jr.; Buck, J.W.; Whelan, G.; Strenge, D.L.; Castleton, K.J.; Gelston, G.M.

1995-08-01

Over the past decade, the US Department of Energy (DOE) and other agencies have faced increasing scrutiny for a wide range of environmental issues related to past and current practices. A number of large-scale applications have been undertaken that required analysis of large numbers of potential environmental issues over a wide range of environmental conditions and contaminants. Several of these applications, referred to here as large-scale applications, have addressed long-term public health risks using a holistic approach for assessing impacts from potential waterborne and airborne transport pathways. Multimedia models such as the Multimedia Environmental Pollutant Assessment System (MEPAS) were designed for use in such applications. MEPAS integrates radioactive and hazardous contaminants impact computations for major exposure routes via air, surface water, ground water, and overland flow transport. A number of large-scale applications of MEPAS have been conducted to assess various endpoints for environmental and human health impacts. These applications are described in terms of lessons learned in the development of an effective approach for large-scale applications
Cosmological Parameter Estimation with Large Scale Structure Observations

CERN Document Server

Di Dio, Enea; Durrer, Ruth; Lesgourgues, Julien

2014-01-01

We estimate the sensitivity of future galaxy surveys to cosmological parameters, using the redshift dependent angular power spectra of galaxy number counts, $C_\ell(z_1,z_2)$, calculated with all relativistic corrections at first order in perturbation theory. We pay special attention to the redshift dependence of the non-linearity scale and present Fisher matrix forecasts for Euclid-like and DES-like galaxy surveys. We compare the standard $P(k)$ analysis with the new $C_\ell(z_1,z_2)$ method. We show that for surveys with photometric redshifts the new analysis performs significantly better than the $P(k)$ analysis. For spectroscopic redshifts, however, the large number of redshift bins which would be needed to fully profit from the redshift information, is severely limited by shot noise. We also identify surveys which can measure the lensing contribution and we study the monopole, $C_0(z_1,z_2)$.
Concept for power scaling second harmonic generation using a cascade of nonlinear crystals

DEFF Research Database (Denmark)

Hansen, Anders Kragh; Tawfieq, Mahmoud; Jensen, Ole Bjarlin

2015-01-01

Within the field of high-power second harmonic generation (SHG), power scaling is often hindered by adverse crystal effects such as thermal dephasing arising from the second harmonic (SH) light, which imposes limits on the power that can be generated in many crystals. Here we demonstrate a concept for efficient power scaling of single-pass SHG beyond such limits using a cascade of nonlinear crystals, in which the first crystal is chosen for high nonlinear efficiency and the subsequent crystal(s) are chosen for power handling ability. Using this highly efficient single-pass concept, we generate 3.7 W... successfully combines the high efficiency of the first stage with the good power handling properties of the subsequent stages. The concept is generally applicable and can be expanded with more stages to obtain even higher efficiency, and extends also to other combinations of nonlinear media suitable for other...
Nonlinear Vibration Signal Tracking of Large Offshore Bridge Stayed Cable Based on Particle Filter

Directory of Open Access Journals (Sweden)

Ye Qingwei

2015-12-01

Stay cables are key load-bearing components of large offshore bridges, so fault detection in stay cables is very important for bridge safety. This paper uses a particle filter model and algorithm for nonlinear vibration signals. First, a particle filter model of the stay cable of a large offshore bridge is created. The nonlinear dynamic model of the coupled stay-cable and beam system is discretized in time using the finite difference method, and the discrete nonlinear vibration equations of any cable element are derived. Second, a state equation for the particle filter is fitted by a least-squares algorithm from the discrete nonlinear vibration equations, so that the particle filter algorithm can use accurate state equations. Finally, the particle filter algorithm is used to filter the vibration signal of the bridge stay cable. With the particle filter, the de-noised vibration signal can be tracked and predicted accurately over a short time. Many experiments were carried out on actual bridges. Both the simulation experiments and the experiments on actual bridge stay cables indicate that the particle filter algorithm in this paper performs well and works stably.
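The predict, weight and resample cycle of a particle filter can be sketched for a scalar signal as below. This is a generic bootstrap filter, not the paper's model: the random-walk state equation stands in for the least-squares-fitted cable equation, and `process_std`, `obs_std`, `n_particles` and `seed` are illustrative assumptions.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=0.5, obs_std=1.0, seed=0):
    """Return the filtered state estimate for each noisy observation."""
    rng = random.Random(seed)
    # Initial particle cloud (assumed standard normal prior).
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: push every particle through the assumed random-walk model.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Weight: Gaussian likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((y - x) / obs_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        # Estimate: weighted mean of the particle cloud.
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


filtered = bootstrap_particle_filter([5.0] * 50)
```

Replacing the random-walk prediction step with a fitted state equation is the point at which a model like the one described in the abstract would enter.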
Generalized Nonlinear Chirp Scaling Algorithm for High-Resolution Highly Squint SAR Imaging.

Science.gov (United States)

Yi, Tianzhu; He, Zhihua; He, Feng; Dong, Zhen; Wu, Manqing

2017-11-07

This paper presents a modified approach for high-resolution, highly squint synthetic aperture radar (SAR) data processing. Several nonlinear chirp scaling (NLCS) algorithms have been proposed to solve the azimuth variance of the frequency modulation rates that are caused by the linear range walk correction (LRWC). However, the azimuth depth of focusing (ADOF) is not handled well by these algorithms. The generalized nonlinear chirp scaling (GNLCS) algorithm that is proposed in this paper uses the method of series reverse (MSR) to improve the ADOF and focusing precision. It also introduces a high order processing kernel to avoid the range block processing. Simulation results show that the GNLCS algorithm can enlarge the ADOF and focusing precision for high-resolution highly squint SAR data.
Generalized Nonlinear Chirp Scaling Algorithm for High-Resolution Highly Squint SAR Imaging

Directory of Open Access Journals (Sweden)

Tianzhu Yi

2017-11-01

This paper presents a modified approach for high-resolution, highly squint synthetic aperture radar (SAR) data processing. Several nonlinear chirp scaling (NLCS) algorithms have been proposed to solve the azimuth variance of the frequency modulation rates that are caused by the linear range walk correction (LRWC). However, the azimuth depth of focusing (ADOF) is not handled well by these algorithms. The generalized nonlinear chirp scaling (GNLCS) algorithm that is proposed in this paper uses the method of series reverse (MSR) to improve the ADOF and focusing precision. It also introduces a high order processing kernel to avoid the range block processing. Simulation results show that the GNLCS algorithm can enlarge the ADOF and focusing precision for high-resolution highly squint SAR data.
Decentralized Large-Scale Power Balancing

DEFF Research Database (Denmark)

Halvgaard, Rasmus; Jørgensen, John Bagterp; Poulsen, Niels Kjølstad

2013-01-01

problem is formulated as a centralized large-scale optimization problem but is then decomposed into smaller subproblems that are solved locally by each unit connected to an aggregator. For large-scale systems the method is faster than solving the full problem and can be distributed to include an arbitrary...
Simulation of incompressible flows with heat and mass transfer using parallel finite element method

Directory of Open Access Journals (Sweden)

Jalal Abedi

2003-02-01

The stabilized finite element formulations based on the SUPG (Streamline-Upwind/Petrov-Galerkin) and PSPG (Pressure-Stabilization/Petrov-Galerkin) methods are developed and applied to solve buoyancy-driven incompressible flows with heat and mass transfer. The SUPG stabilization term allows us to solve flow problems at high speeds (advection-dominated flows), and the PSPG term eliminates instabilities associated with the use of equal-order interpolation functions for both pressure and velocity. The finite element formulations are implemented in parallel using MPI. In parallel computations, the finite element mesh is partitioned into contiguous subdomains using METIS, which are then assigned to individual processors. To ensure a balanced load, the number of elements assigned to each processor is approximately equal. To solve nonlinear systems in large-scale applications, we developed a matrix-free GMRES iterative solver. Here we completely eliminate the need to form any matrices, even at the element level. To measure the accuracy of the method, we solve 2D and 3D examples of natural convection flows at moderate to high Rayleigh numbers.
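A matrix-free Krylov solver such as GMRES only needs the action of the Jacobian on a vector, which can be approximated from two residual evaluations. The sketch below illustrates that building block on a small, made-up 1D residual; the `residual` function, the step-size rule, and all names are assumptions for illustration and are not taken from the paper.

```python
import math


def residual(u):
    """Hypothetical nonlinear residual: discrete Laplacian minus u^2."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(left - 2.0 * u[i] + right - u[i] ** 2)
    return out


def jacobian_vector_product(F, u, v, eps=1e-7):
    """Approximate J(u) v by a forward difference, never forming J."""
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    norm_u = math.sqrt(sum(x * x for x in u))
    # Scale the perturbation to the sizes of u and v.
    h = eps * (1.0 + norm_u) / norm_v
    f0 = F(u)
    f1 = F([ui + h * vi for ui, vi in zip(u, v)])
    return [(a - b) / h for a, b in zip(f1, f0)]
```

A Krylov iteration would call `jacobian_vector_product` wherever it would otherwise multiply by an assembled matrix, which is what removes the storage cost.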
Large Scale Earth's Bow Shock with Northern IMF as Simulated by PIC Code in Parallel with MHD Model

Science.gov (United States)

Baraka, Suleiman

2016-06-01

In this paper, we propose a 3D kinetic model (particle-in-cell, PIC) for the description of the large-scale Earth's bow shock. The proposed version is stable and does not require huge or extensive computer resources. Because PIC simulations work with scaled plasma and field parameters, we also propose to validate our code by comparing its results with the available MHD simulations under the same scaled solar wind (SW) and interplanetary magnetic field (IMF) conditions. We report new results from the two models. In both codes the Earth's bow shock position is found to be ≈14.8 R_E along the Sun-Earth line, and ≈29 R_E on the dusk side. Those findings are consistent with past in situ observations. Both simulations reproduce the theoretical jump conditions at the shock. However, the PIC code density and temperature distributions are inflated and slightly shifted sunward when compared to the MHD results. Kinetic electron motions and reflected ions upstream may cause this sunward shift. Species distributions in the foreshock region are depicted within the transition of the shock (measured ≈2 c/ω_pi for Θ_Bn = 90° and M_MS = 4.7) and in the downstream. The size of the foot jump in the magnetic field at the shock is measured to be 1.7 c/ω_pi. In the foreshocked region, the thermal velocity is found equal to 213 km s⁻¹ at 15 R_E and is equal to 63 km s⁻¹ at 12 R_E (magnetosheath region). Despite the large cell size of the current version of the PIC code, it is powerful enough to retain the macrostructure of planetary magnetospheres in a very short time, thus it can be used for pedagogical test purposes. It is also likely complementary with MHD to deepen our understanding of the large-scale magnetosphere.
Solving large nonlinear generalized eigenvalue problems from Density Functional Theory calculations in parallel

DEFF Research Database (Denmark)

Bendtsen, Claus; Nielsen, Ole Holm; Hansen, Lars Bruno

2001-01-01

The quantum mechanical ground state of electrons is described by Density Functional Theory, which leads to large minimization problems. An efficient minimization method uses a self-consistent field (SCF) solution of large eigenvalue problems. The iterative Davidson algorithm is often used, and we...
Automating large-scale reactor systems

International Nuclear Information System (INIS)

Kisner, R.A.

1985-01-01

This paper conveys a philosophy for developing automated large-scale control systems that behave in an integrated, intelligent, flexible manner. Methods for operating large-scale systems under varying degrees of equipment degradation are discussed, and a design approach that separates the effort into phases is suggested. 5 refs., 1 fig
Xyce Parallel Electronic Simulator - User's Guide, Version 1.0

Energy Technology Data Exchange (ETDEWEB)

HUTCHINSON, SCOTT A; KEITER, ERIC R.; HOEKSTRA, ROBERT J.; WATERS, LON J.; RUSSO, THOMAS V.; RANKIN, ERIC LAMONT; WIX, STEVEN D.

2002-11-01

This manual describes the use of the Xyce Parallel Electronic Simulator code for simulating electrical circuits at a variety of abstraction levels. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on improving the capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows. Another feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models
Robust receding horizon control for networked and distributed nonlinear systems

CERN Document Server

Li, Huiping

2017-01-01

This book offers a comprehensive, easy-to-understand overview of receding-horizon control for nonlinear networks. It presents novel general strategies that can simultaneously handle general nonlinear dynamics, system constraints, and disturbances arising in networked and large-scale systems and which can be widely applied. These receding-horizon-control-based strategies can achieve sub-optimal control performance while ensuring closed-loop stability: a feature attractive to engineers. The authors address the problems of networked and distributed control step-by-step, gradually increasing the level of challenge presented. The book first introduces the state-feedback control problems of nonlinear networked systems and then studies output feedback control problems. For large-scale nonlinear systems, disturbance is considered first, then communication delay separately, and lastly the simultaneous combination of delays and disturbances. Each chapter of this easy-to-follow book not only proposes and analyzes novel ...
Parallel analysis tools and new visualization techniques for ultra-large climate data set

Energy Technology Data Exchange (ETDEWEB)

Middleton, Don [National Center for Atmospheric Research, Boulder, CO (United States); Haley, Mary [National Center for Atmospheric Research, Boulder, CO (United States)

2014-12-10

ParVis was a project funded under LAB 10-05: “Earth System Modeling: Advanced Scientific Visualization of Ultra-Large Climate Data Sets”. Argonne was the lead lab with partners at PNNL, SNL, NCAR and UC-Davis. This report covers progress from January 1st, 2013 through Dec 1st, 2014. Two previous reports covered the period from Summer, 2010, through September 2011 and October 2011 through December 2012, respectively. While the project was originally planned to end on April 30, 2013, personnel and priority changes allowed many of the institutions to continue work through FY14 using existing funds. A primary focus of ParVis was introducing parallelism to climate model analysis to greatly reduce the time-to-visualization for ultra-large climate data sets. Work in the first two years was conducted on two tracks with different time horizons: one track to provide immediate help to climate scientists already struggling to apply their analysis to existing large data sets and another focused on building a new data-parallel library and tool for climate analysis and visualization that will give the field a platform for performing analysis and visualization on ultra-large datasets for the foreseeable future. In the final 2 years of the project, we focused mostly on the new data-parallel library and associated tools for climate analysis and visualization.
A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images

Science.gov (United States)

Afshar, Yaser; Sbalzarini, Ivo F.

2016-01-01

Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments. PMID:27046144
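The decomposition of an image into sub-images can be illustrated with a small index-only sketch. It splits a 2D extent into a grid of nearly equal tiles and pads each with a ghost (halo) layer that neighbouring computers would exchange; the halo width, the 2D layout, and all function names are assumptions for illustration, not details taken from the paper.

```python
def split_extent(length, parts):
    """Split range(length) into `parts` contiguous, nearly equal intervals."""
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def decompose(height, width, grid_rows, grid_cols, halo=1):
    """Return one dict per tile with its owned and halo-padded index ranges."""
    tiles = []
    for r0, r1 in split_extent(height, grid_rows):
        for c0, c1 in split_extent(width, grid_cols):
            tiles.append({
                # Pixels this worker is responsible for: (r0, r1, c0, c1).
                "owned": (r0, r1, c0, c1),
                # Owned region grown by the halo, clipped to the image.
                "padded": (max(r0 - halo, 0), min(r1 + halo, height),
                           max(c0 - halo, 0), min(c1 + halo, width)),
            })
    return tiles
```

Each worker would load only its padded range, so no single machine needs the whole image in memory.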
A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images.

Directory of Open Access Journals (Sweden)

Yaser Afshar

Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.
Synchronization Of Parallel Discrete Event Simulations

Science.gov (United States)

Steinman, Jeffrey S.

1992-01-01

Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
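The idea of time cycles that adapt to the simulation can be sketched on a single process. In the toy loop below, events are processed in time order while newly generated events are held back; a cycle ends at the earliest timestamp among those held events (the "event horizon"), after which they are released. This is a simplified, assumption-laden illustration: a real parallel implementation also handles rollback and inter-node messages, and the names `run_cycles` and `handler` are hypothetical.

```python
import heapq


def run_cycles(initial_events, handler, end_time):
    """Process (time, name) events in adaptive cycles.

    `handler(time, name)` must return new events with strictly later times.
    Returns the committed events and the number of cycles used.
    """
    pending = list(initial_events)
    heapq.heapify(pending)
    committed, cycles = [], 0
    while pending and pending[0][0] <= end_time:
        horizon = float("inf")
        held = []  # events generated in this cycle, not yet released
        # Process events until the next one reaches the event horizon.
        while (pending and pending[0][0] < horizon
               and pending[0][0] <= end_time):
            t, name = heapq.heappop(pending)
            new_events = handler(t, name)
            held.extend(new_events)
            for new_time, _ in new_events:
                horizon = min(horizon, new_time)
            committed.append((t, name))
        # End of cycle: release the held events for the next cycle.
        for event in held:
            heapq.heappush(pending, event)
        cycles += 1
    return committed, cycles
```

The cycle length is not fixed in advance: it stretches when generated events lie far in the future and shrinks when they are close.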
On soft limits of large-scale structure correlation functions

International Nuclear Information System (INIS)

Sagunski, Laura

2016-08-01

Large-scale structure surveys have the potential to become the leading probe for precision cosmology in the next decade. To extract valuable information on the cosmological evolution of the Universe from the observational data, it is of major importance to derive accurate theoretical predictions for the statistical large-scale structure observables, such as the power spectrum and the bispectrum of (dark) matter density perturbations. Hence, one of the greatest challenges of modern cosmology is to theoretically understand the non-linear dynamics of large-scale structure formation in the Universe from first principles. While analytic approaches to describe the large-scale structure formation are usually based on the framework of non-relativistic cosmological perturbation theory, we pursue another road in this thesis and develop methods to derive generic, non-perturbative statements about large-scale structure correlation functions. We study unequal- and equal-time correlation functions of density and velocity perturbations in the limit where one of their wavenumbers becomes small, that is, in the soft limit. In the soft limit, it is possible to link (N+1)-point and N-point correlation functions to non-perturbative 'consistency conditions'. These provide in turn a powerful tool to test fundamental aspects of the underlying theory at hand. In this work, we first rederive the (resummed) consistency conditions at unequal times by using the so-called eikonal approximation. The main appeal of the unequal-time consistency conditions is that they are solely based on symmetry arguments and thus are universal. Proceeding from this, we direct our attention to consistency conditions at equal times, which, on the other hand, depend on the interplay between soft and hard modes. We explore the existence and validity of equal-time consistency conditions within and beyond perturbation theory. For this purpose, we investigate the predictions for the soft limit of the

On soft limits of large-scale structure correlation functions

Energy Technology Data Exchange (ETDEWEB)

Sagunski, Laura

2016-08-15

Large-scale structure surveys have the potential to become the leading probe for precision cosmology in the next decade. To extract valuable information on the cosmological evolution of the Universe from the observational data, it is of major importance to derive accurate theoretical predictions for the statistical large-scale structure observables, such as the power spectrum and the bispectrum of (dark) matter density perturbations. Hence, one of the greatest challenges of modern cosmology is to theoretically understand the non-linear dynamics of large-scale structure formation in the Universe from first principles. While analytic approaches to describe the large-scale structure formation are usually based on the framework of non-relativistic cosmological perturbation theory, we pursue another road in this thesis and develop methods to derive generic, non-perturbative statements about large-scale structure correlation functions. We study unequal- and equal-time correlation functions of density and velocity perturbations in the limit where one of their wavenumbers becomes small, that is, in the soft limit. In the soft limit, it is possible to link (N+1)-point and N-point correlation functions to non-perturbative 'consistency conditions'. These provide in turn a powerful tool to test fundamental aspects of the underlying theory at hand. In this work, we first rederive the (resummed) consistency conditions at unequal times by using the so-called eikonal approximation. The main appeal of the unequal-time consistency conditions is that they are solely based on symmetry arguments and thus are universal. Proceeding from this, we direct our attention to consistency conditions at equal times, which, on the other hand, depend on the interplay between soft and hard modes. We explore the existence and validity of equal-time consistency conditions within and beyond perturbation theory. For this purpose, we investigate the predictions for the soft limit of the
Massively Parallel Computing: A Sandia Perspective

Energy Technology Data Exchange (ETDEWEB)

Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

1999-05-06

The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant breakthroughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large-scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.
SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Large Scale

Energy Technology Data Exchange (ETDEWEB)

Meng, Jintao; Seo, Sangmin; Balaji, Pavan; Wei, Yanjie; Wang, Bingqiang; Feng, Shengzhong

2016-08-16

In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes with the size of sequencing data ranging from terabytes to petabytes. According to the performance analysis results, the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For the input parallelization, the input data is divided into virtual fragments with nearly equal size, and the start position and end position of each fragment are automatically separated at the beginning of the reads. In k-mer graph construction, in order to improve the communication efficiency, the message size is kept constant between any two processes by proportionally increasing the number of nucleotides to the number of processes in the input parallelization step for each round. The memory usage is also decreased because only a small part of the input data is processed in each round. With graph simplification, the communication protocol reduces the number of communication loops from four to two loops and decreases the idle communication time. The optimized assembler is denoted as SWAP-Assembler 2 (SWAP2). In our experiments using a 1000 Genomes project dataset of 4 terabytes (the largest dataset ever used for assembling) on the supercomputer Mira, the results show that SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMer assembler and the SWAP-Assembler. On the Yanhuang dataset of 300 gigabytes, SWAP2 shows a 3X speedup and 4X better scalability compared with the HipMer assembler and is 45 times faster than the SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.
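The input-parallelization step, dividing the data into nearly equal fragments whose boundaries fall at the beginning of a read, can be sketched as below. The sketch assumes a FASTA-like layout in which every read starts with `>` at the start of a line; that marker, the function name `split_reads`, and the in-memory `bytes` input are illustrative assumptions, not the assembler's actual file handling.

```python
def split_reads(data, n_fragments, marker=b"\n>"):
    """Return (start, end) byte offsets of fragments aligned to read starts."""
    size = len(data)
    starts = [0]
    for i in range(1, n_fragments):
        # Ideal equal-size cut point for fragment i.
        guess = i * size // n_fragments
        # Move the cut forward to the next read start. Searching from
        # guess - 1 keeps a cut that already sits on a read start.
        pos = data.find(marker, max(guess - 1, 0))
        boundary = size if pos == -1 else pos + 1
        starts.append(max(boundary, starts[-1]))
    starts.append(size)
    return [(starts[i], starts[i + 1]) for i in range(n_fragments)]
```

Each process can then read only its own byte range, so no read is split across two fragments.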
Alignment between galaxies and large-scale structure

International Nuclear Information System (INIS)

Faltenbacher, A.; Li Cheng; White, Simon D. M.; Jing, Yi-Peng; Mao Shude; Wang Jie

2009-01-01

Based on the Sloan Digital Sky Survey DR6 (SDSS) and the Millennium Simulation (MS), we investigate the alignment between galaxies and large-scale structure. For this purpose, we develop two new statistical tools, namely the alignment correlation function and the cos(2θ)-statistic. The former is a two-dimensional extension of the traditional two-point correlation function and the latter is related to the ellipticity correlation function used for cosmic shear measurements. Both are based on the cross correlation between a sample of galaxies with orientations and a reference sample which represents the large-scale structure. We apply the new statistics to the SDSS galaxy catalog. The alignment correlation function reveals an overabundance of reference galaxies along the major axes of red, luminous (L ∼ * ) galaxies out to projected separations of 60 h⁻¹ Mpc. The signal increases with central galaxy luminosity. No alignment signal is detected for blue galaxies. The cos(2θ)-statistic yields very similar results. Starting from a MS semi-analytic galaxy catalog, we assign an orientation to each red, luminous and central galaxy, based on that of the central region of the host halo (with size similar to that of the stellar galaxy). As an alternative, we use the orientation of the host halo itself. We find a mean projected misalignment between a halo and its central region of ∼25 deg. The misalignment decreases slightly with increasing luminosity of the central galaxy. Using the orientations and luminosities of the semi-analytic galaxies, we repeat our alignment analysis on mock surveys of the MS. Agreement with the SDSS results is good if the central orientations are used. Predictions using the halo orientations as proxies for central galaxy orientations overestimate the observed alignment by more than a factor of 2. Finally, the large volume of the MS allows us to generate a two-dimensional map of the alignment correlation function, which shows the reference
A novel harmonic current sharing control strategy for parallel-connected inverters

DEFF Research Database (Denmark)

Guan, Yajuan; Guerrero, Josep M.; Savaghebi, Mehdi

2017-01-01

A novel control strategy which enables proportional sharing of linear and nonlinear loads among paralleled inverters and voltage harmonic suppression is proposed in this paper. The proposed method is based on the autonomous currents sharing controller (ACSC) instead of conventional power droop control...... to provide fast transient response, decoupling control and large stability margin. The current components at different sequences and orders are decomposed by a multi-second-order generalized integrator-based frequency locked loop (MSOGI-FLL). A harmonic-orthogonal-virtual-resistances controller (HOVR......) is used to proportionally share current components at different sequences and orders independently among the paralleled inverters. Proportional resonance controllers tuned at selected frequencies are used to suppress voltage harmonics. Simulations based on two 2.2 kW paralleled three-phase inverters...
Integration experiences and performance studies of A COTS parallel archive systems

Energy Technology Data Exchange (ETDEWEB)

Chen, Hsing-bung [Los Alamos National Laboratory; Scott, Cody [Los Alamos National Laboratory; Grider, Bary [Los Alamos National Laboratory; Torres, Aaron [Los Alamos National Laboratory; Turley, Milton [Los Alamos National Laboratory; Sanchez, Kathy [Los Alamos National Laboratory; Bremer, John [Los Alamos National Laboratory

2010-01-01

Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interface, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially metadata searching speeds such as more caching and less robust semantics. Currently the number of extreme highly scalable parallel archive solutions is very small especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner, and demonstrated its capability to address requirements of
Integration experiments and performance studies of a COTS parallel archive system

Energy Technology Data Exchange (ETDEWEB)

Chen, Hsing-bung [Los Alamos National Laboratory; Scott, Cody [Los Alamos National Laboratory; Grider, Gary [Los Alamos National Laboratory; Torres, Aaron [Los Alamos National Laboratory; Turley, Milton [Los Alamos National Laboratory; Sanchez, Kathy [Los Alamos National Laboratory; Bremer, John [Los Alamos National Laboratory

2010-06-16

Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interface, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially metadata searching speeds such as more caching and less robust semantics. Currently the number of extreme highly scalable parallel archive solutions is very small especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner machine, and demonstrated its capability to address
Optimization of large-scale industrial systems : an emerging method

Energy Technology Data Exchange (ETDEWEB)

Hammache, A.; Aube, F.; Benali, M.; Cantave, R. [Natural Resources Canada, Varennes, PQ (Canada). CANMET Energy Technology Centre

2006-07-01

This paper reviewed optimization methods of large-scale industrial production systems and presented a novel systematic multi-objective and multi-scale optimization methodology. The methodology was based on a combined local optimality search with global optimality determination, and advanced system decomposition and constraint handling. The proposed method focused on the simultaneous optimization of the energy, economy and ecology aspects of industrial systems (E³-ISO). The aim of the methodology was to provide guidelines for decision-making strategies. The approach was based on evolutionary algorithms (EA) with specifications including hybridization of global optimality determination with a local optimality search; a self-adaptive algorithm to account for the dynamic changes of operating parameters and design variables occurring during the optimization process; interactive optimization; advanced constraint handling and decomposition strategy; and object-oriented programming and parallelization techniques. Flowcharts of the working principles of the basic EA were presented. It was concluded that the EA uses a novel decomposition and constraint handling technique to enhance the Pareto solution search procedure for multi-objective problems. 6 refs., 9 figs.
Derivation of Optimal Operating Rules for Large-scale Reservoir Systems Considering Multiple Trade-off

Science.gov (United States)

Zhang, J.; Lei, X.; Liu, P.; Wang, H.; Li, Z.

2017-12-01

Flood control operation of multi-reservoir systems such as parallel reservoirs and hybrid reservoirs often suffers from complex interactions and trade-offs among tributaries and the mainstream. The optimization of such systems is computationally intensive due to nonlinear storage curves, numerous constraints and complex hydraulic connections. This paper aims to derive the optimal flood control operating rules based on the trade-off among tributaries and the mainstream using a new algorithm known as weighted non-dominated sorting genetic algorithm II (WNSGA II). WNSGA II can locate the Pareto frontier in the non-dominated region efficiently owing to the directed search guided by a weighted crowding distance, and the results are compared with those of conventional operating rules (COR) and a single-objective genetic algorithm (GA). The Xijiang river basin in China is selected as a case study, with eight reservoirs and five flood control sections within four tributaries and the mainstream. Furthermore, the effects of inflow uncertainty have been assessed. Results indicate that: (1) WNSGA II locates the non-dominated solutions faster and provides a better Pareto frontier than the traditional non-dominated sorting genetic algorithm II (NSGA II) owing to the weighted crowding distance; (2) WNSGA II outperforms COR and GA on flood control in the whole basin; (3) the multi-objective operating rules from WNSGA II deal with the inflow uncertainties better than COR. Therefore, WNSGA II can be used to derive stable operating rules for large-scale reservoir systems effectively and efficiently.
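The notion of a non-dominated (Pareto) solution that NSGA II style algorithms rely on can be shown with a short filter. This is the standard dominance test for minimisation, not the weighted crowding distance of WNSGA II, whose details the abstract does not give; the function names and the objective tuples are illustrative assumptions.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` everywhere and better somewhere.

    All objectives are assumed to be minimised.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(points):
    """Keep the points that no other point dominates (the Pareto front)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]
```

For example, with two objectives such as peak flow at two flood control sections, `non_dominated` keeps only the operating rules for which lowering one peak would raise the other.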
Parallel Computing Strategies for Irregular Algorithms

Science.gov (United States)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs

Directory of Open Access Journals (Sweden)

Vaughn Matthew

2010-11-01

-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on Eulerian approach. Our algorithms for constructing Bi-directed de Bruijn graphs are efficient in parallel and out of core settings. These algorithms can be used in building large scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.

Science.gov (United States)

Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal

2010-11-15

any sequence assembly program based on Eulerian approach. Our algorithms for constructing Bi-directed de Bruijn graphs are efficient in parallel and out of core settings. These algorithms can be used in building large scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.
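For readers unfamiliar with the data structure, the following toy sketch shows the idea behind a bi-directed de Bruijn graph: a k-mer and its reverse complement are treated as the same node. It is a small in-memory illustration, not the parallel or out-of-core algorithm of the paper; the function names are hypothetical, and the edge orientation labels that a full bi-directed graph carries are omitted for brevity.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Representative of a k-mer and its reverse complement (the smaller string)."""
    return min(kmer, reverse_complement(kmer))


def build_graph(reads, k):
    """Map each canonical k-mer to the canonical k-mers that overlap it by k-1 bases."""
    adj = defaultdict(set)
    for read in reads:
        # Consecutive k-mers in a read overlap by k-1 characters.
        for i in range(len(read) - k):
            a = canonical(read[i:i + k])
            b = canonical(read[i + 1:i + 1 + k])
            adj[a].add(b)
            adj[b].add(a)
    return adj
```

Because nodes are canonical k-mers, a read and its reverse complement produce the same graph, which is the property that lets an assembler ignore which strand a read came from.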
Nonlinear Multigrid for Reservoir Simulation

DEFF Research Database (Denmark)

Christensen, Max la Cour; Eskildsen, Klaus Langgren; Engsig-Karup, Allan Peter

2016-01-01

efficiency for a black-oil model. Furthermore, the use of the FAS method enables a significant reduction in memory usage compared with conventional techniques, which suggests new possibilities for improved large-scale reservoir simulation and numerical efficiency. Last, nonlinear multilevel preconditioning...
Large-scale structure after COBE: Peculiar velocities and correlations of cold dark matter halos

Science.gov (United States)

Zurek, Wojciech H.; Quinn, Peter J.; Salmon, John K.; Warren, Michael S.

1994-01-01

Large N-body simulations on parallel supercomputers allow one to simultaneously investigate large-scale structure and the formation of galactic halos with unprecedented resolution. Our study shows that the masses as well as the spatial distribution of halos on scales of tens of megaparsecs in a cold dark matter (CDM) universe with the spectrum normalized to the anisotropies detected by the Cosmic Background Explorer (COBE) are compatible with the observations. We also show that the average value of the relative pairwise velocity dispersion $\sigma_v$, used as a principal argument against COBE-normalized CDM models, is significantly lower for halos than for individual particles. When the observational methods of extracting $\sigma_v$ are applied to the redshift catalogs obtained from the numerical experiments, estimates differ significantly between different observation-sized samples and overlap observational estimates obtained following the same procedure.
Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-12-31

This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.
A hybrid parallel framework for the cellular Potts model simulations

Energy Technology Data Exchange (ETDEWEB)

Jiang, Yi [Los Alamos National Laboratory; He, Kejing [SOUTH CHINA UNIV; Dong, Shoubin [SOUTH CHINA UNIV

2009-01-01

The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, and cannot be used for large-scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large-scale simulation ($\sim 10^{8}$ sites) of complex collective behavior of numerous cells ($\sim 10^{6}$).
Numerical methods for the design of large-scale nonlinear discrete ill-posed inverse problems

International Nuclear Information System (INIS)

Haber, E; Horesh, L; Tenorio, L

2010-01-01

Design of experiments for discrete ill-posed problems is a relatively new area of research. While there has been some limited work concerning the linear case, little has been done to study design criteria and numerical methods for ill-posed nonlinear problems. We present an algorithmic framework for nonlinear experimental design with an efficient numerical implementation. The data are modeled as indirect, noisy observations of the model collected via a set of plausible experiments. An inversion estimate based on these data is obtained by a weighted Tikhonov regularization whose weights control the contribution of the different experiments to the data misfit term. These weights are selected by minimization of an empirical estimate of the Bayes risk that is penalized to promote sparsity. This formulation entails a bilevel optimization problem that is solved using a simple descent method. We demonstrate the viability of our design with a problem in electromagnetic imaging based on direct current resistivity and magnetotelluric data
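In symbols, the bilevel structure described in this abstract might be written as below. This is only a sketch: the forward operators $F_i$, the data $d_i$, the regularizer $R$, the parameters $\alpha$ and $\beta$, and the $\ell_1$ form of the sparsity penalty are notation assumed here for illustration and are not taken from the paper.

```latex
% Inner problem: weighted Tikhonov estimate for a given weight vector w
\hat{m}(w) = \arg\min_{m} \; \sum_{i} w_i \,\bigl\| F_i(m) - d_i \bigr\|^2 + \alpha\, R(m),
\qquad
% Outer problem: choose weights by minimizing a penalized empirical Bayes risk
\min_{w \ge 0} \; \widehat{\mathcal{R}}\bigl(\hat{m}(w)\bigr) + \beta\, \| w \|_1 .
```

An experiment whose weight $w_i$ is driven to zero drops out of the data misfit term, which is how a sparsity-promoting penalty ends up selecting a subset of the candidate experiments.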
Xyce parallel electronic simulator release notes.

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

2010-05-01

The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
A multiple-scale power series method for solving nonlinear ordinary differential equations

Directory of Open Access Journals (Sweden)

Chein-Shan Liu

2016-02-01

The power series solution is a cheap and effective method for solving nonlinear problems such as the Duffing-van der Pol oscillator, the Volterra population model and nonlinear boundary value problems. A novel power series method is developed by introducing multiple scales $R_k$ in the power term $(t/R_k)^k$; the scales are derived explicitly to reduce the ill-conditioned behavior in the data interpolation. The method avoids multiplying a huge value by a tiny value, which is the main cause of numerical instability and of the failure of the conventional power series method. The multiple scales derived from an integral can be used in the power series expansion, which provides very accurate numerical solutions of the problems considered in this paper.
The Software Reliability of Large Scale Integration Circuit and Very Large Scale Integration Circuit

OpenAIRE

Artem Ganiyev; Jan Vitasek

2010-01-01

This article describes a method for evaluating the fault-free operation of large scale integration (LSI) and very large scale integration (VLSI) circuits. It gives a comparative analysis of the factors that determine the fault-free operation of integrated circuits, and of the existing methods and models for evaluating the fault-free operation of LSI and VLSI circuits. The main part describes a proposed algorithm and program for analysing the fault rate in LSI and VLSI circuits.

Nonlinear theory of collisionless trapped ion modes

International Nuclear Information System (INIS)

Hahm, T.S.; Tang, W.M.

1996-01-01

A simplified two-field nonlinear model for collisionless trapped-ion-mode turbulence has been derived from nonlinear bounce-averaged drift kinetic equations. The renormalized thermal diffusivity obtained from this analysis exhibits a Bohm-like scaling. A new nonlinearity associated with the neoclassical polarization density is found to introduce an isotope-dependent modification to this Bohm-like diffusivity. The asymptotic balance between the equilibrium variation and the finite-banana-width-induced reduction of the fluctuation potential leads to the result that the radial correlation length decreases with increasing plasma current. Other important conclusions from the present analysis include the predictions that (i) the relative density fluctuation level $\delta n/n_0$ is lower than the conventional mixing length estimate, $\Delta r/L_n$; (ii) the ion temperature fluctuation level $\delta T_i/T_i$ significantly exceeds the density fluctuation level $\delta n/n_0$; and (iii) the parallel ion velocity fluctuation level $\delta v_{i\parallel}/v_{Ti}$ is expected to be negligible.
Modeling and Control of a Large Nuclear Reactor A Three-Time-Scale Approach

CERN Document Server

Shimjith, S R; Bandyopadhyay, B

2013-01-01

Control analysis and design of large nuclear reactors requires a suitable mathematical model representing the steady state and dynamic behavior of the reactor with reasonable accuracy. This task is, however, quite challenging because of several complex dynamic phenomena existing in a reactor. Quite often, the models developed would be of prohibitively large order, non-linear and of complex structure not readily amenable for control studies. Moreover, the existence of simultaneously occurring dynamic variations at different speeds makes the mathematical model susceptible to numerical ill-conditioning, inhibiting direct application of standard control techniques. This monograph introduces a technique for mathematical modeling of large nuclear reactors in the framework of multi-point kinetics, to obtain a comparatively smaller order model in standard state space form thus overcoming these difficulties. It further brings in innovative methods for controller design for systems exhibiting multi-time-scale property,...
Scaling Behavior of Dilute Polymer Solutions Confined between Parallel Plates

NARCIS (Netherlands)

Vliet, J.H. van; Luyten, M.C.; Brinke, G. ten

1992-01-01

The average size and shape of a polymer coil confined in a slit between two parallel plates depend on the distance L between the plates. On the basis of numerical results, four different regimes can be distinguished. For large values of L the coil is essentially unconfined. For intermediate values
DEMNUni: the clustering of large-scale structures in the presence of massive neutrinos

International Nuclear Information System (INIS)

Castorina, Emanuele; Carbone, Carmelita; Bel, Julien; Sefusatti, Emiliano; Dolag, Klaus

2015-01-01

We analyse the clustering features of Large Scale Structures (LSS) in the presence of massive neutrinos, employing a set of large-volume, high-resolution cosmological N-body simulations, where neutrinos are treated as separate collisionless particles. The volume of $8\,h^{-3}\,\mathrm{Gpc}^3$, combined with a resolution of about $8\times10^{10}\,h^{-1}\,M_\odot$ for the cold dark matter (CDM) component, represents a significant improvement over previous N-body simulations in massive neutrino cosmologies. In this work we focus, in the first place, on the analysis of nonlinear effects in CDM and neutrino perturbations contributing to the total matter power spectrum. We show that most of the nonlinear evolution is generated exclusively by the CDM component. We therefore compare mildly nonlinear predictions from Eulerian Perturbation Theory (PT) and fully nonlinear prescriptions (HALOFIT) with the measurements obtained from the simulations. We find that accounting only for the nonlinear evolution of the CDM power spectrum allows one to recover the total matter power spectrum with the same accuracy as in the massless case. Indeed, we show that the most recent version of the HALOFIT formula calibrated on ΛCDM simulations can be applied directly to the linear CDM power spectrum without requiring additional fitting parameters in the massive case. As a second step, we study the abundance and clustering properties of CDM halos, confirming that, in massive neutrino cosmologies, the proper definition of the halo bias should be made with respect to the cold rather than the total matter distribution, as recently shown in the literature. Here we extend these results to redshift space, finding that, when accounting for massive neutrinos, an improper definition of the linear bias can lead to a systematic error of about 1-2 % in the determination of the linear growth rate from anisotropic clustering. This result is quite important if we consider that future spectroscopic galaxy surveys, such as Euclid, are
Coupling graph perturbation theory with scalable parallel algorithms for large-scale enumeration of maximal cliques in biological graphs

International Nuclear Information System (INIS)

Samatova, N F; Schmidt, M C; Hendrix, W; Breimyer, P; Thomas, K; Park, B-H

2008-01-01

Data-driven construction of predictive models for biological systems faces challenges from data intensity, uncertainty, and computational complexity. Data-driven model inference is often considered a combinatorial graph problem where an enumeration of all feasible models is sought. The data-intensive and the NP-hard nature of such problems, however, challenges existing methods to meet the required scale of data size and uncertainty, even on modern supercomputers. Maximal clique enumeration (MCE) in a graph derived from such biological data is often a rate-limiting step in detecting protein complexes in protein interaction data, finding clusters of co-expressed genes in microarray data, or identifying clusters of orthologous genes in protein sequence data. We report two key advances that address this challenge. We designed and implemented the first (to the best of our knowledge) parallel MCE algorithm that scales linearly on thousands of processors running MCE on real-world biological networks with thousands and hundreds of thousands of vertices. In addition, we proposed and developed the Graph Perturbation Theory (GPT) that establishes a foundation for efficiently solving the MCE problem in perturbed graphs, which model the uncertainty in the data. GPT formulates necessary and sufficient conditions for detecting the differences between the sets of maximal cliques in the original and perturbed graphs and reduces the enumeration time by more than 80% compared to complete recomputation
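To make the maximal clique enumeration problem concrete, here is a minimal sequential sketch using the classic Bron-Kerbosch algorithm with pivoting. It is a textbook baseline chosen for illustration, not the parallel MCE algorithm or the Graph Perturbation Theory reported in the abstract; the function name and the adjacency-dictionary input format are assumptions.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph.

    adj -- dict mapping each vertex to the set of its neighbours
           (no self-loops; adjacency must be symmetric)
    """
    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already-explored vertices.
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())
```

For example, a triangle on vertices 1, 2, 3 with a pendant edge 3-4 has exactly two maximal cliques, {1, 2, 3} and {3, 4}. The number of maximal cliques can grow exponentially with graph size, which is why the abstract treats MCE as a rate-limiting step.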
Thresholds, switches and hysteresis in hydrology from the pedon to the catchment scale: a non-linear systems theory

Directory of Open Access Journals (Sweden)

2007-01-01

Hysteresis is a rate-independent non-linearity that is expressed through thresholds, switches, and branches. Exceedance of a threshold, or the occurrence of a turning point in the input, switches the output onto a particular output branch. Rate-independent branching on a very large set of switches with non-local memory is the central concept in the new definition of hysteresis. Hysteretic loops are a special case. A self-consistent mathematical description of hydrological systems with hysteresis demands a new non-linear systems theory of adequate generality. The goal of this paper is to establish this and to show how this may be done. Two results are presented: a conceptual model for the hysteretic soil-moisture characteristic at the pedon scale and a hysteretic linear reservoir at the catchment scale. Both are based on the Preisach model. A result of particular significance is the demonstration that the independent domain model of the soil moisture characteristic due to Childs, Poulovassilis, Mualem and others is equivalent to the Preisach hysteresis model of non-linear systems theory, a result reminiscent of the reduction of the theory of the unit hydrograph to linear systems theory in the 1950s. A significant reduction in the number of model parameters is also achieved. The new theory implies a change in modelling paradigm.
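The idea of an output built from many threshold switches can be sketched with a discrete Preisach-style model: a weighted sum of two-threshold relays. The class and function names, the ±1 relay states and the example thresholds are illustrative assumptions, not the soil-moisture or catchment models of the paper.

```python
class Relay:
    """Two-threshold switch: goes to +1 at `up`, to -1 at `down`, otherwise keeps its state."""

    def __init__(self, down, up, state=-1):
        self.down, self.up, self.state = down, up, state

    def step(self, u):
        if u >= self.up:
            self.state = 1
        elif u <= self.down:
            self.state = -1
        # Between the thresholds the relay remembers its previous state.
        return self.state


def preisach_output(relays, weights, inputs):
    """Weighted sum of relay states after each input sample, in order."""
    return [sum(w * r.step(u) for r, w in zip(relays, weights)) for u in inputs]
```

Feeding the same input value after a rise and after a fall can give different outputs, because each relay's state depends on which threshold was crossed last. The result does not depend on how fast the input changes, only on its sequence of turning points, which is the rate-independence the abstract refers to.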
Apparatus to examine pulsed parallel field losses in large conductors

International Nuclear Information System (INIS)

Miller, J.R.; Shen, S.S.

1977-01-01

Conductors in tokamak toroidal field coils will be exposed to pulsed fields both parallel and perpendicular to the current direction. These conductors will likely be quite high capacity (10 to 20 kA) and therefore probably will be built up out of smaller units. We have previously published measurements of losses in conductors exposed to a pulsed parallel field, but those experiments necessarily used monolithic conductors of relatively small cross section because the pulse coil, a torus that surrounded the test conductor, was itself small. Here we describe an apparatus that is conceptually similar but has been scaled up to accept conductors of much larger cross section and current capacity. The apparatus consists basically of a superconducting torus that contains a movable spool to allow test samples to be wound inside without unwinding the torus. Details of apparatus design and capabilities are described and preliminary results from tests of the apparatus and from loss measurements using it are reported
An efficient method based on the uniformity principle for synthesis of large-scale heat exchanger networks

International Nuclear Information System (INIS)

Zhang, Chunwei; Cui, Guomin; Chen, Shang

2016-01-01

Highlights: • Two dimensionless uniformity factors are presented to heat exchange network. • The grouping of process streams reduces the computational complexity of large-scale HENS problems. • The optimal sub-network can be obtained by Powell particle swarm optimization algorithm. • The method is illustrated by a case study involving 39 process streams, with a better solution. - Abstract: The optimal design of large-scale heat exchanger networks is a difficult task due to the inherent non-linear characteristics and the combinatorial nature of heat exchangers. To solve large-scale heat exchanger network synthesis (HENS) problems, two dimensionless uniformity factors to describe the heat exchanger network (HEN) uniformity in terms of the temperature difference and the accuracy of process stream grouping are deduced. Additionally, a novel algorithm that combines deterministic and stochastic optimizations to obtain an optimal sub-network with a suitable heat load for a given group of streams is proposed, and is named the Powell particle swarm optimization (PPSO). As a result, the synthesis of large-scale heat exchanger networks is divided into two corresponding sub-parts, namely, the grouping of process streams and the optimization of sub-networks. This approach reduces the computational complexity and increases the efficiency of the proposed method. The robustness and effectiveness of the proposed method are demonstrated by solving a large-scale HENS problem involving 39 process streams, and the results obtained are better than those previously published in the literature.
Accurate and Efficient Parallel Implementation of an Effective Linear-Scaling Direct Random Phase Approximation Method.

Science.gov (United States)

Graf, Daniel; Beuerle, Matthias; Schurkus, Henry F; Luenser, Arne; Savasci, Gökcen; Ochsenfeld, Christian

2018-05-08

An efficient algorithm for calculating the random phase approximation (RPA) correlation energy is presented that is as accurate as the canonical molecular orbital resolution-of-the-identity RPA (RI-RPA) with the important advantage of an effective linear-scaling behavior (instead of quartic) for large systems due to a formulation in the local atomic orbital space. The high accuracy is achieved by utilizing optimized minimax integration schemes and the local Coulomb metric attenuated by the complementary error function for the RI approximation. The memory bottleneck of former atomic orbital (AO)-RI-RPA implementations ( Schurkus, H. F.; Ochsenfeld, C. J. Chem. Phys. 2016 , 144 , 031101 and Luenser, A.; Schurkus, H. F.; Ochsenfeld, C. J. Chem. Theory Comput. 2017 , 13 , 1647 - 1655 ) is addressed by precontraction of the large 3-center integral matrix with the Cholesky factors of the ground state density reducing the memory requirements of that matrix by a factor of [Formula: see text]. Furthermore, we present a parallel implementation of our method, which not only leads to faster RPA correlation energy calculations but also to a scalable decrease in memory requirements, opening the door for investigations of large molecules even on small- to medium-sized computing clusters. Although it is known that AO methods are highly efficient for extended systems, where sparsity allows for reaching the linear-scaling regime, we show that our work also extends the applicability when considering highly delocalized systems for which no linear scaling can be achieved. As an example, the interlayer distance of two covalent organic framework pore fragments (comprising 384 atoms in total) is analyzed.
Phylogenetic distribution of large-scale genome patchiness

Directory of Open Access Journals (Sweden)

Hackenberg Michael

2008-04-01

Abstract. Background: The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. Results: The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. Conclusion: Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.
Managing large-scale models: DBS

International Nuclear Information System (INIS)

1981-05-01

A set of fundamental management tools for developing and operating a large scale model and data base system is presented. Based on experience in operating and developing a large scale computerized system, the only reasonable way to gain strong management control of such a system is to implement appropriate controls and procedures. Chapter I discusses the purpose of the book. Chapter II classifies a broad range of generic management problems into three groups: documentation, operations, and maintenance. First, system problems are identified; then solutions for gaining management control are discussed. Chapters III, IV, and V present practical methods for dealing with these problems. These methods were developed for managing SEAS but have general application for large scale models and data bases.
Large Scale Self-Organizing Information Distribution System

National Research Council Canada - National Science Library

Low, Steven

2005-01-01

This project investigates issues in "large-scale" networks. Here "large-scale" refers to networks with large number of high capacity nodes and transmission links, and shared by a large number of users...
Self-assembly of highly fluorescent semiconductor nanorods into large scale smectic liquid crystal structures by coffee stain evaporation dynamics

International Nuclear Information System (INIS)

Nobile, Concetta; Carbone, Luigi; Fiore, Angela; Cingolani, Roberto; Manna, Liberato; Krahne, Roman

2009-01-01

We deposit droplets of nanorods dispersed in solvents on substrate surfaces and let the solvent evaporate. We find that strong contact line pinning leads to dense nanorod deposition inside coffee stain fringes, where we observe large scale lateral ordering of the nanorods with the long axis of the rods oriented parallel to the contact line. We observe birefringence of these coffee stain fringes by polarized microscopy and we find the direction of the extraordinary refractive index parallel to the long axis of the nanorods.
Development of a large-scale general purpose two-phase flow analysis code

International Nuclear Information System (INIS)

Terasaka, Haruo; Shimizu, Sensuke

2001-01-01

A general purpose three-dimensional two-phase flow analysis code has been developed for solving large-scale problems in industrial fields. The code uses a two-fluid model to describe the conservation equations for two-phase flow in order to be applicable to various phenomena. Complicated geometrical conditions are modeled by FAVOR method in structured grid systems, and the discretization equations are solved by a modified SIMPLEST scheme. To reduce computing time a matrix solver for the pressure correction equation is parallelized with OpenMP. Results of numerical examples show that the accurate solutions can be obtained efficiently and stably. (author)
Highly uniform parallel microfabrication using a large numerical aperture system

Energy Technology Data Exchange (ETDEWEB)

Zhang, Zi-Yu; Su, Ya-Hui, E-mail: ustcsyh@ahu.edu.cn, E-mail: dongwu@ustc.edu.cn [School of Electrical Engineering and Automation, Anhui University, Hefei 230601 (China); Zhang, Chen-Chu; Hu, Yan-Lei; Wang, Chao-Wei; Li, Jia-Wen; Chu, Jia-Ru; Wu, Dong, E-mail: ustcsyh@ahu.edu.cn, E-mail: dongwu@ustc.edu.cn [CAS Key Laboratory of Mechanical Behavior and Design of Materials, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei 230026 (China)

2016-07-11

In this letter, we report an improved algorithm to produce accurate phase patterns for generating highly uniform diffraction-limited multifocal arrays in a large numerical aperture objective system. It is shown that based on the original diffraction integral, the uniformity of the diffraction-limited focal arrays can be improved from ∼75% to >97%, owing to the critical consideration of the aperture function and apodization effect associated with a large numerical aperture objective. The experimental results, e.g., 3 × 3 arrays of square and triangle, seven microlens arrays with high uniformity, further verify the advantage of the improved algorithm. This algorithm enables the laser parallel processing technology to realize uniform microstructures and functional devices in the microfabrication system with a large numerical aperture objective.
Algorithm and Implementation of Distributed ESN Using Spark Framework and Parallel PSO

Directory of Open Access Journals (Sweden)

Kehe Wu

2017-04-01

The echo state network (ESN) employs a huge reservoir with sparsely and randomly connected internal nodes and only trains the output weights, which avoids the suboptimal solutions, exploding and vanishing gradients, high complexity and other disadvantages faced by traditional recurrent neural network (RNN) training. Because it adapts well to nonlinear dynamical systems, the ESN has been used in a wide range of applications. However, in the era of Big Data, with an enormous amount of data being generated continuously every day, the data in real applications are often stored in a distributed fashion, and thus a centralized ESN training process is technologically unsuitable. In order to meet the requirements of real-world Big Data applications, in this study we propose an algorithm and its implementation for distributed ESN training. The algorithm is based on the parallel particle swarm optimization (P-PSO) technique and the implementation uses Spark, a well-known large-scale data processing framework. Four extremely large-scale datasets, including artificial benchmarks, real-world data and image data, are adopted to verify our framework on a scalable platform. Experimental results indicate that the proposed work performs well in the era of Big Data with regard to speed, accuracy and generalization capabilities.
Visual Interfaces for Parallel Simulations (VIPS), Phase I

Data.gov (United States)

National Aeronautics and Space Administration — Configuring the 3D geometry and physics of large scale parallel physics simulations is increasingly complex. Given the investment in time and effort to run these...
Large scale structure and baryogenesis

International Nuclear Information System (INIS)

Kirilova, D.P.; Chizhov, M.V.

2001-08-01

We discuss a possible connection between large scale structure formation and baryogenesis in the universe. An updated review of the observational indications for the presence of a very large scale of 120 h⁻¹ Mpc in the distribution of the visible matter of the universe is provided. The possibility of generating a periodic distribution with the characteristic scale 120 h⁻¹ Mpc through a mechanism producing quasi-periodic baryon density perturbations during the inflationary stage is discussed. The evolution of the baryon charge density distribution is explored in the framework of a low temperature boson condensate baryogenesis scenario. Both the observed very large scale of the visible matter distribution in the universe and the observed baryon asymmetry value could naturally appear as a result of the evolution of a complex scalar field condensate formed at the inflationary stage. Moreover, for some model parameters a natural separation of matter superclusters from antimatter ones can be achieved. (author)
A Parallel Algorithm for Connected Component Labelling of Gray-scale Images on Homogeneous Multicore Architectures

International Nuclear Information System (INIS)

Niknam, Mehdi; Thulasiraman, Parimala; Camorlinga, Sergio

2010-01-01

Connected component labelling is an essential step in image processing. We provide a parallel version of Suzuki's sequential connected component algorithm in order to speed up the labelling process. Also, we modify the algorithm to enable labelling gray-scale images. Due to the data dependencies in the algorithm we used a method similar to pipeline to exploit parallelism. The parallel algorithm method achieved a speedup of 2.5 for image size of 256 x 256 pixels using 4 processing threads.
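For readers unfamiliar with the technique, gray-scale connected component labelling can be sketched as follows. This is a minimal sequential two-pass union-find version, not Suzuki's algorithm and not the authors' pipelined parallel method; the function name `label_components` and the choice of 4-connectivity between pixels of equal gray value are assumptions made for illustration.

```python
def label_components(image):
    """Label 4-connected regions of equal gray value; returns a label grid."""
    rows, cols = len(image), len(image[0])
    parent = []  # union-find forest over provisional labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    labels = [[0] * cols for _ in range(rows)]
    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            same_up = r > 0 and image[r - 1][c] == image[r][c]
            same_left = c > 0 and image[r][c - 1] == image[r][c]
            up = labels[r - 1][c] if same_up else None
            left = labels[r][c - 1] if same_left else None
            if up is None and left is None:
                parent.append(len(parent))      # start a new provisional label
                labels[r][c] = len(parent) - 1
            elif up is not None and left is not None:
                union(up, left)                 # both neighbours belong together
                labels[r][c] = min(up, left)
            else:
                labels[r][c] = up if up is not None else left
    # Second pass: replace provisional labels by compact final labels (1, 2, ...).
    final = {}
    for r in range(rows):
        for c in range(cols):
            root = find(labels[r][c])
            labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels


print(label_components([[1, 1, 2], [1, 2, 2], [3, 3, 2]]))
```

The first pass carries a dependency on the pixel above and the pixel to the left, which is the kind of data dependency that a pipelined parallel scheme has to work around.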
Automatic management software for large-scale cluster system

International Nuclear Information System (INIS)

Weng Yunjian; Chinese Academy of Sciences, Beijing; Sun Gongxing

2007-01-01

At present, large-scale cluster systems are difficult to manage. For example, administrators carry a heavy workload and spend much time on the management and maintenance of the cluster. The nodes of a large-scale cluster easily fall into disorder: thousands of nodes are housed in large rooms, so administrators can easily confuse machines. How can accurate management be carried out effectively on a large-scale cluster system? The article introduces ELFms for large-scale cluster systems and proposes a way to realize their automatic management. (authors)

A review of advanced small-scale parallel bioreactor technology for accelerated process development: current state and future need.

Science.gov (United States)

Bareither, Rachel; Pollard, David

2011-01-01

The pharmaceutical and biotech industries face continued pressure to reduce development costs and accelerate process development. This challenge occurs alongside the need for increased upstream experimentation to support quality by design initiatives and the pursuit of predictive models from systems biology. A small scale system enabling multiple reactions in parallel (n ≥ 20), with automated sampling and integrated to purification, would provide significant improvement (four to fivefold) to development timelines. State of the art attempts to pursue high throughput process development include shake flasks, microfluidic reactors, microtiter plates and small-scale stirred reactors. The limitations of these systems are compared to desired criteria to mimic large scale commercial processes. The comparison shows that significant technological improvement is still required to provide automated solutions that can speed upstream process development. Copyright © 2010 American Institute of Chemical Engineers (AIChE).
Large time asymptotics of solutions to the anharmonic oscillator model from nonlinear optics

OpenAIRE

Jochmann, Frank

2005-01-01

The anharmonic oscillator model describing the propagation of electromagnetic waves in an exterior domain containing a nonlinear dielectric medium is investigated. The system under consideration consists of a generally nonlinear second order differential equation for the dielectrical polarization coupled with Maxwell's equations for the electromagnetic field. Local decay of the electromagnetic field for t to infinity in the charge free case is shown for a large class of potentials. (This pape...
Isogeometric analysis of free-form Timoshenko curved beams including the nonlinear effects of large deformations

Science.gov (United States)

Hosseini, Seyed Farhad; Hashemian, Ali; Moetakef-Imani, Behnam; Hadidimoud, Saied

2018-03-01

In the present paper, the isogeometric analysis (IGA) of free-form planar curved beams is formulated based on the nonlinear Timoshenko beam theory to investigate the large deformation of beams with variable curvature. Based on the isoparametric concept, the shape functions of the field variables (displacement and rotation) in a finite element analysis are considered to be the same as the non-uniform rational basis spline (NURBS) basis functions defining the geometry. The validity of the presented formulation is tested in five case studies covering a wide range of engineering curved structures including from straight and constant curvature to variable curvature beams. The nonlinear deformation results obtained by the presented method are compared to well-established benchmark examples and also compared to the results of linear and nonlinear finite element analyses. As the nonlinear load-deflection behavior of Timoshenko beams is the main topic of this article, the results strongly show the applicability of the IGA method to the large deformation analysis of free-form curved beams. Finally, it is interesting to notice that, until very recently, the large deformations analysis of free-form Timoshenko curved beams has not been considered in IGA by researchers.
Large-scale Modeling of Nitrous Oxide Production: Issues of Representing Spatial Heterogeneity

Science.gov (United States)

Morris, C. K.; Knighton, J.

2017-12-01

Nitrous oxide is produced from the biological processes of nitrification and denitrification in terrestrial environments and contributes to the greenhouse effect that warms Earth's climate. Large scale modeling can be used to determine how global rates of nitrous oxide production and consumption will shift under future climates. However, accurate modeling of nitrification and denitrification is made difficult by highly parameterized, nonlinear equations. Here we show that the representation of spatial heterogeneity in inputs, specifically soil moisture, causes inaccuracies in estimating the average nitrous oxide production in soils. We demonstrate that when soil moisture is averaged from a spatially heterogeneous surface, net nitrous oxide production is underpredicted. We apply this general result in a test of a widely-used global land surface model, the Community Land Model v4.5. The challenges presented by nonlinear controls on nitrous oxide are highlighted here to provide a wider context to the problem of extraordinary denitrification losses in CLM. We hope that these findings will inform future researchers on the possibilities for model improvement of the global nitrogen cycle.
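The underprediction described above is what Jensen's inequality gives for a convex response: evaluating the model at the mean input yields less than the mean of the model evaluated at each input. The sketch below illustrates this with made-up numbers. The cubic response `n2o_rate` and the patch moisture values are hypothetical and are not taken from the abstract or from the Community Land Model.

```python
def n2o_rate(moisture, exponent=3.0):
    """Hypothetical convex response of N2O production to soil moisture (0..1)."""
    return moisture ** exponent


def mean(values):
    return sum(values) / len(values)


# Made-up soil moisture values for four patches of a heterogeneous surface.
moisture_patches = [0.1, 0.2, 0.3, 0.9]

# Average the input first, then apply the nonlinear response.
rate_of_mean = n2o_rate(mean(moisture_patches))
# Apply the response to every patch, then average the output.
mean_of_rates = mean([n2o_rate(m) for m in moisture_patches])

print(rate_of_mean < mean_of_rates)
```

With a concave response the inequality would reverse, so the direction of the bias depends on the shape of the response curve over the range of inputs.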
Large-scale tides in general relativity

Energy Technology Data Exchange (ETDEWEB)

Ip, Hiu Yan; Schmidt, Fabian, E-mail: iphys@mpa-garching.mpg.de, E-mail: fabians@mpa-garching.mpg.de [Max-Planck-Institut für Astrophysik, Karl-Schwarzschild-Str. 1, 85741 Garching (Germany)

2017-02-01

Density perturbations in cosmology, i.e. spherically symmetric adiabatic perturbations of a Friedmann-Lemaître-Robertson-Walker (FLRW) spacetime, are locally exactly equivalent to a different FLRW solution, as long as their wavelength is much larger than the sound horizon of all fluid components. This fact is known as the 'separate universe' paradigm. However, no such relation is known for anisotropic adiabatic perturbations, which correspond to an FLRW spacetime with large-scale tidal fields. Here, we provide a closed, fully relativistic set of evolutionary equations for the nonlinear evolution of such modes, based on the conformal Fermi (CFC) frame. We show explicitly that the tidal effects are encoded by the Weyl tensor, and are hence entirely different from an anisotropic Bianchi I spacetime, where the anisotropy is sourced by the Ricci tensor. In order to close the system, certain higher derivative terms have to be dropped. We show that this approximation is equivalent to the local tidal approximation of Hui and Bertschinger [1]. We also show that this very simple set of equations matches the exact evolution of the density field at second order, but fails at third and higher order. This provides a useful, easy-to-use framework for computing the fully relativistic growth of structure at second order.
Parallel finite elements with domain decomposition and its pre-processing

International Nuclear Information System (INIS)

Yoshida, A.; Yagawa, G.; Hamada, S.

1993-01-01

This paper describes a parallel finite element analysis using a domain decomposition method, and the pre-processing for the parallel calculation. Computer simulations are about to replace experiments in various fields, and the scale of the models to be simulated tends to be extremely large. At the same time, the computational environment has changed drastically in recent years. In particular, parallel processing on massively parallel computers or computer networks is considered a promising technique. To achieve high efficiency in such a parallel computing environment, large task granularity and a well-balanced workload distribution are key issues. It is also important to reduce the cost of pre-processing in such parallel FEM. From this point of view, the authors developed a domain decomposition FEM with an automatic and dynamic task-allocation mechanism, together with an automatic mesh generation/domain subdivision system for it. (author)
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

Science.gov (United States)

Shrimankar, D. D.; Sathe, S. R.

2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs cannot be scaled beyond a single SMP node, whereas programs written in MPI can span more than one SMP node, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of cores participating. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results are striking for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that these techniques give the best performance of our parallel algorithm for various values of the input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
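As background for the optimal pairwise alignment mentioned above, the sketch below computes a global alignment score with a Needleman-Wunsch-style dynamic program. The abstract does not name the alignment algorithm or describe the tiling, so the function `alignment_score` and its scoring parameters (match +1, mismatch -1, gap -1) are illustrative assumptions, and the code is sequential.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b (Needleman-Wunsch style)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] with the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap in b
            left = curr[j - 1] + gap   # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(alignment_score("ACGT", "AGT"))
```

Each cell depends only on its left, upper and upper-left neighbours, so cells (or rectangular tiles of cells) on the same anti-diagonal are independent of one another. That property is what tile-based parallel schemes generally exploit.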
Parallel-In-Time For Moving Meshes

Energy Technology Data Exchange (ETDEWEB)

Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Southworth, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2016-02-04

With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
Large scale network-centric distributed systems

CERN Document Server

Sarbazi-Azad, Hamid

2014-01-01

A highly accessible reference offering a broad range of topics and insights on large scale network-centric distributed systems. Evolving from the fields of high-performance computing and networking, large scale network-centric distributed systems continues to grow as one of the most important topics in computing and communication and many interdisciplinary areas. Dealing with both wired and wireless networks, this book focuses on the design and performance issues of such systems. Large Scale Network-Centric Distributed Systems provides in-depth coverage ranging from ground-level hardware issu
Ward identities and consistency relations for the large scale structure with multiple species

International Nuclear Information System (INIS)

Peloso, Marco; Pietroni, Massimo

2014-01-01

We present fully nonlinear consistency relations for the squeezed bispectrum of Large Scale Structure. These relations hold when the matter component of the Universe is composed of one or more species, and generalize those obtained in [1,2] in the single species case. The multi-species relations apply to the standard dark matter + baryons scenario, as well as to the case in which some of the fields are auxiliary quantities describing a particular population, such as dark matter halos or a specific galaxy class. If a large scale velocity bias exists between the different populations new terms appear in the consistency relations with respect to the single species case. As an illustration, we discuss two physical cases in which such a velocity bias can exist: (1) a new long range scalar force in the dark matter sector (resulting in a violation of the equivalence principle in the dark matter-baryon system), and (2) the distribution of dark matter halos relative to that of the underlying dark matter field
Symmetry and exact solutions of nonlinear spinor equations

International Nuclear Information System (INIS)

Fushchich, W.I.; Zhdanov, R.Z.

1989-01-01

This review is devoted to the application of algebraic-theoretical methods to the problem of constructing exact solutions of the many-dimensional nonlinear systems of partial differential equations for spinor, vector and scalar fields widely used in quantum field theory. Large classes of nonlinear spinor equations invariant under the Poincare group P(1, 3), Weyl group (i.e. Poincare group supplemented by a group of scale transformations), and the conformal group C(1, 3) are described. Ansaetze invariant under the Poincare and the Weyl groups are constructed. Using these we reduce the Poincare-invariant nonlinear Dirac equations to systems of ordinary differential equations and construct large families of exact solutions of the nonlinear Dirac-Heisenberg equation depending on arbitrary parameters and functions. In a similar way we have obtained new families of exact solutions of the nonlinear Maxwell-Dirac and Klein-Gordon-Dirac equations. The obtained solutions can be used for quantization of nonlinear equations. (orig.)
Large-Scale Outflows in Seyfert Galaxies

Science.gov (United States)

Colbert, E. J. M.; Baum, S. A.

1995-12-01

Highly collimated outflows extend out to Mpc scales in many radio-loud active galaxies. In Seyfert galaxies, which are radio-quiet, the outflows extend out to kpc scales and do not appear to be as highly collimated. In order to study the nature of large-scale (≳1 kpc) outflows in Seyferts, we have conducted optical, radio and X-ray surveys of a distance-limited sample of 22 edge-on Seyfert galaxies. Results of the optical emission-line imaging and spectroscopic survey imply that large-scale outflows are present in ≳1/4 of all Seyferts. The radio (VLA) and X-ray (ROSAT) surveys show that large-scale radio and X-ray emission is present at about the same frequency. Kinetic luminosities of the outflows in Seyferts are comparable to those in starburst-driven superwinds. Large-scale radio sources in Seyferts appear diffuse, but do not resemble radio halos found in some edge-on starburst galaxies (e.g. M82). We discuss the feasibility of the outflows being powered by the active nucleus (e.g. a jet) or a circumnuclear starburst.
On the Phenomenology of an Accelerated Large-Scale Universe

Directory of Open Access Journals (Sweden)

Martiros Khurshudyan

2016-10-01

In this review paper, several new results towards the explanation of the accelerated expansion of the large-scale universe are discussed. On the other hand, inflation is the early-time accelerated era and the universe is symmetric in the sense of accelerated expansion. The accelerated expansion of the universe is one of the long standing problems in modern cosmology, and physics in general. There are several well defined approaches to solve this problem. One of them is an assumption concerning the existence of dark energy in the recent universe. It is believed that dark energy is responsible for antigravity, while dark matter has gravitational nature and is responsible, in general, for structure formation. A different approach is an appropriate modification of general relativity including, for instance, f(R) and f(T) theories of gravity. On the other hand, attempts to build theories of quantum gravity and assumptions about the existence of extra dimensions, possible variability of the gravitational constant and the speed of light (among others), provide interesting modifications of general relativity applicable to problems of modern cosmology, too. In particular, here two groups of cosmological models are discussed. In the first group the problem of the accelerated expansion of the large-scale universe is discussed involving a new idea, named the varying ghost dark energy. On the other hand, the second group contains cosmological models addressed to the same problem involving either new parameterizations of the equation of state parameter of dark energy (like varying polytropic gas), or nonlinear interactions between dark energy and dark matter. Moreover, for cosmological models involving varying ghost dark energy, massless particle creation in an appropriate radiation dominated universe (when the background dynamics is due to general relativity) is demonstrated as well. Exploring the nature of the accelerated expansion of the large-scale universe involving generalized
Advances in dynamic relaxation techniques for nonlinear finite element analysis

International Nuclear Information System (INIS)

Sauve, R.G.; Metzger, D.R.

1995-01-01

Traditionally, the finite element technique has been applied to static and steady-state problems using implicit methods. When nonlinearities exist, equilibrium iterations must be performed using Newton-Raphson or quasi-Newton techniques at each load level. In the presence of complex geometry, nonlinear material behavior, and large relative sliding of material interfaces, solutions using implicit methods often become intractable. A dynamic relaxation algorithm is developed for inclusion in finite element codes. The explicit nature of the method avoids large computer memory requirements and makes possible the solution of large-scale problems. The method described approaches the steady-state solution with no overshoot, a problem which has plagued researchers in the past. The method is included in a general nonlinear finite element code. A description of the method along with a number of new applications involving geometric and material nonlinearities are presented. They include: (1) nonlinear geometric cantilever plate; (2) moment-loaded nonlinear beam; and (3) creep of nuclear fuel channel assemblies
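The basic idea of dynamic relaxation can be sketched on a single degree of freedom: a static nonlinear equilibrium problem is recast as a damped pseudo-dynamic one and stepped explicitly until the out-of-balance force vanishes, so no stiffness matrix is assembled or factorized. The code below is a generic viscous-damped version, not the authors' algorithm, and it makes no claim about avoiding overshoot. The cubic spring model, the function name `dynamic_relaxation`, and all parameter values are assumptions for illustration.

```python
def dynamic_relaxation(force, k=1.0, k3=0.5, mass=1.0, damping=2.0,
                       dt=0.1, tol=1e-10, max_steps=10000):
    """Solve k*u + k3*u**3 = force by damped explicit pseudo-time stepping."""
    u, v = 0.0, 0.0  # displacement and fictitious velocity
    for step in range(max_steps):
        residual = force - (k * u + k3 * u ** 3)  # out-of-balance force
        if abs(residual) < tol and abs(v) < tol:
            return u, step                        # static equilibrium reached
        v += dt * (residual - damping * v) / mass  # explicit velocity update
        u += dt * v                                # displacement update
    raise RuntimeError("dynamic relaxation did not converge")


u, steps = dynamic_relaxation(1.0)
print(u, steps)
```

Only vectors of displacements, velocities and residuals are stored, which is why explicit schemes of this kind keep memory requirements low for large problems; the fictitious mass, damping and step size control how quickly the iteration settles.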
Large scale electromechanical transistor with application in mass sensing

Energy Technology Data Exchange (ETDEWEB)

Jin, Leisheng; Li, Lijie, E-mail: L.Li@swansea.ac.uk [Multidisciplinary Nanotechnology Centre, College of Engineering, Swansea University, Swansea SA2 8PP (United Kingdom)

2014-12-07

Nanomechanical transistor (NMT) has evolved from the single electron transistor, a device that operates by shuttling electrons with a self-excited central conductor. The unfavoured aspects of the NMT are the complexity of the fabrication process and its signal processing unit, which could potentially be overcome by designing much larger devices. This paper reports a new design of large scale electromechanical transistor (LSEMT), still taking advantage of the principle of shuttling electrons. However, because of the large size, nonlinear electrostatic forces induced by the transistor itself are not sufficient to drive the mechanical member into vibration—an external force has to be used. In this paper, a LSEMT device is modelled, and its new application in mass sensing is postulated using two coupled mechanical cantilevers, with one of them being embedded in the transistor. The sensor is capable of detecting added mass using the eigenstate shifts method by reading the change of electrical current from the transistor, which has much higher sensitivity than conventional eigenfrequency shift approach used in classical cantilever based mass sensors. Numerical simulations are conducted to investigate the performance of the mass sensor.
Vibration characteristics of a hydraulic generator unit rotor system with parallel misalignment and rub-impact

Energy Technology Data Exchange (ETDEWEB)

Huang, Zhiwei; Zhou, Jianzhong; Yang, Mengqi; Zhang, Yongchuan [Huazhong University of Science and Technology, College of Hydraulic and Digitalization Engineering, Wuhan, Hubei Province (China)

2011-07-15

This research addresses the rotor system of a hydraulic generator unit. To study local rubbing of the generator rotor caused by parallel misalignment and mass eccentricity, a dynamic model of the rotor system with coupled misalignment and rub-impact is established. The dynamic behaviors of this system are investigated by numerical integration as the parallel misalignment, mass eccentricity and bearing stiffness vary. The nonlinear dynamic responses of the generator rotor and turbine rotor with coupling faults are analyzed by means of bifurcation diagrams, Poincare maps, axis orbits, time histories and amplitude spectrum diagrams. Various nonlinear phenomena in the system, such as periodic, three-periodic and quasi-periodic motions, are studied as the parallel misalignment changes. The results reveal that the vibration characteristics of the rotor system with coupling faults are extremely complex and that there are some low-frequency components with large amplitude in the 0.3-0.4× range. As mass eccentricity increases, the interval of nonperiodic motions moves continuously forward. This suggests that a reduction in mass eccentricity or an increase in bearing stiffness could preclude nonlinear vibration. These results might provide important theoretical references for the safe operation of rotating machinery and the exact identification of its faults. (orig.)
Recent Progress in Large-Scale Structure

CERN Multimedia

CERN. Geneva

2014-01-01

I will discuss recent progress in the understanding of how to model galaxy clustering. While recent analyses have focussed on the baryon acoustic oscillations as a probe of cosmology, galaxy redshift surveys contain a lot more information than the acoustic scale. In extracting this additional information three main issues need to be well understood: nonlinear evolution of matter fluctuations, galaxy bias and redshift-space distortions. I will present recent progress in modeling these three effects that pave the way to constraining cosmology and galaxy formation with increased precision.
GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations.

Science.gov (United States)

Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

2015-07-01

GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.
On the relationship between large-scale climate modes and regional synoptic patterns that drive Victorian rainfall

OpenAIRE

D. C. Verdon-Kidd; A. S. Kiem

2009-01-01

In this paper regional (synoptic) and large-scale climate drivers of rainfall are investigated for Victoria, Australia. A non-linear classification methodology known as self-organizing maps (SOM) is used to identify 20 key regional synoptic patterns, which are shown to capture a range of significant synoptic features known to influence the climate of the region. Rainfall distributions are assigned to each of the 20 patterns for nine rainfall stations located across Victoria, resulting in a cl...

Statistics and Dynamics in the Large-scale Structure of the Universe

International Nuclear Information System (INIS)

Matsubara, Takahiko

2006-01-01

In cosmology, observations and theories are related to each other by statistics in most cases. Especially, statistical methods play central roles in analyzing fluctuations in the universe, which are seeds of the present structure of the universe. The confrontation of the statistics and dynamics is one of the key methods to unveil the structure and evolution of the universe. I will review some of the major statistical methods in cosmology, in connection with linear and nonlinear dynamics of the large-scale structure of the universe. The present status of analyses of the observational data such as the Sloan Digital Sky Survey, and the future prospects to constrain the nature of exotic components of the universe such as the dark energy will be presented
Large-scale transportation network congestion evolution prediction using deep learning theory.

Science.gov (United States)

Ma, Xiaolei; Yu, Haiyang; Wang, Yunpeng; Wang, Yinhai

2015-01-01

Understanding how congestion at one location can cause ripples throughout large-scale transportation network is vital for transportation researchers and practitioners to pinpoint traffic bottlenecks for congestion mitigation. Traditional studies rely on either mathematical equations or simulation techniques to model traffic congestion dynamics. However, most of the approaches have limitations, largely due to unrealistic assumptions and cumbersome parameter calibration process. With the development of Intelligent Transportation Systems (ITS) and Internet of Things (IoT), transportation data become more and more ubiquitous. This triggers a series of data-driven research to investigate transportation phenomena. Among them, deep learning theory is considered one of the most promising techniques to tackle tremendous high-dimensional data. This study attempts to extend deep learning theory into large-scale transportation network analysis. A deep Restricted Boltzmann Machine and Recurrent Neural Network architecture is utilized to model and predict traffic congestion evolution based on Global Positioning System (GPS) data from taxi. A numerical study in Ningbo, China is conducted to validate the effectiveness and efficiency of the proposed method. Results show that the prediction accuracy can achieve as high as 88% within less than 6 minutes when the model is implemented in a Graphic Processing Unit (GPU)-based parallel computing environment. The predicted congestion evolution patterns can be visualized temporally and spatially through a map-based platform to identify the vulnerable links for proactive congestion mitigation.
Nonlinear power spectrum from resummed perturbation theory: a leap beyond the BAO scale

International Nuclear Information System (INIS)

Anselmi, Stefano; Pietroni, Massimo

2012-01-01

A new computational scheme for the nonlinear cosmological matter power spectrum (PS) is presented. Our method is based on evolution equations in time, which can be cast in a form extremely convenient for fast numerical evaluations. A nonlinear PS is obtained in a time comparable to that needed for a simple 1-loop computation, and the numerical implementation is very simple. Our results agree with N-body simulations at the percent level in the BAO range of scales, and at the few-percent level up to k ≅ 1 h/Mpc at z ≳ 0.5, thereby opening the possibility of applying this tool to scales interesting for weak lensing. We clarify the approximations inherent to this approach as well as its relations to previous ones, such as the Time Renormalization Group and the multi-point propagator expansion. We discuss possible lines of improvement of the method and its intrinsic limitations due to multi-streaming at small scales and low redshifts.
SCALE INTERACTION IN A MIXING LAYER. THE ROLE OF THE LARGE-SCALE GRADIENTS

KAUST Repository

Fiscaletti, Daniele

2015-08-23

The interaction between scales is investigated in a turbulent mixing layer. The large-scale amplitude modulation of the small scales already observed in other works depends on the crosswise location. Large-scale positive fluctuations correlate with a stronger activity of the small scales on the low-speed side of the mixing layer, and a reduced activity on the high-speed side. However, from physical considerations we would expect the scales to interact in a qualitatively similar way within the flow and across different turbulent flows. Therefore, in addition to the large-scale fluctuations, the modulation of the small scales by the large-scale gradients has been investigated.
Xyce Parallel Electronic Simulator : users' guide, version 2.0.

Energy Technology Data Exchange (ETDEWEB)

Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

2004-06-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: • Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. • Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. • Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. • A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). • Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce
Improved decomposition–coordination and discrete differential dynamic programming for optimization of large-scale hydropower system

International Nuclear Information System (INIS)

Li, Chunlong; Zhou, Jianzhong; Ouyang, Shuo; Ding, Xiaoling; Chen, Lu

2014-01-01

Highlights: • Optimization of a large-scale hydropower system in the Yangtze River basin. • Improved decomposition–coordination and discrete differential dynamic programming. • Generating the initial solution randomly to reduce generation time. • Proposing a relative coefficient for more power generation. • Proposing an adaptive bias corridor technology to enhance convergence speed. - Abstract: With the construction of major hydro plants, more and more large-scale hydropower systems are gradually taking shape, which makes optimizing these systems a challenge. Optimization of a large-scale hydropower system (OLHS), which determines the water discharges or water levels of all hydro plants so as to maximize total power generation subject to many constraints, is a high-dimensional, nonlinear and coupled complex problem. In order to solve the OLHS problem effectively, an improved decomposition–coordination and discrete differential dynamic programming (IDC–DDDP) method is proposed in this paper. A strategy in which the initial solution is generated randomly is adopted to reduce generation time. Meanwhile, a relative coefficient based on maximum output capacity is proposed for more power generation. Moreover, an adaptive bias corridor technology is proposed to enhance convergence speed. The proposed method is applied to long-term optimal dispatch of a large-scale hydropower system (LHS) in the Yangtze River basin. Compared to other methods, IDC–DDDP performs competitively in both total power generation and convergence speed, which provides a new method for solving the OLHS problem.
Time domain contact model for tyre/road interaction including nonlinear contact stiffness due to small-scale roughness

Science.gov (United States)

Andersson, P. B. U.; Kropp, W.

2008-11-01

Rolling resistance, traction, wear, excitation of vibrations, and noise generation are all attributes to consider in optimisation of the interaction between automotive tyres and wearing courses of roads. The key to understanding and describing the interaction is to include a wide range of length scales in the description of the contact geometry. This means including scales on the order of micrometres that have been neglected in previous tyre/road interaction models. A time domain contact model for the tyre/road interaction that includes interfacial details is presented. The contact geometry is discretised into multiple elements forming pairs of matching points. The dynamic response of the tyre is calculated by convolving the contact forces with pre-calculated Green's functions. The smaller length scales are included by using constitutive interfacial relations, i.e. by using nonlinear contact springs, for each pair of contact elements. The method is presented for normal (out-of-plane) contact, and a method for assessing the stiffness of the nonlinear springs based on detailed geometry and elastic data of the tread is suggested. The governing equations of the nonlinear contact problem are solved with the Newton-Raphson iterative scheme. Relations between force, indentation, and contact stiffness are calculated for a single tread block in contact with a road surface. The calculated results have the same character as results from measurements found in the literature. Comparison to traditional contact formulations shows that the effect of the small-scale roughness is large; the contact stiffness is only up to half of the stiffness that would result if contact were made over the whole element directly to the bulk of the tread. It is concluded that the suggested contact formulation is a suitable model to include more details of the contact interface. Further, the presented result for the tread block in contact with the road is a suitable input for a global tyre/road interaction model.
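The Newton-Raphson step mentioned in the abstract above can be illustrated for a single nonlinear contact spring. This is a minimal sketch, not the authors' model: the power-law force relation, the stiffness constant `K`, and the load `LOAD` are hypothetical values chosen only for illustration, whereas the paper derives spring stiffness from detailed tread geometry and solves a coupled system over many contact pairs.

```python
def newton_raphson(residual, derivative, x0, tol=1e-9, max_iter=50):
    """Solve residual(x) = 0 by Newton-Raphson iteration starting from x0."""
    x = x0
    for _ in range(max_iter):
        r = residual(x)
        if abs(r) < tol:
            return x
        x -= r / derivative(x)  # Newton update: x_new = x - r(x) / r'(x)
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical nonlinear spring: force = K * indentation**1.5 (assumed law).
K = 2.0e6    # assumed stiffness constant, N / m^1.5
LOAD = 50.0  # assumed applied normal load, N


def residual(d):
    # Force imbalance between the spring reaction and the applied load.
    return K * d ** 1.5 - LOAD


def derivative(d):
    # Tangent stiffness of the assumed spring law.
    return 1.5 * K * d ** 0.5


indentation = newton_raphson(residual, derivative, x0=1e-3)
print(f"indentation = {indentation:.6e} m")
```

The derivative used in the update is the tangent contact stiffness, which is why force, indentation, and stiffness come out of the same calculation.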
Analysis of ground response data at Lotung large-scale soil- structure interaction experiment site

International Nuclear Information System (INIS)

Chang, C.Y.; Mok, C.M.; Power, M.S.

1991-12-01

The Electric Power Research Institute (EPRI), in cooperation with the Taiwan Power Company (TPC), constructed two models (1/4-scale and 1/2-scale) of a nuclear plant containment structure at a site in Lotung (Tang, 1987), a seismically active region in northeast Taiwan. The models were constructed to gather data for the evaluation and validation of soil-structure interaction (SSI) analysis methodologies. Extensive instrumentation was deployed to record both structural and ground responses at the site during earthquakes. The experiment is generally referred to as the Lotung Large-Scale Seismic Test (LSST). As part of the LSST, two downhole arrays were installed at the site to record ground motions at depth as well as at the ground surface. Structural response and ground response have been recorded for a number of earthquakes (i.e. a total of 18 earthquakes in the period of October 1985 through November 1986) at the LSST site since the completion of the installation of the downhole instruments in October 1985. These data include those from earthquakes having magnitudes ranging from M_L 4.5 to M_L 7.0 and epicentral distances ranging from 4.7 km to 77.7 km. Peak ground surface accelerations range from 0.03 g to 0.21 g for the horizontal component and from 0.01 g to 0.20 g for the vertical component. The objectives of the study were: (1) to obtain empirical data on variations of earthquake ground motion with depth; (2) to examine field evidence of nonlinear soil response due to earthquake shaking and to determine the degree of soil nonlinearity; (3) to assess the ability of ground response analysis techniques, including techniques to approximate nonlinear soil response, to estimate ground motions due to earthquake shaking; and (4) to analyze earth pressures recorded beneath the basemat and on the side wall of the 1/4-scale model structure during selected earthquakes.
Massively parallel Fokker-Planck calculations

International Nuclear Information System (INIS)

Mirin, A.A.

1990-01-01

This paper reports that the Fokker-Planck package FPPAC, which solves the complete nonlinear multispecies Fokker-Planck collision operator for a plasma in two-dimensional velocity space, has been rewritten for the Connection Machine 2. This has involved allocation of variables either to the front end or the CM2, minimization of data flow, and replacement of Cray-optimized algorithms with ones suitable for a massively parallel architecture. Calculations have been carried out on various Connection Machines throughout the country. Results and timings on these machines have been compared to each other and to those on the static memory Cray-2. For large problem size, the Connection Machine 2 is found to be cost-efficient
A Web-based Distributed Voluntary Computing Platform for Large Scale Hydrological Computations

Science.gov (United States)

Demir, I.; Agliamzanov, R.

2014-12-01

Distributed volunteer computing can enable researchers and scientists to form large parallel computing environments that utilize the computing power of the millions of computers on the Internet, and to use them for running large-scale environmental simulations and models that serve the common good of local communities and the world. Recent developments in web technologies and standards allow client-side scripting languages to run at speeds close to those of native applications, and to utilize the power of Graphics Processing Units (GPU). Using a client-side scripting language like JavaScript, we have developed an open distributed computing framework that makes it easy for researchers to write their own hydrologic models and run them on volunteer computers. Users can easily enable their websites so that visitors can volunteer their computer resources to help run advanced hydrological models and simulations. Using a web-based system allows users to start volunteering their computational resources within seconds without installing any software. The framework distributes the model simulation to thousands of nodes in small spatial and computational sizes. A relational database system is utilized for managing data connections and queue management for the distributed computing nodes. In this paper, we present a web-based distributed volunteer computing platform to enable large-scale hydrological simulations and model runs in an open and integrated environment.
An introduction to complex systems society, ecology, and nonlinear dynamics

CERN Document Server

Fieguth, Paul

2017-01-01

This undergraduate text explores a variety of large-scale phenomena - global warming, ice ages, water, poverty - and uses these case studies as a motivation to explore nonlinear dynamics, power-law statistics, and complex systems. Although the detailed mathematical descriptions of these topics can be challenging, the consequences of a system being nonlinear, power-law, or complex are in fact quite accessible. This book blends a tutorial approach to the mathematical aspects of complex systems together with a complementary narrative on the global/ecological/societal implications of such systems. Nearly all engineering undergraduate courses focus on mathematics and systems which are small scale, linear, and Gaussian. Unfortunately there is not a single large-scale ecological or social phenomenon that is scalar, linear, and Gaussian. This book offers students insights to better understand the large-scale problems facing the world and to realize that these cannot be solved by a single, narrow academic field or per...
A Parallel, Multi-Scale Watershed-Hydrologic-Inundation Model with Adaptively Switching Mesh for Capturing Flooding and Lake Dynamics

Science.gov (United States)

Ji, X.; Shen, C.

2017-12-01

Flood inundation presents substantial societal hazards and also changes biogeochemistry for systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term watershed-scale model. Due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocity both demand prohibitively small time steps even for parallel codes. Here we develop a parallel surface-subsurface process-based model enhanced by multi-resolution meshes that are adaptively switched on or off. The high-resolution overland flow meshes are enabled only when the flood wave invades the floodplains. This model applies a semi-implicit, semi-Lagrangian (SISL) scheme in solving the dynamic wave equations, and with the assistance of the multi-mesh method, it also adaptively chooses the dynamic wave equation only in areas of deep inundation. Therefore, the model achieves a balance between accuracy and computational cost.
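To make the CFL restriction in the abstract above concrete, the following minimal sketch computes the largest stable time step of an explicit shallow-water scheme on a uniform mesh. The function name, the Courant number of 1, and the sample mesh sizes, velocities, and depths are assumptions for illustration only; the model in the abstract uses a semi-implicit, semi-Lagrangian scheme precisely to relax this limit.

```python
import math


def cfl_time_step(dx, velocity, depth, courant=1.0, g=9.81):
    """Largest explicit time step (s) allowed by the CFL condition.

    The fastest signal speed in shallow-water flow is |u| + sqrt(g * h),
    and the time step must satisfy dt <= courant * dx / speed.
    """
    wave_speed = abs(velocity) + math.sqrt(g * depth)
    return courant * dx / wave_speed


# Assumed example values: a coarse watershed mesh versus a fine floodplain mesh.
coarse = cfl_time_step(dx=100.0, velocity=1.0, depth=2.0)
fine = cfl_time_step(dx=5.0, velocity=3.0, depth=2.0)
print(f"coarse mesh dt <= {coarse:.2f} s, fine mesh dt <= {fine:.2f} s")
```

Refining the mesh and raising the local velocity both shrink the admissible step, which is the cost that switching the fine meshes on only during inundation is meant to avoid.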
Dissecting the large-scale galactic conformity

Science.gov (United States)

Seo, Seongu

2018-01-01

Galactic conformity is an observed phenomenon in which galaxies located in the same region have similar properties such as star formation rate, color, gas fraction, and so on. The conformity was first observed among galaxies within the same halos ("one-halo conformity"). The one-halo conformity can be readily explained by mutual interactions among galaxies within a halo. Recent observations, however, have further revealed a puzzling connection among galaxies with no direct interaction. In particular, galaxies located within a sphere of ~5 Mpc radius tend to show similarities, even though the galaxies do not share common halos with each other ("two-halo conformity" or "large-scale conformity"). Using a cosmological hydrodynamic simulation, Illustris, we investigate the physical origin of the two-halo conformity and put forward two scenarios. First, back-splash galaxies are likely responsible for the large-scale conformity. They have evolved into red galaxies due to ram-pressure stripping in a given galaxy cluster and happen to reside now within a ~5 Mpc sphere. Second, galaxies in a strong tidal field induced by large-scale structure also seem to give rise to the large-scale conformity. The strong tides suppress star formation in the galaxies. We discuss the importance of the large-scale conformity in the context of galaxy evolution.
FLUCTUATING ENERGY STORAGE AND NONLINEAR CASCADE IN AN INHOMOGENEOUS CORONAL LOOP

International Nuclear Information System (INIS)

Malara, F.; Nigro, G.; Onofri, M.; Veltri, P.

2010-01-01

The dynamics and the energy balance of large-scale fluctuations in a coronal loop are studied. The loop is represented by a simplified structure where the curvature is neglected and the background magnetic field is uniform. In a previous paper, we studied a similar model where a uniform background density was assumed. The present paper represents a generalization of the previous one and it has the purpose of investigating possible modifications to the large-scale energy balance and dynamics due to a more realistic longitudinally nonuniform density. Large-scale fluctuations are dominated by coherent eigenmodes that nonlinearly couple to produce an energy cascade to smaller scales. Eigenmodes properties are calculated by a simplified linear dissipative model, deriving an expression for the input energy flux that is not substantially modified by the presence of the density inhomogeneity and is independent of dissipation. For typical values of the parameters, the derived input energy flux is comparable with that required to heat the active region corona. Nonlinear couplings are dominated by coherence effects due to the symmetry properties of eigenmodes; the consequences are that the system is in a weakly nonlinear regime that produces fluctuating energy storage in the loop, and that the kinetic and magnetic nonlinear energy fluxes are of the same order, despite the dominance of magnetic energy at large scales. From the energy balance, an expression for the velocity fluctuation is derived, which is valid in the more general case of a nonuniform background density; this estimate is in agreement both with measures of nonthermal velocities in the solar corona and with previous numerical results.
Probing cosmology with the homogeneity scale of the Universe through large scale structure surveys

International Nuclear Information System (INIS)

Ntelis, Pierros

2017-01-01

It is thus possible to reconstruct the distribution of matter in 3 dimensions in gigantic volumes. We can then extract various statistical observables to measure the BAO scale and the scale of homogeneity of the universe. Using Data Release 12 CMASS galaxy catalogs, we measured the homogeneity scale with an uncertainty 5 times smaller than that of the WiggleZ measurement. At large scales, the universe is remarkably well described to linear order by the ΛCDM model, the standard model of cosmology. In general, it is not necessary to take into account the nonlinear effects which complicate the model at small scales. On the other hand, at large scales, the measurement of our observables becomes very sensitive to systematic effects. This is particularly true for the analysis of cosmic homogeneity, which requires an observational method that does not bias the measurement. In order to study the homogeneity principle in a model-independent way, we explore a new way to infer distances using cosmic clocks and type Ia supernovae. This establishes the Cosmological Principle using only a small number of a priori assumptions, i.e. the theory of General Relativity and astrophysical assumptions that are independent of Friedmann universes and, by extension, of the homogeneity assumption. This manuscript is organized as follows. After a short presentation of the knowledge in cosmology necessary for the understanding of this manuscript, presented in Chapter 1, Chapter 2 will deal with the challenges of the Cosmological Principle as well as how to overcome them. In Chapter 3, we will discuss the technical characteristics of the large scale structure surveys, focusing in particular on the BOSS and eBOSS galaxy surveys. Chapter 4 presents the detailed analysis of the measurement of cosmic homogeneity and the various systematic effects likely to impact our observables. Chapter 5 will discuss how to use the cosmic homogeneity as a standard ruler to constrain dark energy models from current and future surveys. In
Nonlinear Dot Plots.

Science.gov (United States)

Rodrigues, Nils; Weiskopf, Daniel

2018-01-01

Conventional dot plots use a constant dot size and are typically applied to show the frequency distribution of small data sets. Unfortunately, they are not designed for a high dynamic range of frequencies. We address this problem by introducing nonlinear dot plots. Adopting the idea of nonlinear scaling from logarithmic bar charts, our plots allow for dots of varying size so that columns with a large number of samples are reduced in height. For the construction of these diagrams, we introduce an efficient two-way sweep algorithm that leads to a dense and symmetrical layout. We compensate aliasing artifacts at high dot densities by a specifically designed low-pass filtering method. Examples of nonlinear dot plots are compared to conventional dot plots as well as linear and logarithmic histograms. Finally, we include feedback from an expert review.
Nonlinear Analysis of the Space Shuttle Superlightweight External Fuel Tank

Science.gov (United States)

Nemeth, Michael P.; Britt, Vicki O.; Collins, Timothy J.; Starnes, James H., Jr.

1996-01-01

Results of buckling and nonlinear analyses of the Space Shuttle external tank superlightweight liquid-oxygen (LO2) tank are presented. Modeling details and results are presented for two prelaunch loading conditions and for two full-scale structural tests that were conducted on the original external tank. The results illustrate three distinctly different types of nonlinear response for thin-walled shells subjected to combined mechanical and thermal loads. The nonlinear response phenomena consist of bifurcation-type buckling, short-wavelength nonlinear bending, and nonlinear collapse associated with a limit point. For each case, the results show that accurate predictions of nonlinear behavior generally require a large-scale, high-fidelity finite-element model. Results are also presented that show that a fluid-filled launch-vehicle shell can be highly sensitive to initial geometric imperfections. In addition, results presented for two full-scale structural tests of the original standard-weight external tank suggest that the finite-element modeling approach used in the present study is sufficient for representing the nonlinear behavior of the superlightweight LO2 tank.
Fast electrostatic force calculation on parallel computer clusters

International Nuclear Information System (INIS)

Kia, Amirali; Kim, Daejoong; Darve, Eric

2008-01-01

The fast multipole method (FMM) and smooth particle mesh Ewald (SPME) are well-known fast algorithms for evaluating long-range electrostatic interactions in molecular dynamics and other fields. FMM is a multi-scale method which reduces the computational cost by approximating the potential due to a group of particles at a large distance using a few multipole functions. This algorithm scales like O(N) for N particles. The SPME algorithm is an O(N ln N) method based on interpolating the Fourier-space part of the Ewald sum and evaluating the resulting convolutions using the fast Fourier transform (FFT). These algorithms suffer from relatively poor efficiency on large parallel machines, especially for mid-size problems of around hundreds of thousands of atoms. A variation of the FMM, called PWA, based on plane wave expansions is presented in this paper. A new parallelization strategy for PWA, which takes advantage of the specific form of this expansion, is described. Its parallel efficiency is compared with that of SPME through detailed timing measurements on two different computer clusters.
On the soft limit of the large scale structure power spectrum. UV dependence

International Nuclear Information System (INIS)

Garny, Mathias

2015-08-01

We derive a non-perturbative equation for the large scale structure power spectrum of long-wavelength modes. To do so, we use an operator product expansion together with relations between the three-point function and power spectrum in the soft limit. The resulting equation encodes the coupling to ultraviolet (UV) modes in two time-dependent coefficients, which may be obtained from response functions to (anisotropic) parameters, such as spatial curvature, in a modified cosmology. We argue that both depend weakly on fluctuations deep in the UV. As a byproduct, this implies that the renormalized leading order coefficient(s) in the effective field theory (EFT) of large scale structures receive most of their contribution from modes close to the non-linear scale. Consequently, the UV dependence found in explicit computations within standard perturbation theory stems mostly from counter-term(s). We confront a simplified version of our non-perturbative equation with existing numerical simulations, and find good agreement within the expected uncertainties. Our approach can in principle be used to precisely infer the relevance of the leading order EFT coefficient(s) using small volume simulations in an 'anisotropic separate universe' framework. Our results suggest that the importance of these coefficient(s) is a ∼10% effect, and plausibly smaller.
Analysis of passive scalar advection in parallel shear flows: Sorting of modes at intermediate time scales

Science.gov (United States)

Camassa, Roberto; McLaughlin, Richard M.; Viotti, Claudio

2010-11-01

The time evolution of a passive scalar advected by parallel shear flows is studied for a class of rapidly varying initial data. Such situations are of practical importance in a wide range of applications from microfluidics to geophysics. In these contexts, it is well-known that the long-time evolution of the tracer concentration is governed by Taylor's asymptotic theory of dispersion. In contrast, we focus here on the evolution of the tracer at intermediate time scales. We show how intermediate regimes can be identified before Taylor's, and in particular, how the Taylor regime can be delayed indefinitely by properly manufactured initial data. A complete characterization of the sorting of these time scales and their associated spatial structures is presented. These analytical predictions are compared with highly resolved numerical simulations. Specifically, this comparison is carried out for the case of periodic variations in the streamwise direction on the short scale with envelope modulations on the long scales, and we show how this structure can lead to "anomalously" diffusive transients in the evolution of the scalar onto the ultimate regime governed by Taylor dispersion. Mathematically, the occurrence of these transients can be viewed as a competition in the asymptotic dominance between large Péclet (Pe) numbers and the long/short scale aspect ratios (L_Vel/L_Tracer ≡ k), two independent nondimensional parameters of the problem. We provide analytical predictions of the associated time scales by a modal analysis of the eigenvalue problem arising in the separation of variables of the governing advection-diffusion equation. The anomalous time scale in the asymptotic limit of large k Pe is derived for the short scale periodic structure of the scalar's initial data, for both exactly solvable cases and in general with WKBJ analysis. In particular, the exactly solvable sawtooth flow is especially important in that it provides a short cut to the exact solution to the

Domain decomposition solvers for nonlinear multiharmonic finite element equations

KAUST Repository

Copeland, D. M.

2010-01-01

In many practical applications, for instance, in computational electromagnetics, the excitation is time-harmonic. Switching from the time domain to the frequency domain allows us to replace the expensive time-integration procedure by the solution of a simple elliptic equation for the amplitude. This is true for linear problems, but not for nonlinear problems. However, due to the periodicity of the solution, we can expand the solution in a Fourier series. Truncating this Fourier series and approximating the Fourier coefficients by finite elements, we arrive at a large-scale coupled nonlinear system for determining the finite element approximation to the Fourier coefficients. The construction of fast solvers for such systems is crucial for the efficiency of this multiharmonic approach. In this paper we look at nonlinear, time-harmonic potential problems as simple model problems. We construct and analyze almost optimal solvers for the Jacobi systems arising from the Newton linearization of the large-scale coupled nonlinear system that one has to solve instead of performing the expensive time-integration procedure. © 2010 de Gruyter.
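The truncated Fourier expansion described in the abstract above can be sketched as follows. The notation is assumed here for illustration and is not taken from the paper: ω is the excitation frequency, N the truncation index, u_k^c and u_k^s the cosine and sine coefficients, and φ_j the finite element basis functions.

```latex
u(x,t) \approx \sum_{k=0}^{N} \left[ u_k^{c}(x)\cos(k\omega t) + u_k^{s}(x)\sin(k\omega t) \right],
\qquad
u_k^{c}(x) \approx \sum_{j} c_{k,j}\,\varphi_j(x), \quad
u_k^{s}(x) \approx \sum_{j} s_{k,j}\,\varphi_j(x).
```

Because the nonlinearity couples the harmonics, the coefficients c_{k,j} and s_{k,j} for all k have to be determined together, which is what makes the resulting system large and coupled.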
Nonlinear Analysis of the Space Shuttle Super-Lightweight External Fuel Tank

Science.gov (United States)

Nemeth, Michael P.; Britt, Vicki O.; Collins, Timothy J.; Starnes, James H., Jr.

1996-01-01

The results of buckling and nonlinear analyses of the Space Shuttle External Tank super-lightweight liquid oxygen (LOX) tank are presented. Modeling details and results are presented for two prelaunch loading conditions and for two full-scale structural tests conducted on the original external tank. These results illustrate three distinctly different types of nonlinear responses for thin-walled shells subjected to combined mechanical and thermal loads. These nonlinear response phenomena consist of bifurcation-type buckling, short-wavelength nonlinear bending, and nonlinear collapse associated with a limit point. For each case, the results show that accurate predictions of nonlinear behavior generally require a large scale high-fidelity finite element model. Results are also presented that show that a fluid filled launch vehicle shell can be highly sensitive to initial geometric imperfections. In addition, results presented for two full scale structural tests of the original standard weight external tank suggest that the finite element modeling approach used in the present study is sufficient for representing the nonlinear behavior of the super lightweight LOX tank.
An accurate and computationally efficient small-scale nonlinear FEA of flexible risers

OpenAIRE

Rahmati, MT; Bahai, H; Alfano, G

2016-01-01

This paper presents a highly efficient small-scale, detailed finite-element modelling method for flexible risers which can be effectively implemented in a fully-nested (FE2) multiscale analysis based on computational homogenisation. By exploiting cyclic symmetry and applying periodic boundary conditions, only a small fraction of a flexible pipe is used for a detailed nonlinear finite-element analysis at the small scale. In this model, using three-dimensional elements, all layer components are...
Parallel and perpendicular velocity sheared flows driven tripolar vortices in an inhomogeneous electron-ion quantum magnetoplasma

International Nuclear Information System (INIS)

Mirza, Arshad M.; Masood, W.

2011-01-01

Nonlinear equations governing the dynamics of finite amplitude drift-ion acoustic-waves are derived by taking into account sheared ion flows parallel and perpendicular to the ambient magnetic field in a quantum magnetoplasma comprised of electrons and ions. It is shown that stationary solution of the nonlinear equations can be represented in the form of a tripolar vortex for specific profiles of the equilibrium sheared flows. The tripolar vortices are, however, observed to form on very short scales in dense quantum plasmas. The relevance of the present investigation with regard to dense astrophysical environments is also pointed out.
Parallel and perpendicular velocity sheared flows driven tripolar vortices in an inhomogeneous electron-ion quantum magnetoplasma

Science.gov (United States)

Mirza, Arshad M.; Masood, W.

2011-12-01

Nonlinear equations governing the dynamics of finite amplitude drift-ion acoustic-waves are derived by taking into account sheared ion flows parallel and perpendicular to the ambient magnetic field in a quantum magnetoplasma comprised of electrons and ions. It is shown that stationary solution of the nonlinear equations can be represented in the form of a tripolar vortex for specific profiles of the equilibrium sheared flows. The tripolar vortices are, however, observed to form on very short scales in dense quantum plasmas. The relevance of the present investigation with regard to dense astrophysical environments is also pointed out.
Parallel and perpendicular velocity sheared flows driven tripolar vortices in an inhomogeneous electron-ion quantum magnetoplasma

Energy Technology Data Exchange (ETDEWEB)

Mirza, Arshad M. [Theoretical Plasma Physics Group, Department of Physics, Quaid-i-Azam University, Islamabad 45320 (Pakistan); Masood, W. [TPPD, PINSTECH, P.O. Nilore, Islamabad (Pakistan) and National Centre for Physics (NCP), Shahdara Valley Road, 44000 Islamabad (Pakistan)

2011-12-15

Nonlinear equations governing the dynamics of finite amplitude drift-ion acoustic-waves are derived by taking into account sheared ion flows parallel and perpendicular to the ambient magnetic field in a quantum magnetoplasma comprised of electrons and ions. It is shown that stationary solution of the nonlinear equations can be represented in the form of a tripolar vortex for specific profiles of the equilibrium sheared flows. The tripolar vortices are, however, observed to form on very short scales in dense quantum plasmas. The relevance of the present investigation with regard to dense astrophysical environments is also pointed out.
Comparison of stochastic resonance in static and dynamical nonlinearities

International Nuclear Information System (INIS)

Ma, Yumei; Duan, Fabing

2014-01-01

We compare the stochastic resonance (SR) effects in parallel arrays of static and dynamical nonlinearities via the measure of output signal-to-noise ratio (SNR). For a received noisy periodic signal, parallel arrays of both static and dynamical nonlinearities can enhance the output SNR by optimizing the internal noise level. The static nonlinearity is easily implementable, while the dynamical nonlinearity has more parameters to be tuned, at the risk of not exploiting the beneficial role of internal noise components. It is of interest to note that, for an input signal buried in the external Laplacian noise, we show that the dynamical nonlinearity is superior to the static nonlinearity in obtaining a better output SNR. This characteristic is assumed to be closely associated with the kurtosis of noise distribution. - Highlights: • Comparison of SR effects in arrays of both static and dynamical nonlinearities. • Static nonlinearity is easily implementable for the SNR enhancement. • Dynamical nonlinearity yields a better output SNR for external Laplacian noise
Large-scale perspective as a challenge

NARCIS (Netherlands)

Plomp, M.G.A.

2012-01-01

1. Scale forms a challenge for chain researchers: when exactly is something ‘large-scale’? What are the underlying factors (e.g. number of parties, data, objects in the chain, complexity) that determine this? It appears to be a continuum between small- and large-scale, where positioning on that
Applications of Data Assimilation to Analysis of the Ocean on Large Scales

Science.gov (United States)

Miller, Robert N.; Busalacchi, Antonio J.; Hackert, Eric C.

1997-01-01

It is commonplace to begin talks on this topic by noting that oceanographic data are too scarce and sparse to provide complete initial and boundary conditions for large-scale ocean models. Even considering the availability of remotely-sensed data such as radar altimetry from the TOPEX and ERS-1 satellites, a glance at a map of available subsurface data should convince most observers that this is still the case. Data are still too sparse for comprehensive treatment of interannual to interdecadal climate change through the use of models, since the new data sets have not been around for very long. In view of the dearth of data, we must note that the overall picture is changing rapidly. Recently, there have been a number of large scale ocean analysis and prediction efforts, some of which now run on an operational or at least quasi-operational basis, most notably the model based analyses of the tropical oceans. These programs are modeled on numerical weather prediction. Aside from the success of the global tide models, assimilation of data in the tropics, in support of prediction and analysis of seasonal to interannual climate change, is probably the area of large scale ocean modeling and data assimilation in which the most progress has been made. Climate change is a problem which is particularly suited to advanced data assimilation methods. Linear models are useful, and the linear theory can be exploited. For the most part, the data are sufficiently sparse that implementation of advanced methods is worthwhile. As an example of a large scale data assimilation experiment with a recent extensive data set, we present results of a tropical ocean experiment in which the Kalman filter was used to assimilate three years of altimetric data from Geosat into a coarsely resolved linearized long wave shallow water model. Since nonlinear processes dominate the local dynamic signal outside the tropics, subsurface dynamical quantities cannot be reliably inferred from surface height
Scale interactions in a mixing layer – the role of the large-scale gradients

KAUST Repository

Fiscaletti, D.

2016-02-15

© 2016 Cambridge University Press. The interaction between the large and the small scales of turbulence is investigated in a mixing layer, at a Reynolds number based on the Taylor microscale of , via direct numerical simulations. The analysis is performed in physical space, and the local vorticity root-mean-square (r.m.s.) is taken as a measure of the small-scale activity. It is found that positive large-scale velocity fluctuations correspond to large vorticity r.m.s. on the low-speed side of the mixing layer, whereas, they correspond to low vorticity r.m.s. on the high-speed side. The relationship between large and small scales thus depends on position if the vorticity r.m.s. is correlated with the large-scale velocity fluctuations. On the contrary, the correlation coefficient is nearly constant throughout the mixing layer and close to unity if the vorticity r.m.s. is correlated with the large-scale velocity gradients. Therefore, the small-scale activity appears closely related to large-scale gradients, while the correlation between the small-scale activity and the large-scale velocity fluctuations is shown to reflect a property of the large scales. Furthermore, the vorticity from unfiltered (small scales) and from low pass filtered (large scales) velocity fields tend to be aligned when examined within vortical tubes. These results provide evidence for the so-called 'scale invariance' (Meneveau & Katz, Annu. Rev. Fluid Mech., vol. 32, 2000, pp. 1-32), and suggest that some of the large-scale characteristics are not lost at the small scales, at least at the Reynolds number achieved in the present simulation.
Impact of large scale flows on turbulent transport

Energy Technology Data Exchange (ETDEWEB)

Sarazin, Y [Association Euratom-CEA, CEA/DSM/DRFC centre de Cadarache, 13108 St-Paul-Lez-Durance (France)]; Grandgirard, V [Association Euratom-CEA, CEA/DSM/DRFC centre de Cadarache, 13108 St-Paul-Lez-Durance (France)]; Dif-Pradalier, G [Association Euratom-CEA, CEA/DSM/DRFC centre de Cadarache, 13108 St-Paul-Lez-Durance (France)]; Fleurence, E [Association Euratom-CEA, CEA/DSM/DRFC centre de Cadarache, 13108 St-Paul-Lez-Durance (France)]; Garbet, X [Association Euratom-CEA, CEA/DSM/DRFC centre de Cadarache, 13108 St-Paul-Lez-Durance (France)]; Ghendrih, Ph [Association Euratom-CEA, CEA/DSM/DRFC centre de Cadarache, 13108 St-Paul-Lez-Durance (France)]; Bertrand, P [LPMIA-Universite Henri Poincare Nancy I, Boulevard des Aiguillettes BP239, 54506 Vandoeuvre-les-Nancy (France)]; Besse, N [LPMIA-Universite Henri Poincare Nancy I, Boulevard des Aiguillettes BP239, 54506 Vandoeuvre-les-Nancy (France)]; Crouseilles, N [IRMA, UMR 7501 CNRS/Universite Louis Pasteur, 7 rue Rene Descartes, 67084 Strasbourg (France)]; Sonnendruecker, E [IRMA, UMR 7501 CNRS/Universite Louis Pasteur, 7 rue Rene Descartes, 67084 Strasbourg (France)]; Latu, G [LSIIT, UMR 7005 CNRS/Universite Louis Pasteur, Bd Sebastien Brant BP10413, 67412 Illkirch (France)]; Violard, E [LSIIT, UMR 7005 CNRS/Universite Louis Pasteur, Bd Sebastien Brant BP10413, 67412 Illkirch (France)]

2006-12-15

The impact of large scale flows on turbulent transport in magnetized plasmas is explored by means of various kinetic models. Zonal flows are found to lead to a non-linear upshift of turbulent transport in a 3D kinetic model for interchange turbulence. Such a transition is absent from fluid simulations, performed with the same numerical tool, which also predict a much larger transport. The discrepancy cannot be explained by zonal flows alone, even though they are overdamped in fluids. Indeed, some difference remains, although reduced, when they are artificially suppressed. Zonal flows are also reported to trigger transport barriers in a 4D drift-kinetic model for slab ion temperature gradient (ITG) turbulence. The density gradient acts as a source drive for zonal flows, while their curvature back stabilizes the turbulence. Finally, 5D simulations of toroidal ITG modes with the global and full-f GYSELA code require the equilibrium density function to depend on the motion invariants only. If not, the generated strong mean flows can completely quench turbulent transport.
Impact of large scale flows on turbulent transport

International Nuclear Information System (INIS)

Sarazin, Y; Grandgirard, V; Dif-Pradalier, G; Fleurence, E; Garbet, X; Ghendrih, Ph; Bertrand, P; Besse, N; Crouseilles, N; Sonnendruecker, E; Latu, G; Violard, E

2006-01-01

The impact of large scale flows on turbulent transport in magnetized plasmas is explored by means of various kinetic models. Zonal flows are found to lead to a non-linear upshift of turbulent transport in a 3D kinetic model for interchange turbulence. Such a transition is absent from fluid simulations, performed with the same numerical tool, which also predict a much larger transport. The discrepancy cannot be explained by zonal flows alone, even though they are overdamped in fluids. Indeed, some difference remains, although reduced, when they are artificially suppressed. Zonal flows are also reported to trigger transport barriers in a 4D drift-kinetic model for slab ion temperature gradient (ITG) turbulence. The density gradient acts as a source drive for zonal flows, while their curvature back stabilizes the turbulence. Finally, 5D simulations of toroidal ITG modes with the global and full-f GYSELA code require the equilibrium density function to depend on the motion invariants only. If not, the generated strong mean flows can completely quench turbulent transport.
Geometric scaling in ultrahigh energy neutrinos and nonlinear perturbative QCD

International Nuclear Information System (INIS)

Machado, Magno V.T.

2011-01-01

The ultrahigh energy neutrino cross section is a crucial ingredient in the calculation of the event rate in high energy neutrino telescopes. Currently there are several approaches which predict different behaviors for its magnitude at ultrahigh energies. This contribution presents a summary of current predictions based on the non-linear QCD evolution equations, the so-called perturbative saturation physics. In particular, predictions based on the parton saturation approaches are shown, and the consequences of the geometric scaling property at high energies are discussed. The scaling property allows an analytical computation of neutrino scattering on a nucleon or nucleus at high energies, providing a theoretical parameterization. (author)
Parallel processing for artificial intelligence 2

CERN Document Server

Kumar, V; Suttner, CB

1994-01-01

With the increasing availability of parallel machines and the rising interest in large scale and real world applications, research on parallel processing for Artificial Intelligence (AI) is gaining greater importance in the computer science environment. Many applications have been implemented and delivered but the field is still considered to be in its infancy. This book assembles diverse aspects of research in the area, providing an overview of the current state of technology. It also aims to promote further growth across the discipline. Contributions have been grouped according to their
Non-linear, non-monotonic effect of nano-scale roughness on particle deposition in absence of an energy barrier: Experiments and modeling

Science.gov (United States)

Jin, Chao; Glawdel, Tomasz; Ren, Carolyn L.; Emelko, Monica B.

2015-12-01

Deposition of colloidal- and nano-scale particles on surfaces is critical to numerous natural and engineered environmental, health, and industrial applications ranging from drinking water treatment to semi-conductor manufacturing. Nano-scale surface roughness-induced hydrodynamic impacts on particle deposition were evaluated in the absence of an energy barrier to deposition in a parallel plate system. A non-linear, non-monotonic relationship between deposition surface roughness and particle deposition flux was observed and a critical roughness size associated with minimum deposition flux or “sag effect” was identified. This effect was more significant for nanoparticles (<1 μm) than for colloids and was numerically simulated using a Convective-Diffusion model and experimentally validated. Inclusion of flow field and hydrodynamic retardation effects explained particle deposition profiles better than when only the Derjaguin-Landau-Verwey-Overbeek (DLVO) force was considered. This work provides 1) a first comprehensive framework for describing the hydrodynamic impacts of nano-scale surface roughness on particle deposition by unifying hydrodynamic forces (using the most current approaches for describing flow field profiles and hydrodynamic retardation effects) with appropriately modified expressions for DLVO interaction energies, and gravity forces in one model and 2) a foundation for further describing the impacts of more complicated scales of deposition surface roughness on particle deposition.
HIGH-PRECISION PREDICTIONS FOR THE ACOUSTIC SCALE IN THE NONLINEAR REGIME

International Nuclear Information System (INIS)

Seo, Hee-Jong; Eckel, Jonathan; Eisenstein, Daniel J.; Mehta, Kushal; Metchnik, Marc; Pinto, Phillip; Xu Xiaoying; Padmanabhan, Nikhil; Takahashi, Ryuichi; White, Martin

2010-01-01

We measure shifts of the acoustic scale due to nonlinear growth and redshift distortions to a high precision using a very large volume of high-force-resolution simulations. We compare results from various sets of simulations that differ in their force, volume, and mass resolution. We find a consistency within 1.5σ for shift values from different simulations and derive the shift $\alpha(z) - 1 = (0.300 \pm 0.015)\%\,[D(z)/D(0)]^{2}$ using our fiducial set. We find a strong correlation with a non-unity slope between shifts in real space and in redshift space and a weak correlation between the initial redshift and low redshift. Density-field reconstruction not only removes the mean shifts and reduces errors on the mean, but also tightens the correlations. After reconstruction, we recover a slope of near unity for the correlation between the real and redshift space and restore a strong correlation between the initial and the low redshifts. We derive propagators and mode-coupling terms from our N-body simulations and compare with the Zel'dovich approximation and the shifts measured from the $\chi^{2}$ fitting, respectively. We interpret the propagator and the mode-coupling term of a nonlinear density field in the context of an average and a dispersion of its complex Fourier coefficients relative to those of the linear density field; from these two terms, we derive a signal-to-noise ratio of the acoustic peak measurement. We attempt to improve our reconstruction method by implementing 2LPT and iterative operations, but we obtain little improvement. The Fisher matrix estimates of uncertainty in the acoustic scale are tested using $5000\,h^{-3}\,\mathrm{Gpc}^{3}$ of cosmological Particle-Mesh simulations from Takahashi et al. At an expected sample variance level of 1%, the agreement between the Fisher matrix estimates based on Seo and Eisenstein and the N-body results is better than 10%.
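As a small worked illustration of the fitted shift quoted in this abstract, α(z) − 1 = (0.300 ± 0.015)% × [D(z)/D(0)]², the sketch below evaluates the shift for a given growth ratio. The function name and the sample growth ratios are illustrative assumptions, not values taken from the paper.

```python
def acoustic_scale_shift(growth_ratio, amplitude_percent=0.300):
    """Return alpha(z) - 1 as a fraction for a growth ratio D(z)/D(0).

    amplitude_percent is the fitted amplitude quoted in the abstract (0.300 %).
    """
    return amplitude_percent / 100.0 * growth_ratio ** 2


# Hypothetical growth ratios, chosen only to show the quadratic scaling.
shift_today = acoustic_scale_shift(1.0)    # D(z)/D(0) = 1 at z = 0
shift_earlier = acoustic_scale_shift(0.5)  # assumed ratio at some earlier epoch

print(f"shift at D/D0 = 1.0: {shift_today:.5f}")
print(f"shift at D/D0 = 0.5: {shift_earlier:.5f}")
```

Halving the growth ratio reduces the shift by a factor of four, which is the whole content of the quadratic dependence.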
Efficient Computation of Sparse Matrix Functions for Large-Scale Electronic Structure Calculations: The CheSS Library.

Science.gov (United States)

Mohr, Stephan; Dawson, William; Wagner, Michael; Caliste, Damien; Nakajima, Takahito; Genovese, Luigi

2017-10-10

We present CheSS, the "Chebyshev Sparse Solvers" library, which has been designed to solve typical problems arising in large-scale electronic structure calculations using localized basis sets. The library is based on a flexible and efficient expansion in terms of Chebyshev polynomials and presently features the calculation of the density matrix, the calculation of matrix powers for arbitrary powers, and the extraction of eigenvalues in a selected interval. CheSS is able to exploit the sparsity of the matrices and scales linearly with respect to the number of nonzero entries, making it well-suited for large-scale calculations. The approach is particularly adapted for setups leading to small spectral widths of the involved matrices and outperforms alternative methods in this regime. By coupling CheSS to the DFT code BigDFT, we show that such a favorable setup is indeed possible in practice. In addition, the approach based on Chebyshev polynomials can be massively parallelized, and CheSS exhibits excellent scaling up to thousands of cores even for relatively small matrix sizes.
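The general idea of evaluating a matrix function through a Chebyshev polynomial expansion can be sketched as follows. This is not the CheSS API or its implementation: it is a minimal pure-Python illustration on a small dense matrix, and the function names, the example matrix, and the assumed spectral bounds are all hypothetical. A production code would use sparse storage so that each recurrence step costs time proportional to the number of nonzero entries.

```python
import math


def matmul(a, b):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def chebyshev_coefficients(f, e_min, e_max, order):
    """Chebyshev coefficients of f on [e_min, e_max] from Chebyshev-Gauss nodes."""
    n = order + 1
    angles = [math.pi * (j + 0.5) / n for j in range(n)]
    # Map the nodes cos(angle) from [-1, 1] back to [e_min, e_max].
    values = [f(0.5 * (e_max - e_min) * math.cos(t) + 0.5 * (e_max + e_min))
              for t in angles]
    return [2.0 / n * sum(v * math.cos(k * t) for v, t in zip(values, angles))
            for k in range(n)]


def chebyshev_matrix_function(h, f, e_min, e_max, order=30):
    """Approximate f(h) for a symmetric h whose spectrum lies in [e_min, e_max]."""
    n = len(h)
    scale = 2.0 / (e_max - e_min)
    shift = (e_max + e_min) / (e_max - e_min)
    # Scaled matrix with spectrum inside [-1, 1].
    hs = [[scale * h[i][j] - (shift if i == j else 0.0) for j in range(n)]
          for i in range(n)]
    c = chebyshev_coefficients(f, e_min, e_max, order)
    t_prev = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T_0
    t_curr = hs                                                             # T_1
    result = [[0.5 * c[0] * t_prev[i][j] + c[1] * t_curr[i][j]
               for j in range(n)] for i in range(n)]
    for k in range(2, order + 1):
        prod = matmul(hs, t_curr)
        # Three-term recurrence: T_{k} = 2 * hs * T_{k-1} - T_{k-2}.
        t_next = [[2.0 * prod[i][j] - t_prev[i][j] for j in range(n)]
                  for i in range(n)]
        result = [[result[i][j] + c[k] * t_next[i][j] for j in range(n)]
                  for i in range(n)]
        t_prev, t_curr = t_curr, t_next
    return result


# Hypothetical example: matrix power 1/2 of a matrix with eigenvalues 1.5 and 2.5.
H = [[2.0, 0.5], [0.5, 2.0]]
sqrt_H = chebyshev_matrix_function(H, math.sqrt, e_min=1.0, e_max=3.0)
check = matmul(sqrt_H, sqrt_H)  # should reproduce H
```

The narrower the interval [e_min, e_max], the fewer polynomial terms are needed for a given accuracy, which is consistent with the abstract's remark that small spectral widths are the favorable regime.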
Large-scale matrix-handling subroutines 'ATLAS'

International Nuclear Information System (INIS)

Tsunematsu, Toshihide; Takeda, Tatsuoki; Fujita, Keiichi; Matsuura, Toshihiko; Tahara, Nobuo

1978-03-01

Subroutine package "ATLAS" has been developed for handling large-scale matrices. The package is composed of four kinds of subroutines, i.e., basic arithmetic routines, routines for solving linear simultaneous equations and for solving general eigenvalue problems and utility routines. The subroutines are useful in large scale plasma-fluid simulations. (auth.)
Large Scale Flutter Data for Design of Rotating Blades Using Navier-Stokes Equations

Science.gov (United States)

Guruswamy, Guru P.

2012-01-01

A procedure to compute flutter boundaries of rotating blades is presented; a) Navier-Stokes equations. b) Frequency domain method compatible with industry practice. Procedure is initially validated: a) Unsteady loads with flapping wing experiment. b) Flutter boundary with fixed wing experiment. Large scale flutter computation is demonstrated for rotating blade: a) Single job submission script. b) Flutter boundary in 24 hour wall clock time with 100 cores. c) Linearly scalable with number of cores. Tested with 1000 cores that produced data in 25 hrs for 10 flutter boundaries. Further wall-clock speed-up is possible by performing parallel computations within each case.
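The timing figures quoted in this record allow a rough check of the claimed linear scalability. The sketch below assumes that the 1000-core run assigned 100 cores to each of the 10 flutter boundaries; that split is an assumption made for illustration, since the abstract does not state it.

```python
def core_hours_per_boundary(wall_hours, cores, boundaries):
    """Total core-hours spent per flutter boundary."""
    return wall_hours * cores / boundaries


# Figures quoted in the abstract.
baseline = core_hours_per_boundary(wall_hours=24.0, cores=100, boundaries=1)
batch = core_hours_per_boundary(wall_hours=25.0, cores=1000, boundaries=10)

# Ratio of baseline cost to batch cost; 1.0 would be perfectly linear scaling.
efficiency = baseline / batch

print(f"baseline: {baseline:.0f} core-hours per boundary")
print(f"batch:    {batch:.0f} core-hours per boundary")
print(f"scaling efficiency: {efficiency:.2f}")
```

Under that assumption the cost per boundary rises only from 2400 to 2500 core-hours, an efficiency of 0.96.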
On spectral scaling laws for incompressible anisotropic magnetohydrodynamic turbulence

International Nuclear Information System (INIS)

Galtier, Sebastien; Pouquet, Annick; Mangeney, Andre

2005-01-01

A heuristic model is given for anisotropic magnetohydrodynamics turbulence in the presence of a uniform external magnetic field $B_0\,\mathbf{e}_\parallel$. The model is valid for both moderate and strong $B_0$ and is able to describe both the strong and weak wave turbulence regimes as well as the transition between them. The main ingredient of the model is the assumption of constant ratio at all scales between the linear wave period and the nonlinear turnover time scale. Contrary to the model of critical balance introduced by Goldreich and Sridhar [Astrophys. J. 438, 763 (1995)], it is not assumed, in addition, that this ratio be equal to unity at all scales. This allows us to make use of the Iroshnikov-Kraichnan phenomenology; it is then possible to recover the widely observed anisotropic scaling law $k_\parallel \propto k_\perp^{2/3}$ between parallel and perpendicular wave numbers (with reference to $B_0\,\mathbf{e}_\parallel$) and to obtain for the total-energy spectrum $E(k_\perp, k_\parallel) \sim k_\perp^{-\alpha} k_\parallel^{-\beta}$ the universal prediction $3\alpha + 2\beta = 7$. In particular, with such a prediction, the weak Alfvén wave turbulence constant-flux solution is recovered and, for the first time, a possible explanation to its precursor found numerically by Galtier et al. [J. Plasma Phys. 63, 447 (2000)] is given.
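The relation between the two spectral indices quoted in this abstract can be made concrete with a short worked example. The value α = 2 used below is the perpendicular index commonly quoted for weak Alfvén wave turbulence and is taken here as an assumption; the second case is purely an arithmetic illustration.

```latex
% Solving the quoted relation 3\alpha + 2\beta = 7 for the parallel index:
\[
  \beta = \frac{7 - 3\alpha}{2}.
\]
% Assumed weak-turbulence perpendicular index \alpha = 2:
\[
  \alpha = 2 \;\Rightarrow\; \beta = \frac{7 - 6}{2} = \frac{1}{2},
  \qquad E(k_\perp, k_\parallel) \sim k_\perp^{-2}\, k_\parallel^{-1/2}.
\]
% Arithmetic illustration with \alpha = 5/3:
\[
  \alpha = \frac{5}{3} \;\Rightarrow\; \beta = \frac{7 - 5}{2} = 1.
\]
```

A steeper perpendicular spectrum therefore corresponds to a shallower parallel spectrum, with the trade-off fixed by the single constraint above.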

Generation of Nonlinear Electric Field Bursts in the Outer Radiation Belt through Electrons Trapping by Oblique Whistler Waves

Science.gov (United States)

Agapitov, Oleksiy; Drake, James; Mozer, Forrest

2016-04-01

Huge numbers of different nonlinear structures (double layers, electron holes, non-linear whistlers, etc. referred to as Time Domain Structures - TDS) have been observed by the electric field experiment on board the Van Allen Probes. A large part of the observed non-linear structures are associated with whistler waves and some of them can be directly driven by whistlers. The parameters favorable for the generation of TDS were studied experimentally as well as making use of 2-D particle-in-cell (PIC) simulations for the system with inhomogeneous magnetic field. It is shown that an outward propagating front of whistlers and hot electrons amplifies oblique whistlers which collapse into regions of intense parallel electric field with properties consistent with recent observations of TDS from the Van Allen Probe satellites. Oblique whistlers seed the parallel electric fields that are driven by the beams. The resulting parallel electric fields trap and heat the precipitating electrons. These electrons drive spikes of intense parallel electric field with characteristics similar to the TDSs seen in the VAP data. The decoupling of the whistler wave and the nonlinear electrostatic component is shown in PIC simulation in the inhomogeneous magnetic field system. These effects are observed by the Van Allen Probes in the radiation belts. The precipitating hot electrons propagate away from the source region in intense bunches rather than as a smooth flux.
Parallel computing by Monte Carlo codes MVP/GMVP

International Nuclear Information System (INIS)

Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa

2001-01-01

The general-purpose Monte Carlo codes MVP/GMVP are well vectorized and thus enable high-speed Monte Carlo calculations. To achieve further speedup, we parallelized the codes on different types of parallel computing platforms or by using the standard parallelization library MPI. The platforms used for benchmark calculations are the distributed-memory vector-parallel computer Fujitsu VPP500, the distributed-memory massively parallel computer Intel Paragon, and the distributed-memory scalar-parallel computers Hitachi SR2201 and IBM SP2. As is generally reported, linear speedup was obtained for large-scale problems, but parallelization efficiency decreased as the batch size per processing element (PE) became smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% in the PWR full-core calculation with more than 10 million histories, which took about 1.5 hours with massively parallel computing. (author)
Large-scale solar heat

Energy Technology Data Exchange (ETDEWEB)

Tolonen, J.; Konttinen, P.; Lund, P. [Helsinki Univ. of Technology, Otaniemi (Finland). Dept. of Engineering Physics and Mathematics

1998-12-31

In this project a large domestic solar heating system was built and a solar district heating system was modelled and simulated. The objectives were to improve the performance and reduce the costs of a large-scale solar heating system. As a result of the project, the benefit/cost ratio can be increased by 40% through dimensioning and optimising the system at the design stage. (orig.)
Parallel Implementation and Scaling of an Adaptive Mesh Discrete Ordinates Algorithm for Transport

International Nuclear Information System (INIS)

Howell, L H

2004-01-01

Block-structured adaptive mesh refinement (AMR) uses a mesh structure built up out of locally-uniform rectangular grids. In the BoxLib parallel framework used by the Raptor code, each processor operates on one or more of these grids at each refinement level. The decomposition of the mesh into grids and the distribution of these grids among processors may change every few timesteps as a calculation proceeds. Finer grids use smaller timesteps than coarser grids, requiring additional work to keep the system synchronized and ensure conservation between different refinement levels. In a paper for NECDC 2002 I presented preliminary results on implementation of parallel transport sweeps on the AMR mesh, conjugate gradient acceleration, accuracy of the AMR solution, and scalar speedup of the AMR algorithm compared to a uniform fully-refined mesh. This paper continues with a more in-depth examination of the parallel scaling properties of the scheme, both in single-level and multi-level calculations. Both sweeping and setup costs are considered. The algorithm scales with acceptable performance to several hundred processors. Trends suggest, however, that this is the limit for efficient calculations with traditional transport sweeps, and that modifications to the sweep algorithm will be increasingly needed as job sizes in the thousands of processors become common
Multiple Independent File Parallel I/O with HDF5

Energy Technology Data Exchange (ETDEWEB)

Miller, M. C.

2016-07-13

The HDF5 library has supported the I/O requirements of HPC codes at Lawrence Livermore National Laboratory (LLNL) since the late 90’s. In particular, HDF5 used in the Multiple Independent File (MIF) parallel I/O paradigm has supported LLNL codes’ scalable I/O requirements and has recently been gainfully used at scales as large as O(10⁶) parallel tasks.
Reducing computational costs in large scale 3D EIT by using a sparse Jacobian matrix with block-wise CGLS reconstruction

International Nuclear Information System (INIS)

Yang, C L; Wei, H Y; Soleimani, M; Adler, A

2013-01-01

Electrical impedance tomography (EIT) is a fast and cost-effective technique to provide a tomographic conductivity image of a subject from boundary current–voltage data. This paper proposes a time and memory efficient method for solving a large scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large number of measurements can produce a very large Jacobian matrix, which causes difficulties in computer storage and in the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining the image quality. Firstly, a sparse matrix reduction technique is proposed that uses thresholding to set very small values of the Jacobian matrix to zero. Storing the thresholded Jacobian in a sparse format eliminates the zero-valued elements, which reduces the memory requirement. Secondly, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. A sparse Jacobian with block-wise CG enables the large scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction on the reconstruction results. (paper)
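The two ingredients named in this abstract, thresholding the Jacobian into a sparse format and solving the least-squares problem with a block-wise conjugate gradient iteration, can be sketched in a few lines. This is not the authors' implementation: the row-block layout is one plausible reading of "block-wise", and the matrix, threshold, block size, and function names are hypothetical. The solver is standard CGLS, with the matrix products evaluated one row block at a time.

```python
def sparsify(jacobian, threshold):
    """Drop entries with |value| < threshold; keep each row as {column: value}."""
    return [{j: v for j, v in enumerate(row) if abs(v) >= threshold}
            for row in jacobian]


def split_blocks(rows, block_size):
    """Group consecutive sparse rows into blocks (assumed block layout)."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def forward(blocks, x):
    """Compute J @ x block by block; blocks are independent of each other."""
    return [sum(v * x[j] for j, v in row.items())
            for block in blocks for row in block]


def adjoint(blocks, y, n_cols):
    """Compute J^T @ y by accumulating the contribution of each block."""
    out = [0.0] * n_cols
    i = 0
    for block in blocks:
        for row in block:
            for j, v in row.items():
                out[j] += v * y[i]
            i += 1
    return out


def cgls(blocks, b, n_cols, iterations=50, tol=1e-12):
    """Standard CGLS iteration for min ||J x - b||, starting from x = 0."""
    x = [0.0] * n_cols
    r = list(b)
    s = adjoint(blocks, r, n_cols)
    p = list(s)
    gamma = sum(v * v for v in s)
    for _ in range(iterations):
        if gamma <= tol:
            break
        q = forward(blocks, p)
        alpha = gamma / sum(v * v for v in q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = adjoint(blocks, r, n_cols)
        gamma_new = sum(v * v for v in s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x


# Hypothetical 4 x 3 Jacobian with a few negligible entries.
J = [[2.0, 1e-9, 0.0],
     [0.0, 3.0, 1e-9],
     [1e-9, 0.0, 4.0],
     [1.0, 1.0, 1.0]]
rows = sparsify(J, threshold=1e-6)
stored = sum(len(row) for row in rows)      # nonzeros kept after thresholding
blocks = split_blocks(rows, block_size=2)
b = [2.0, 6.0, 12.0, 6.0]                   # data consistent with x = [1, 2, 3]
x = cgls(blocks, b, n_cols=3)
```

In this toy case thresholding keeps 6 of the 12 entries. In a parallel setting each row block could be handled by a different worker, with only the accumulated `adjoint` result needing a reduction across workers.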
Reducing computational costs in large scale 3D EIT by using a sparse Jacobian matrix with block-wise CGLS reconstruction.

Science.gov (United States)

Yang, C L; Wei, H Y; Adler, A; Soleimani, M

2013-06-01

Electrical impedance tomography (EIT) is a fast and cost-effective technique to provide a tomographic conductivity image of a subject from boundary current-voltage data. This paper proposes a time and memory efficient method for solving a large scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large number of measurements can produce a very large Jacobian matrix, which causes difficulties in computer storage and in the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining the image quality. Firstly, a sparse matrix reduction technique is proposed that uses thresholding to set very small values of the Jacobian matrix to zero. Storing the thresholded Jacobian in a sparse format eliminates the zero-valued elements, which reduces the memory requirement. Secondly, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. A sparse Jacobian with block-wise CG enables the large scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction on the reconstruction results.
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

Energy Technology Data Exchange (ETDEWEB)

Moreland, Kenneth [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Geveci, Berk [Kitware, Inc., Clifton Park, NY (United States)

2014-11-01

The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. Industry trends indicate that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It follows that efficient concurrency on exascale machines requires a massive number of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive number of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
Probes of large-scale structure in the Universe

International Nuclear Information System (INIS)

Suto, Yasushi; Gorski, K.; Juszkiewicz, R.; Silk, J.

1988-01-01

Recent progress in observational techniques has made it possible to confront quantitatively various models for the large-scale structure of the Universe with detailed observational data. We develop a general formalism to show that the gravitational instability theory for the origin of large-scale structure is now capable of critically confronting observational results on cosmic microwave background radiation angular anisotropies, large-scale bulk motions and large-scale clumpiness in the galaxy counts. (author)
Micrometer and nanometer-scale parallel patterning of ceramic and organic-inorganic hybrid materials

NARCIS (Netherlands)

ten Elshof, Johan E.; Khan, Sajid; Göbel, Ole

2010-01-01

This review gives an overview of the progress made in recent years in the development of low-cost parallel patterning techniques for ceramic materials, silica, and organic–inorganic silsesquioxane-based hybrids from wet-chemical solutions and suspensions on the micrometer and nanometer-scale. The
Theory-based scaling of the SOL width in circular limited tokamak plasmas

International Nuclear Information System (INIS)

Halpern, F.D.; Ricci, P.; Labit, B.; Furno, I.; Jolliet, S.; Loizu, J.; Mosetto, A.; Arnoux, G.; Silva, C.; Gunn, J.P.; Horacek, J.; Kočan, M.; LaBombard, B.

2013-01-01

A theory-based scaling for the characteristic length of a circular, limited tokamak scrape-off layer (SOL) is obtained by considering the balance between parallel losses and non-linearly saturated resistive ballooning mode turbulence driving anomalous perpendicular transport. The SOL size increases with plasma size, resistivity, and safety factor q. The scaling is verified against flux-driven non-linear turbulence simulations, which reveal good agreement within a wide range of dimensionless parameters, including parameters closely matching the TCV tokamak. An initial comparison of the theory against experimental data from several tokamaks also yields good agreement. (letter)
Dynamic Analysis and Vibration Attenuation of Cable-Driven Parallel Manipulators for Large Workspace Applications

Directory of Open Access Journals (Sweden)

Jingli Du

2013-01-01

Cable-driven parallel manipulators are one of the best solutions for achieving a large workspace, since flexible cables can be easily stored on reels. However, due to the negligible flexural stiffness of cables, long cables will unavoidably vibrate during operation in large workspace applications. In this paper a finite element model for cable-driven parallel manipulators is proposed to mimic small amplitude vibration of cables around their desired position. Output feedback of the cable tension variation at the end of the end-effector is utilized to design the vibration attenuation controller, which aims at attenuating the vibration of cables by slightly varying the cable length, thus decreasing its effect on the end-effector. Once cable vibration is attenuated, a motion controller can be designed to implement precise large motions that track given trajectories. A numerical example is presented to demonstrate the dynamic model and the control algorithm.
Robust nonlinear PID-like fuzzy logic control of a planar parallel (2PRP-PPR) manipulator.

Science.gov (United States)

Londhe, P S; Singh, Yogesh; Santhakumar, M; Patre, B M; Waghmare, L M

2016-07-01

In this paper, a robust nonlinear proportional-integral-derivative (PID)-like fuzzy control scheme is presented and applied to complex trajectory tracking control of a 2PRP-PPR (P-prismatic, R-revolute) planar parallel manipulator (motion platform) with three degrees-of-freedom (DOF) in the presence of parameter uncertainties and external disturbances. The proposed control law consists mainly of two parts: the first part uses a feed-forward term to enhance the control activity and an estimated perturbed term to compensate for the unknown effects, namely external disturbances and unmodeled dynamics, and the second part uses a PID-like fuzzy logic control as a feedback portion to enhance the overall closed-loop stability of the system. Experimental results are presented to show the effectiveness of the proposed control scheme. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
Large-scale grid management; Storskala Nettforvaltning

Energy Technology Data Exchange (ETDEWEB)

Langdal, Bjoern Inge; Eggen, Arnt Ove

2003-07-01

The network companies in the Norwegian electricity industry now have to establish large-scale network management, a concept essentially characterized by (1) broader focus (Broad Band, Multi Utility,...) and (2) bigger units with large networks and more customers. Research done by SINTEF Energy Research shows so far that the approaches within large-scale network management may be structured according to three main challenges: centralization, decentralization and outsourcing. The article is part of a planned series.
Moditored unsaturated soil transport processes as a support for large scale soil and water management

Science.gov (United States)

Vanclooster, Marnik

2010-05-01

The current societal demand for sustainable soil and water management is very large. The drivers of global and climate change exert many pressures on the soil and water ecosystems, endangering appropriate ecosystem functioning. The unsaturated soil transport processes play a key role in soil-water system functioning as they control the fluxes of water and nutrients from the soil to plants (the pedo-biosphere link), the infiltration flux of precipitated water to groundwater and the evaporative flux, and hence the feedback from the soil to the climate system. Yet, unsaturated soil transport processes are difficult to quantify since they are affected by huge variability of the governing properties at different space-time scales and the intrinsic non-linearity of the transport processes. The incompatibility between the scale at which processes can reasonably be characterized, the scale at which the theoretical process can correctly be described and the scale at which the soil and water system needs to be managed calls for further development of scaling procedures in unsaturated zone science. It also calls for a better integration of theoretical and modelling approaches to elucidate transport processes at the appropriate scales, compatible with the sustainable soil and water management objective. Moditoring science, i.e. the interdisciplinary research domain where modelling and monitoring science are linked, is currently evolving significantly in the unsaturated zone hydrology area. In this presentation, a review of current moditoring strategies/techniques will be given and illustrated for solving large scale soil and water management problems. This will also allow identifying research needs in the interdisciplinary domain of modelling and monitoring and improving the integration of unsaturated zone science in solving soil and water management issues. The focus will be on examples of large scale soil and water management problems in Europe.
Theory of Nonlinear Dispersive Waves and Selection of the Ground State

International Nuclear Information System (INIS)

Soffer, A.; Weinstein, M.I.

2005-01-01

A theory of time-dependent nonlinear dispersive equations of the Schroedinger or Gross-Pitaevskii and Hartree type is developed. The short, intermediate and large time behavior is found by deriving nonlinear master equations (NLME) governing the evolution of the mode powers, and by a novel multitime scale analysis of these equations. The scattering theory is developed and coherent resonance phenomena and associated lifetimes are derived. Applications include Bose-Einstein condensate large time dynamics and nonlinear optical systems. The theory reveals a nonlinear transition phenomenon, 'selection of the ground state', and NLME predicts the decay of the excited state, with half its energy transferred to the ground state and half to radiation modes. Our results predict the recent experimental observations of Mandelik et al. in nonlinear optical waveguides.
Intelligent control for large-scale variable speed variable pitch wind turbines

Institute of Scientific and Technical Information of China (English)

Xinfang ZHANG; Daping XU; Yibing LIU

2004-01-01

Large-scale wind turbine generator systems have strong nonlinear multivariable characteristics with many uncertain factors and disturbances. Automatic control is crucial for the efficiency and reliability of wind turbines. On the basis of a simplified and proper model of variable speed variable pitch wind turbines, the effective wind speed is estimated using an extended Kalman filter. The intelligent control schemes proposed in the paper include two loops which operate in synchronism with each other. At below-rated wind speed, the inner loop adopts adaptive fuzzy control based on variable universe for generator torque regulation to realize maximum wind energy capture. At above-rated wind speed, a controller based on a least squares support vector machine is proposed to adjust the pitch angle and keep rated output power. The simulation shows the effectiveness of the intelligent control.
Logical inference techniques for loop parallelization

KAUST Repository

Oancea, Cosmin E.; Rauchwerger, Lawrence

2012-01-01

This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S = Ø, where S is a set expression representing array indexes. Using a language instead of an array-abstraction representation for S results in a smaller number of conservative approximations but exhibits a potentially-high runtime cost. To alleviate this cost we introduce a language translation F from the USR set-expression language to an equally rich language of predicates (F(S) ⇒ S = Ø). Loop parallelization is then validated using a novel logic inference algorithm that factorizes the obtained complex predicates (F(S)) into a sequence of sufficient-independence conditions that are evaluated first statically and, when needed, dynamically, in increasing order of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECTCLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers. Copyright © 2012 ACM.
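The run-time side of the approach described above, a sequence of sufficient-independence conditions tried in increasing order of estimated cost, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' USR language or inference algorithm: it considers a hypothetical loop `a[idx[i]] = f(b[i])`, whose iterations are independent when no two iterations write the same element, and the predicate names and cost ranks are invented for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (name, estimated cost rank, predicate). Each predicate is a SUFFICIENT
# condition for "no two iterations write the same element of a".
Predicate = Tuple[str, int, Callable[[Sequence[int]], bool]]


def trivially_short(idx: Sequence[int]) -> bool:
    """O(1): with at most one iteration there is nothing to conflict."""
    return len(idx) <= 1


def strictly_increasing(idx: Sequence[int]) -> bool:
    """O(n), no extra memory: a strictly increasing index has no repeats."""
    return all(a < b for a, b in zip(idx, idx[1:]))


def all_distinct(idx: Sequence[int]) -> bool:
    """O(n) time and memory: exact test that the written index set has no repeats."""
    return len(set(idx)) == len(idx)


# Deliberately listed out of order; the cascade sorts by estimated cost.
CASCADE: List[Predicate] = [
    ("all_distinct", 3, all_distinct),
    ("trivially_short", 1, trivially_short),
    ("strictly_increasing", 2, strictly_increasing),
]


def first_proof_of_independence(idx: Sequence[int]) -> Optional[str]:
    """Return the name of the cheapest predicate that proves independence,
    or None if no predicate holds (the loop must then run sequentially)."""
    for name, _cost, predicate in sorted(CASCADE, key=lambda p: p[1]):
        if predicate(idx):
            return name
    return None
```

For example, `first_proof_of_independence([0, 2, 5])` stops at the monotonicity test without building a set, while `[4, 1, 3]` falls through to the exact test and `[1, 1, 2]` yields `None`. In the paper, some of these conditions are discharged statically at compile time; this sketch shows only the dynamic fallback order.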
Japanese large-scale interferometers

CERN Document Server

Kuroda, K; Miyoki, S; Ishizuka, H; Taylor, C T; Yamamoto, K; Miyakawa, O; Fujimoto, M K; Kawamura, S; Takahashi, R; Yamazaki, T; Arai, K; Tatsumi, D; Ueda, A; Fukushima, M; Sato, S; Shintomi, T; Yamamoto, A; Suzuki, T; Saitô, Y; Haruyama, T; Sato, N; Higashi, Y; Uchiyama, T; Tomaru, T; Tsubono, K; Ando, M; Takamori, A; Numata, K; Ueda, K I; Yoneda, H; Nakagawa, K; Musha, M; Mio, N; Moriwaki, S; Somiya, K; Araya, A; Kanda, N; Telada, S; Sasaki, M; Tagoshi, H; Nakamura, T; Tanaka, T; Ohara, K

2002-01-01

The objective of the TAMA 300 interferometer was to develop advanced technologies for kilometre-scale interferometers and to observe gravitational wave events in nearby galaxies. It was designed as a power-recycled Fabry-Perot-Michelson interferometer and was intended as a step towards a final interferometer in Japan. The present successful status of TAMA is presented. TAMA forms a basis for LCGT (large-scale cryogenic gravitational wave telescope), a 3 km scale cryogenic interferometer to be built in the Kamioka mine in Japan, implementing cryogenic mirror techniques. The plan of LCGT is schematically described along with its associated R&D.
